Our SRE team’s mission is to monitor, protect and fix our applications, services and systems, while providing our customers with the highest service availability and performance. As an SRE team member, you will proactively ensure the stability, resilience and scalability of our services through automation, testing and engineering.
You will work in our DevOps & SRE team, collaborating with our Technology teams to deliver our SRE objectives:
- Collaborate with the other Engineering teams on the architecture, design and delivery of production systems
- Automate change management and the delivery pipeline into production
- Ensure safety, predictability, repeatability and auditability of all build and deploy processes
- Design and set up monitoring and alerting for our production systems
- Design and introduce incident detection, early warning and self-healing systems
- Track and manage SRE metrics and Service Level Agreements/Objectives
- Manage on-call and facilitate incident response
- Complete blameless Root Cause Analyses and Incident Reviews
- Collaborate on the design and support the implementation of our Disaster Recovery Plans
- Plan capacity and forecast demand
- Manage and improve performance and scalability of our services
- Propose and collaborate on cost saving activities
The ideal candidate will have:
- Software development experience: Java or Python.
- AWS expertise; strong familiarity with core services (S3, EC2, ELB, ASG) and CloudFormation.
- Strong familiarity with Docker, ECS and the container ecosystem.
- Continuous delivery: source control, build pipelines, zero-downtime deployment (e.g. blue-green deployments), canary releases, etc.
- Experience proving performance and scalability via load, stress, endurance and spike testing.
- Experience proving resilience via failure injection (e.g. Chaos Monkey).
- The ability to articulate CI/CD and SRE principles and to communicate them to non-engineering teams.
- Good to have: data analysis and visualization experience.
Working pattern and benefits:
- Full time, Monday to Friday. Note: this role can be either fully remote or hybrid (3 days in the office, 2 from home), if desired.
- 25 days paid holiday
- Private medical and dental cover
- Gym membership
- Life cover
Pole Star Space Applications is a leading provider of ship-centric tracking, monitoring, compliance and risk management services. We are a small team in London with a global reach, providing services and business applications to a broad range of industries, including Shipping and Offshore, Governments and Maritime Administrations, and the Financial sector (Banks, Insurance, Commodity and Trade Financing), as well as other industries that have exposure to maritime trade and shipping.