As an Observability Engineer in our site reliability engineering team, you will be part of an empowered team that plays a key part in building, automating and enhancing our cloud and container-based infrastructure. You will work closely with engineers from other teams and help them run their code successfully on the platform we provide. You will join an experienced team of skilled engineers who can quickly onboard you to our systems.
The Infrastructure group plays a central role in scaling, automating and managing core parts of our cloud-based infrastructure. We act as "engineers for the engineers," helping others understand and leverage the architecture and platform underlying their features.
In the Reliability team, we use technologies such as AWS, Kubernetes, Istio, Datadog, Elasticsearch and Kibana. Our aim is to create a reliable platform for running our core services while at the same time enabling teams to take risks and experiment by having good separation between systems.
- Build, scale and manage our observability stack and cloud-based infrastructure including managing our Kubernetes clusters, logging pipelines and metrics system
- Actively engage and help our developers to improve the monitoring of their services
- Ensure high availability of production and pre-production systems
- Develop software that helps automate daily routines and handles autoscaling and failure recovery
- Actively drive initiatives towards better system design and implementation of new technologies
- Work closely with the Developer Enablement, Data Infrastructure and Security teams
- Participate in infrastructure on-call rotations
- Champion our operations culture and help the engineering organisation deliver high-availability services for our customers
- Experience in Linux operating systems and shell scripting
- Coding skills in at least one programming language. The Infrastructure group uses Go and Python, among others
- Experience with automating system and server management
- Understanding of distributed systems, networking and container technology
- Positive, proactive team player who is passionate about their craft and cares about helping the team deliver
- Written and verbal communication skills with the ability to clearly explain technical concepts to others in English
- Curious, with a growth mindset and the ability to learn quickly with support from the senior engineers on the team
- Problem solver with operations skills who can quickly diagnose and pinpoint issues in a production environment
- 2+ years of experience with monitoring systems (monitoring, logging, tracing)
- 4+ years of experience with distributed cloud systems
- Work on a product that helps create memorable travel experiences
- Smart, engaged co-workers
- Personal growth budget and mentorship programs
- Speak English in the office with people from over 70 nationalities
- Virtual stock options - be part of our success story
- Quarterly Hackathons and weekly tech talks
- GetYourGuide gift cards
- Relocation Assistance (varies by role/level)
We believe that diversity of experience, perspectives, and background is key to creating a great product and a great workplace.