
Site Reliability Engineer | Lisbon
- Hybrid
- Lisbon, Lisboa, Portugal
- Tech
Job description
The SRE team is part of the Engineering organisation and applies the well-known SRE mentality to Unbabel's "You Build It, You Run It" approach. Our mission is to use a software engineering mindset to deliver and maintain the set of services that form the backbone on which the rest of the Engineering teams build and run their own services.
Responsibilities:
Develop the platform that supports our business services, improving the systems that handle provisioning, automation, deployment, monitoring, and related functions
Collaborate with the other engineering teams to focus on improving the usability of those systems and advising on best practices for building scalable and reliable applications
Work with different open-source/third-party and cloud-native technologies
We are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction
Job requirements
Must haves:
Excellent English verbal and written communication skills
Engineering Degree or equivalent
Knowledge of SRE best practices
SRE experience developing, running and debugging multiple distributed systems at scale, in production
Ability to collaborate cross-functionally and effectively with diverse, fast-paced teams
Experience managing Kubernetes clusters and serverless environments
Solid knowledge of Linux, container internals, and computer networking
Experience working with monitoring/observability systems (e.g. Prometheus, Thanos, Grafana)
Experience in shell scripting and strong software development skills with one of the following: Python, Go, Java, Rust, Ruby, C/C++, or any similar programming language
Experience with any major cloud provider, preferably AWS
Hands-on experience using technologies like Docker, Kubernetes, Nginx, or Apache
Nice to have:
Experience in supporting or mentoring other engineers
Experience creating response plans for infrastructure failures and disaster failovers
Experience with technologies like Terraform, ArgoCD, GitLab CI/CD, and HashiCorp Vault