Do you like automation? Do you know Linux like the back of your hand? Do you want to work on creating the Web 3.0 infrastructure? Are you excited about joining a startup? We are looking for a team member whose mastery of Linux, GitHub, Docker and Kubernetes is second to none.
Site Reliability Manager - DevOps
Web3 Foundation accelerates the development and adoption of the decentralized web. We’re providing the framework and setting the standards for an ecosystem so that the most cutting-edge projects can work together, multiplying their benefit to society as a whole.
We’re building the future of identity, privacy, financial markets and commerce through blockchain and other cryptographic technologies. At the core of this work is Polkadot - a platform that enables blockchains of all kinds to interact and communicate with one another. This is an opportunity to work at the forefront of technological development and join in shaping the future of society for the better.
- Participate in on-call rotation, failure resolution, post-mortem analysis and prevention through automation.
- Assist teams in making the platform components production-ready and provide support on IT-related issues.
- Take ownership of the infrastructure-as-code components that make up our platform, adapting them as requirements evolve.
- Define automated tools, including CI/CD pipelines, that help products adapt promptly to end-user requirements.
- Continuously improve the observability of the system and the feedback loops that surface potential problems before they happen.
- Design automated disaster-recovery mechanisms.
- Experience designing and maintaining scalable, resilient, performant and observable systems.
- On-call experience: participation in ops-duty rotations, incident response, post-mortem analysis and prevention.
- A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Solid background in software development.
- Experience with, or willingness to learn, cloud-native technologies: Kubernetes as the platform, Helm as the package manager, Prometheus and related technologies as the monitoring stack, all running on different providers such as AWS, Google Cloud Platform, DigitalOcean and Azure.
- Continuous Delivery experience.
- Interest and background in decentralized technologies, especially blockchain.
- Prometheus, Alertmanager and Grafana: creating service monitors, writing alert rules, building dashboards and panels.
- Experience using Terraform with a test-driven infrastructure approach.
To apply for this position, please submit your CV and a cover letter telling us a bit about yourself and your motivation to join us, either by clicking Apply or by email to email@example.com.
For more information about us, visit us on