Posted 2026-06-09

Site Reliability Engineer

Description

Flutter Technology is looking for a Site Reliability Engineer to ensure the stability, uptime, and efficiency of our essential gaming and betting platforms throughout our worldwide operations. This position blends engineering skills with operational proficiency, including on-call support, to sustain continuous service availability for millions of users globally. As a member of Flutter Functions, you will work closely with development groups, infrastructure experts, and business partners to maintain high-performance, scalable systems supporting our iGaming and Sports platforms in several markets.

You will be the expert responsible for building and managing enterprise-level observability, disaster recovery, and business continuity features across our AWS Cloud environment. The role requires a passion for system reliability and a proactive approach to spotting and fixing potential issues before they affect customers. You will ensure our systems are resilient, recoverable, and subjected to regular fire drills and extensive testing. This is a hybrid role, allowing you to combine working from home with working in our modern offices.

Responsibilities

Maintain 99.9%+ uptime for the Observability platform that monitors and provides insights for systems serving millions of concurrent users.
Design and support complete monitoring, alerting, and observability systems.
Take responsibility for the tooling infrastructure that connects with various cloud services and platforms such as Grafana, Splunk, and CloudWatch.
Conduct capacity planning and performance optimisation to ensure systems can handle peak loads during major sporting events.
Establish and uphold Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for all essential services with assistance from Service Management.
Collaborate with Service Management to foster continuous improvement via blameless post-mortems.
Detect repeated failure trends across the platform and work with product teams on resilience upgrades.
Work together with Service Management on post-incident reviews, offering technical insights and assisting in the adoption of preventative measures.
Support the development and upkeep of detailed runbooks and incident response methods.
Deploy and maintain comprehensive monitoring dashboards and visualisation tools for real-time system visibility.
Create custom dashboards and visual analytics for business metrics, technical indicators, and operational insights.
Configure and optimise data ingestion from diverse sources including time-series databases, log aggregation systems, cloud monitoring services, and custom APIs.
Implement and refine alerting rules and notification workflows.
Develop and sustain APM capabilities, incorporating instrumentation and telemetry collection into the current observability ecosystem.
Work together with development teams to define, implement, and instrument custom business and technical metrics.
Own and maintain the chaos testing framework and tools, defining standard failure scenarios.
Support product teams in performing tests safely and consistently, and carry out disaster recovery fire drills.
Apply chaos engineering principles to proactively identify system weaknesses and vulnerabilities.
Work alongside development teams to boost application reliability and deployment procedures.
Mentor junior team members and contribute to the development of SRE practices across Flutter.
Participate in architecture reviews and provide reliability expertise for new system designs.
Document procedures, troubleshooting guides, and system architecture for knowledge sharing.

Requirements

Extensive experience with monitoring and observability tools including Prometheus, Grafana, ELK stack, or similar enterprise-scale solutions (required)
Established capability in handling cloud platforms like AWS, Azure, or Google Cloud Platform (required)
Extensive experience applying and sustaining reliability engineering methods in production settings supporting 24/7/365 operations (required)
Experience delivering and operating systems in stringent security-compliant and highly regulated environments (required)
Strong scripting and programming abilities in Python, Go, Bash, TypeScript, or Terraform (required)
Proven experience with CI/CD pipelines and tools including Jenkins, GitLab CI, Azure DevOps, GitHub Actions, or similar (required)
Working knowledge of database technologies including SQL databases (PostgreSQL, MySQL) and NoSQL solutions (required)
Ability to produce comprehensive, clear, and actionable technical documentation for operational procedures and runbooks (required)
Experience operating within an agile setting alongside cross-functional groups (required)
Proficiency with containerisation technologies including Docker and Kubernetes (required)
Previous software engineering experience (nice-to-have)
AWS certifications (nice-to-have)
Experience in highly regulated industries such as gaming, financial services, or healthcare (nice-to-have)

Benefits

Discretionary annual bonus
30 days paid leave
Health and Dental Insurance for you, your partner, and your children
Personal life insurance and disability coverage
Wellbeing fund
Continuous learning support for certifications and career growth
550 EUR gift for a newborn family member
26 weeks Maternity leave at 100% pay
4 weeks secondary (Paternity) leave at 100% pay
Sports card membership
Monthly food vouchers
Pension benefits

About Flutter International

Flutter is the world’s leading online sports betting and gaming company, operating some of the most innovative, diverse and distinctive brands in the sector such as FanDuel, Sky Betting and Sky Gaming, Paddy Power, PokerStars, Betfair, Sportsbet, Tombola, Adjarabet, Sisal, Snai, Betnacional, Junglee Games and MaxBet. We have an unparalleled portfolio of world-class brands, global scale and challenger mindset, through which we excite and entertain our customers, in a safe and sustainable way. Using our collective power, the Flutter Edge, we aim to disrupt our sector, learning from the past to create a better future for our customers, colleagues and communities.
