Posted 2026-06-09

Site Reliability Engineer

Description

This role is for a Site Reliability Engineer who is passionate about the latest technologies and can handle mission-critical responsibilities. The aim is to deliver the best performance, functionality, and user experience in the iGaming industry. You will be part of a passionate team building an industry-leading, scalable, multi-brand platform that supports numerous online brands used by hundreds of thousands of users.

Responsibilities

Investigate system incidents, drive Root Cause Analyses (RCAs), and implement long-term remedial fixes.
Proactively reduce the number of incidents caused by system changes.
Define and enforce Service Level Agreements (SLAs), Service Level Objectives (SLOs), and success metrics for new initiatives.
Build and maintain comprehensive dashboards to achieve observability excellence.
Identify and help resolve performance bottlenecks.
Optimize infrastructure and code to maintain fast service.
Conduct capacity planning to forecast future hardware or cloud resource requirements.
Ensure platform components remain highly available and functional for users.
Oversee deployments to ensure new code does not disrupt the existing system.

Requirements

Deep experience building dashboards and tracking SLAs/SLOs using tools like Prometheus, Grafana, Coralogix, Splunk, or Loki. (required)
Proficiency in scripting and coding to automate manual tasks (eliminate "toil") and build reliability tools. (required)
Strong skills in .NET, Python, PowerShell, or Bash. (preferred)
Experience provisioning and managing infrastructure using Terraform or Ansible. (required)
Solid understanding of cloud platforms (AWS, GCP, or Azure). (required)
Hands-on experience scaling and managing distributed systems using Kubernetes (K8s) and Docker. (required)
Familiarity with deployment pipelines (GitLab CI, GitHub Actions, TeamCity, Octopus Deploy) to ensure safe, automated rollouts that don't cause incidents. (required)
Strong analytical skills for Root Cause Analysis (RCA). (required)
A calm approach to incident response. (required)
Ability to lead blameless post-mortems. (required)
Experience with AWS cloud infrastructure, CDNs, and other systems running across multiple data centres and environments. (required)
Experience with cloud application load balancers, preferably AWS ALB. (preferred)
Experience supporting cloud DNS services such as AWS Route 53, GCP Cloud DNS, or Azure DNS. (required)
Experience with Microsoft SQL Server, PostgreSQL, and Couchbase databases. (nice-to-have)

Benefits

Fitness-wellness allowance 🧗‍♂️
Company mobile phone for private use with 100 GB of data 📱
Annual HUF devaluation compensation 💡
Hybrid model: 3 days in the office 🏢 & 2 days from home 🏠
Private Health Insurance 🩺
Career development 📈
Technical and soft-skill training opportunities 🎓
Breakfast, fruits & lunch 🍎
Team building events 🥳

About Betsson Group

From a single slot machine in 1963 to a Nasdaq Stockholm-listed organisation with licences across multiple jurisdictions, Betsson has evolved into a diversified, multinational business. Today we employ around 3,000 people representing more than 75 nationalities across 20+ locations. Betsson AB is headquartered in Stockholm, while our operational headquarters in Ta’ Xbiex, Malta, drive the day-to-day business under what we refer to as Betsson Group. Our vision is to deliver the best customer experience in the industry. Through a portfolio of leading brands such as Betsson, Betsafe and NordicBet, we offer casino, sportsbook and other gaming products in regulated markets across Europe, South America, North America and Central Asia. Our proprietary technology underpins a scalable model that serves both B2C customers and B2B partners. Sustainability is embedded in our strategy. Responsible growth, customer protection and a commitment to our people and the communities we operate in remain central to how we create long-term value.
