System Reliability Engineer/DevOps
Ensure the availability, performance, and scalability of infrastructure and services through monitoring, automation, and operational best practices; Lead incident response, perform root cause analysis, and implement recovery and long-term fixes; Manage infrastructure with Terraform, Terragrunt, and other automation tools for consistency and repeatability; Implement and maintain metrics, logging, and tracing solutions (Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch) to ensure system visibility; Identify bottlenecks, tune systems, and improve infrastructure performance; Monitor resource usage, forecast growth, and implement scaling strategies; Integrate security best practices into IaC, CI/CD pipelines, and deployments; Support vulnerability management; Participate in 24/7 on-call rotations (once a week) to ensure timely resolution of critical incidents; Work with DevOps, PRE, development, and security teams to improve reliability and design resilient systems; Maintain operational runbooks, incident reports, and system documentation.
3+ years of experience in a DevOps, SRE, or related role; Strong hands-on experience with AWS services including EC2, ECS, EKS, RDS, DocumentDB, ElastiCache, Keyspaces, S3, EBS, VPC, Route 53, KMS, ACM, and CloudWatch; Proficiency with Terraform, Terragrunt, and Atlantis for reproducible, version-controlled infrastructure; Experience with GitLab CI, FluxCD, Argo Rollouts, and automation tools (Ansible, Python, Bash); Solid experience with Docker, Kubernetes (AWS EKS), and Helm (including custom templates and ChartMuseum); Familiarity with cluster add-ons such as KEDA, VPA, Karpenter, External-DNS, ingress-nginx, aws-alb-controller, and ebs-csi-driver; Experience with Grafana, the VictoriaMetrics stack, Tempo, metrics exporters, Pingdom, AWS CloudWatch, and alerting systems such as PagerDuty, VMAlert, and Alertmanager; Proficiency with OpenSearch and Vector Agent for centralized logging; Strong understanding of networking concepts, AWS networking (VPC, Network Firewall, Transit Gateway, Site-to-Site VPN), identity and access management, certificate and secrets management (ACM, Vault, SOPS), and application security best practices; Familiarity with Cloudflare services, including caching, DNS, and Workers; Exposure to AWS Cost Explorer, Kubecost, and custom cost export tools; Certifications in AWS, Terraform, Kubernetes, or Helm are a plus.
Health & Wellness Focus; Global Medical Coverage; Growth Opportunities; Benefits Programs (compensation for gym, dental, and psychological services, etc.); Performance-Driven Rewards; Dynamic Work Environment.
Growe is a leading business advisory and services group in iGaming and Entertainment. We are creators of strategies that work and solutions that scale. Combining strategic vision with hands-on expertise, we help businesses navigate the fast-evolving industry, seize new opportunities, enter new markets, and achieve sustainable growth. Our expertise spans key areas, from business and brand strategy development to market research, marketing solutions, IT customization, organizational structuring, and talent management. We partner with our clients to turn challenges into competitive advantages, ensuring successful market entries and long-term global expansion. We Are Opportunities Unlockers. Our approach is rooted in identifying potential and unlocking opportunities, whether that means launching new iGaming brands worldwide or giving our team players a once-in-a-lifetime chance to thrive. Grow. Win. Repeat.
