Senior Data Platform Infrastructure Engineer
We are looking for a Senior Data Platform Infrastructure Engineer to join the Data Platform Infrastructure team and take ownership of critical infrastructure, CI/CD, Kubernetes/EKS, Flink, OpenSearch, and platform automation. The role is essential to maintaining and evolving the Smart Data Platform (SDP) infrastructure: supporting real-time data processing, improving platform reliability, enabling cost optimisation, and ensuring that engineering teams can deploy and operate data workloads safely and efficiently. The successful candidate will act as a senior technical owner across infrastructure, streaming systems, CI/CD, observability, and operational support.
- Own and improve CI/CD pipelines for Data Platform workloads, including Flink deployments and supporting platform services.
- Design, build, and maintain Terraform/Terragrunt-based AWS infrastructure.
- Support and enhance EKS-based infrastructure, including node pools, autoscaling, scheduling, disruption budgets, observability, and operational reliability.
- Own and evolve Flink deployment patterns, including AWS Managed Flink (Amazon Managed Service for Apache Flink) and EKS-based Flink via the Flink Kubernetes Operator.
- Support real-time data processing platforms by providing production-ready infrastructure, monitoring, configuration, and operational improvements.
- Contribute to OpenSearch infrastructure and platform capabilities, including deployment architecture, security, authentication, access control, shared-data integration, and Terraform implementation.
- Improve platform cost efficiency through autoscaling, Karpenter tuning, GitLab runner optimisation, and infrastructure cleanup.
- Strengthen monitoring, alerting, and observability across Data Platform infrastructure.
- Participate in the SDP/Data Platform on-call rota and support production stability.
- Lead workshops, provide hands-on guidance, and share knowledge across teams.
- Establish reusable infrastructure and deployment patterns for other teams.
- Support engineers designing and configuring streaming-based solutions.
- Actively use AI-assisted engineering tools such as Codex, Copilot, Claude, or equivalent to improve delivery speed, quality, and engineering practices.
- Document solutions clearly and contribute to sustainable knowledge sharing.
- Strong hands-on experience with AWS infrastructure, ideally in a data platform or high-scale engineering environment. (required)
- Advanced experience with Kubernetes and Amazon EKS. (required)
- Strong experience with Terraform/Terragrunt and infrastructure-as-code best practices. (required)
- Strong experience with CI/CD pipeline design and automation, ideally using GitLab CI. (required)
- Experience with Apache Flink, Flink Operator, AWS Managed Flink, or similar streaming/data-processing technologies. (required)
- Strong Python scripting or development experience. (required)
- Experience with Kubernetes autoscaling and workload optimisation, including Karpenter, node pools, pod scheduling, and disruption budgets. (required)
- Experience with monitoring and alerting for production infrastructure. (required)
- Experience operating production systems and participating in on-call support. (required)
- Ability to troubleshoot complex infrastructure, deployment, and runtime issues. (required)
- Experience with OpenSearch infrastructure, configuration, security, access policies, and production operation. (preferred)
- Experience with streaming architectures and real-time data platforms. (preferred)
- Experience with Kafka, Kafka Connect, or event-driven data pipelines. (preferred)
- Experience with GitLab runners on Kubernetes/EKS. (preferred)
- Experience reducing AWS infrastructure cost through autoscaling, workload optimisation, and platform cleanup. (preferred)
- Experience mentoring engineers and leading technical workshops. (preferred)
- Experience applying AI-assisted development tools in day-to-day engineering workflows. (preferred)
