Data Infrastructure Team Leader
We are a high-impact team of Infrastructure Data Engineers responsible for building and maintaining the backbone of our data ecosystem. Our team thrives on collaboration, technical excellence, and a deep sense of ownership. We design, operate, and scale core components such as Kafka, Kafka Connect, Elasticsearch, and Logstash, supporting real-time data flows that power critical business decisions. As a tight-knit group, we value clear communication, reliability, and continuous learning, and everyone contributes meaningfully. Joining us means working with passionate engineers who take pride in building robust, scalable systems and in supporting each other every step of the way. If you enjoy solving complex infrastructure challenges, automating data pipelines, and being part of a collaborative team that forms the foundation of the company’s data strategy, we’d love to meet you.
We are looking for an experienced and motivated Data Infrastructure Team Leader to head our team and drive the design, reliability, and scalability of the systems that power our data platforms. In this role, you will combine technical leadership with hands-on engineering, guiding the team in building, operating, and optimizing mission-critical data infrastructure while ensuring high availability, operational excellence, and alignment with business needs.
Responsibilities:
- Lead and mentor a team of Data Infrastructure Engineers, fostering technical excellence, knowledge sharing, and continuous improvement.
- Own the reliability, stability, and operational health of the company’s production data infrastructure.
- Operate and maintain core data infrastructure platforms such as Apache Kafka, Kafka Connect, Elasticsearch, Logstash, and Kibana.
- Lead the design, deployment, and monitoring of scalable data flows across on-premises and cloud environments.
- Drive automation initiatives, including infrastructure-as-code, CI/CD pipelines, and deployment automation, to improve platform reliability and operational efficiency.
- Oversee the development of internal tools and microservices that expose data platform capabilities as self-service services for internal teams.
- Collaborate closely with engineering, data, analytics, and platform teams to ensure reliable data integration and delivery.
- Define and maintain monitoring, alerting, and observability frameworks using tools such as Prometheus and Grafana.
- Lead the response to production incidents, including root cause analysis, post-mortems, and preventive improvements.
- Plan and lead platform upgrades, capacity planning, and security improvements across data infrastructure components.
- Manage operational priorities, technical roadmaps, and infrastructure improvements aligned with business and platform strategy.
Skills:
- Strong proficiency in Python or similar languages used for building infrastructure tools, automation, and microservices.
- Deep hands-on experience with Apache Kafka, Kafka Connect, and Schema Registry (Confluent or Apache) in large-scale production environments.
- Strong working knowledge of the ELK stack (Elasticsearch, Logstash, Kibana) for observability, monitoring, and operational troubleshooting.
- Experience designing and maintaining internal platform services, microservices, and APIs that expose infrastructure capabilities to internal teams.
- Hands-on experience with containerization and orchestration technologies such as Docker and Kubernetes (or similar platforms such as Rancher).
- Strong experience implementing monitoring and observability solutions using tools such as Prometheus and Grafana.
- Advanced experience working with Linux-based environments, including shell scripting, troubleshooting, and system performance tuning.
- Solid experience with CI/CD pipelines, version control (Git), and infrastructure automation tools such as Ansible, Terraform, or similar technologies.
Knowledge:
- Strong understanding of distributed systems, data streaming architectures, and high-availability platform design.
- Deep understanding of DevOps principles, infrastructure-as-code practices, and automation-driven operations.
- Knowledge of security best practices, including access control, secrets management, and secure platform design.
- Familiarity with cloud infrastructure (AWS, GCP, or Azure) and hybrid cloud/on-premises architectures.
- Understanding of production operations, incident management processes, and reliability engineering practices.
Experience:
- 5+ years of experience working with data infrastructure, streaming platforms, or platform engineering.
- 3+ years of experience leading or mentoring engineering teams, including technical leadership, task prioritization, and team development.
- Proven track record operating and improving large-scale production data platforms and streaming pipelines.
- Experience leading incident management and operational response in production environments.
- Experience working in cross-functional environments, collaborating with engineering, data, analytics, and platform teams.
- Experience defining technical roadmaps, operational processes, and platform improvements.
Nice-to-Have Skills:
- Experience working with relational databases such as Microsoft SQL Server (MSSQL).
- Familiarity with stream processing frameworks such as Apache Flink or similar technologies.
- Experience with multi-tenant platforms, role-based access control (RBAC), and platform governance.
- Experience building self-service infrastructure platforms for internal engineering teams.
Benefits:
- Hybrid work model
- Free parking in the building + free electric car charging
- Broad collective health insurance (with options for family members and extensions)
- Birthday gift + day off during your birthday month
- Refer a friend – bonus or gift card
- HitechZone membership
- Gifts on holidays and life events
- Ten Bis
