Senior Cloud Engineer
WeChat is seeking Cloud Engineers to help deploy and scale WeChat's diverse ecosystem of services to over a billion users. The role works closely with software engineers, data scientists, security specialists, and project managers to develop internal tools and security systems that keep WeChat users worldwide safe. The role also ensures site reliability by managing the deployment, scaling, and maintenance of new and existing online services for a worldwide network and user base.
- Participate in system architecture and reliability design for new and existing services, balancing high availability, service capacity, performance, and cost.
- Deploy, manage and maintain new and existing services; manage Kubernetes/container clusters (upgrade, scaling, multi-cluster governance).
- Build and operate observability: metrics/logs/tracing, alert strategy (noise reduction, severity, escalation), and incident handling and resolution.
- Own the CI/CD and release engineering pipeline: pipeline design, canary/gray release, rollback, configuration and change management.
- Drive automation to eliminate repetitive work (scripts/tools, IaC), and improve operational efficiency and quality.
- Handle high-severity incidents, focusing on fast detection, mitigation, recovery, and postmortem-driven improvements.
- Bachelor's degree or above in Computer Science, Information Systems, or related fields.
- Prior work experience in Cloud Engineering, Site Reliability Engineering (SRE), or DevOps for a major, public-facing internet service.
- Strong Linux fundamentals (kernel/memory/process/thread/IPC) and solid networking knowledge (HTTP/DNS/TLS/TCP/IP)
- Cluster management experience with containers and Kubernetes; familiarity with upgrades, troubleshooting, and governance.
- Experience in large-scale distributed systems and microservices; ability to reason about trade-offs (CAP, consistency, latency).
- Experience with monitoring/alerting and observability tools (e.g., Prometheus, Zabbix) and practical alert hygiene.
- Experience with CI/CD, release automation, and change management; familiarity with IaC is a plus.
- Hands-on experience with at least one of Go, Python, or Bash; able to write production-quality automation code.
- Database operations experience (MySQL/PostgreSQL/Redis) is a plus.
- Fluency in both English and Mandarin Chinese to work with international stakeholders and with stakeholders based at HQ.



