Sr. Cloud AI Infrastructure Engineer
- Architecture Research: Conduct in-depth research into the underlying hardware logic of various AI accelerators; evaluate the power-efficiency ratio and suitability of different heterogeneous architectures in the context of Large Language Model (LLM) inference and training.
- Operator & Performance Optimization: Design and optimize high-performance operator libraries for large-scale cloud computing environments; resolve long-tail latency issues in hardware scheduling, memory management, and distributed communication.
- Interconnect Architecture Definition: Define the interconnect architecture; drive the virtualization, standardized access, and efficient pooling of heterogeneous computing resources in the cloud.
- Technology Trend Analysis: Monitor global trends in semiconductors and accelerators; perform feasibility studies and experimental validation for the implementation of emerging technologies within cloud infrastructure.
- Master’s or Ph.D. degree in Computer Engineering, Electronic Engineering, Microelectronics, or a related field.
- Expertise in GPGPU architectures or other mainstream AI accelerator architectures.
- Proficiency in parallel computing frameworks and a deep understanding of low-level operator development languages (e.g., CUDA, Triton).
- Solid understanding of large-scale distributed systems, cluster topologies (e.g., Fat-tree, Torus), and high-performance network protocols.
- Familiarity with the architectural evolution of leading global computing companies; ability to objectively analyze the technical trade-offs and engineering challenges of different architectural paths.
- Experience in the application, optimization, or architectural design of ultra-large-scale accelerator clusters is preferred.
- Experience in the low-level adaptation and performance tuning of mainstream deep learning frameworks (e.g., PyTorch, TensorFlow) is preferred.
- Sign-on payment (case-by-case basis)
- Relocation package (case-by-case basis)
- Restricted stock units (case-by-case basis)
- Medical, dental, vision, life and disability benefits
- Participation in the Company’s 401(k) plan
- 15 to 25 days of vacation per year (depending on tenure)
- 13 holidays per calendar year
- 10 days of paid sick leave per year



