Senior Distributed Storage SRE Engineer
- Responsible for the daily operation and maintenance of distributed storage systems (e.g., online releases, software deployment, monitoring, and inspection).
- Responsible for the stability of block storage and for the design and implementation of disaster recovery solutions; drive improvements in service reliability, scalability, and performance; and uphold the system SLA.
- Responsible for resource management and planning of block storage and related systems to improve resource efficiency.
- Participate in building the operations and maintenance support platform, and develop tools to improve operational efficiency.
- Respond quickly to online incidents; detect, debug, and resolve common faults, latent risks, and performance problems; and implement emergency plans and fault recovery strategies.
- Bachelor’s degree in Computer Science or a related technical field, or equivalent practical experience.
- Experience with Unix/Linux operating system internals (e.g., filesystems, storage devices) and with networking (e.g., TCP/IP, routing) or cloud systems.
- Experience analyzing and troubleshooting storage systems.
- Programming experience in one or more languages such as Shell, Python, or Go.
- Experience designing or managing large-scale distributed storage systems, an understanding of distributed systems principles, and familiarity with open source distributed storage systems (e.g., NAS, HDFS, Ceph).
- Familiarity with cloud products, practical experience with block storage, and the ability to handle common block storage problems.
- Experience with SRE work (e.g., online releases, monitoring, daily inspections) and scripting.
- Strong sense of responsibility and the ability to respond to and resolve problems in a timely manner.
- Sign-on payment, relocation package, and restricted stock units, each evaluated on a case-by-case basis.
- Medical, dental, vision, life and disability benefits, and participation in the Company’s 401(k) plan.
- Up to 15 to 25 days of vacation per year (depending on the employee’s tenure).
- Up to 13 days of holidays throughout the calendar year.
- Up to 10 days of paid sick leave per year.



