Zilliz
USD 175,000–225,000/yr

Senior Software Engineer, Cloud Reliability

United States · Redwood City · Full-Time · Senior
Engineering · DevOps & Infrastructure · Software Engineer · Site Reliability Engineer · Software Engineering · DevOps Engineer · Infrastructure & Cloud

Technical Tools
ansible, aws, azure, docker, gcp, gitlab-ci, java, jenkins, kubernetes, python, terraform, ci-cd, distributed-systems

Zilliz is a fast-growing startup developing the industry’s leading vector database for enterprise-grade AI. Founded by the engineers behind Milvus, the world’s most popular open-source vector database, the company builds next-generation database technologies to help organizations quickly create AI applications. On a mission to democratize AI, Zilliz is committed to simplifying data management for AI applications and making vector databases accessible to every organization.


We're entering our next phase of 10x growth: more customers, larger datasets, and far higher expectations for reliability. You'll join a small, fast-moving Cloud Platform team that operates large-scale, multi-cloud, distributed database systems in production. This is a high-ownership role for engineers who want to move fast, build automation instead of toil, and take real responsibility for production stability.

Responsibilities

  • Own the reliability, availability, and production stability of Zilliz Cloud as we scale through the next stage of growth
  • Debug complex production issues across Kubernetes, cloud infrastructure, networking, storage, and distributed database systems
  • Build automation and diagnostic tooling (log analysis, alert correlation, incident investigation, runbook automation, and remediation workflows) so problems get solved once, not repeatedly
  • Turn recurring incidents into reusable tools, automation, documentation, and product improvements
  • Improve observability across latency, availability, throughput, and resource efficiency
  • Partner with database and infrastructure engineers to make Zilliz Cloud more reliable, scalable, and automated

Qualifications

  • 3+ years building or operating production cloud systems, infrastructure platforms, database systems, or large-scale online services
  • Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience
  • Strong hands-on experience with Kubernetes, Docker, and at least one major cloud platform (AWS, GCP, or Azure)
  • Solid understanding of distributed systems: availability, scalability, performance, failure recovery, and operational tradeoffs
  • Experience with distributed databases, storage systems, search systems, or large-scale online systems is a strong plus
  • Experience operating highly multi-tenant systems or large infrastructure fleets (thousands of nodes, clusters, tenants, or customer deployments) is especially valuable
  • Familiarity with modern cloud operations tooling such as Terraform, Helm, Argo CD, Prometheus, Grafana, and CI/CD systems
  • Strong bias for action, and the drive to thrive in a fast-paced, rapidly scaling environment

How We Work

  • High ownership: You own production reliability end-to-end: the whole system, not a slice of it. High autonomy, high trust, minimal process.
  • Fast and focused: We ship often and keep a high bar. This team suits engineers who want velocity and a steep growth curve over red tape.
  • Globally distributed: We work closely with our core engineering teams across APAC. Expect occasional early-morning or evening syncs, in exchange for an on-call setup designed around timezone coverage, not overnight pages.

Zilliz is an Equal Opportunity Employer and welcomes people from all backgrounds, experiences, abilities, and perspectives. All qualified applicants will receive consideration for employment regardless of race, color, national origin, religion, sexual orientation, gender, gender identity, age, physical disability, or length of time spent unemployed.

Location & Eligibility

Where is the job: Redwood City, United States (hybrid; some on-site time required)
Who can apply: Open to applicants worldwide
Listed under: Worldwide

Listing Details

Posted: May 15, 2025
First seen: March 26, 2026
Last seen: June 26, 2026

Posting Health

Days active: 91
Repost count: 0
Trust Level: 44%
Scored at: June 26, 2026

Zilliz
lever
Employees: 30
Founded: 2006