Director, Site Reliability Engineering (SRE)

Pleasanton | Executive

IonQ, Inc. [NYSE: IONQ] is the world’s leading quantum platform and merchant supplier, delivering integrated quantum solutions across computing, networking, sensing, and security. IonQ’s newest generation of quantum computers, the IonQ Tempo, is the latest in a line of cutting-edge systems that have been helping customers and partners, including Amazon Web Services and AstraZeneca, achieve 20x performance results and accelerate innovation in drug discovery, materials science, financial modeling, logistics, cybersecurity, and defense. In 2025, the company achieved 99.99% two-qubit gate fidelity, setting a world record in quantum computing performance.

Headquartered in College Park, Maryland, IonQ has operations in California, Colorado, Massachusetts, Tennessee, Washington, Italy, South Korea, Sweden, Switzerland, Canada, and the United Kingdom. Our quantum computing services are available through all major cloud providers, while we also meet the needs of networking and sensing customers across land, sea, air, and space. IonQ is making quantum platforms more accessible and impactful than ever before.

We are looking for a Director of SRE. As a Director of SRE, you'll be part of a cross-functional team whose mission is to lead IonQ on its journey to build the world's best quantum computers to solve the world's most complex problems.

In this role, you will build and lead SRE/DevOps organizations operating multi-tenant SaaS at scale on AWS, Azure, and GCP. You will be responsible for production ownership of availability, latency, incident response, and capacity management while implementing an SRE operating model using SLOs/SLIs and error budgets. Your leadership will bridge the gap between cloud infrastructure architecture and AI-ready operations to ensure a secure-by-default platform for our product teams.

Responsibilities:

Build and lead SRE/DevOps organizations operating multi-tenant SaaS at scale on AWS/Azure/GCP, including production ownership for availability, latency, incident response, disaster recovery (DR), and capacity management.
Architect cloud infrastructure focusing on networking (VPC/VNet, routing, private connectivity), compute, containers/orchestration, and data platforms.
Implement SRE operating models using SLOs/SLIs and error budgets to balance reliability and delivery velocity.
Drive CI/CD and release engineering leadership, ensuring safe progressive delivery (canary/blue-green), automated rollbacks, and measurable deployment health.
Scale Infrastructure-as-Code (IaC) and platform automation through "golden pipelines," standardized modules, and secure-by-default guardrails.
Lead cross-functional execution across Product, Engineering, Security, Support, and Customer Success while setting clear ownership boundaries.
Own organizational planning, including hiring, team topology, on-call models, budget, and vendor strategy.
Establish a culture of operational excellence through blameless postmortems, corrective-action tracking, and toil reduction.

Requirements

At least 15 years of experience building and leading SRE/DevOps organizations operating multi-tenant SaaS at scale on AWS, Azure, or GCP.
Deep technical knowledge of cloud infrastructure architecture, networking, containers, and secure-by-default platform guardrails.
Proven ability to run production for global enterprise/federal customer bases, including tenant isolation and data residency considerations.

AI-ready operations experience for networking SaaS, including streaming telemetry pipelines and closed-loop automation.
Experience with Juniper Mist AI or similar large-scale networking SaaS platforms is strongly preferred.
Knowledge of AI-native networking concepts such as service-level expectations (SLEs) and proactive anomaly detection.
Security and resilience mindset aligned to Zero Trust designs and continuous telemetry policy enforcement.
Hands-on experience operating SaaS products for networking/security domains where customer impact is tied to network behavior.
Executive communication strength, with the ability to present SLO posture, incident learnings, and risk to leadership.