RapidSOS

Site Reliability Engineering Manager

Boston or New York · Mid

At RapidSOS, we are committed to using technology to build a safer, stronger future and working together to save lives. We’re in an exciting phase of growth, welcoming new members from across the globe to our mission-driven, ambitious, and inclusive team. Our work is founded on our values of elevating purpose, inventing tomorrow, delivering with urgency, serving with integrity, and winning together, all of which support a company culture where people can innovate, collaborate, grow, and, above all, make an impact. 

RapidSOS is the leading public safety AI company that unlocks mission-critical intelligence for first responders and security teams – enabling faster, smarter and more accurate emergency response. Real-time data from the world’s largest safety network of 700M+ devices, 200+ global enterprises, and 23,000+ federal, state and local agencies fuels the RapidSOS HARMONY AI engine that delivers this intelligence to those who need it most. Learn more at www.RapidSOS.com.

Responsibilities

  • Own the reliability, scalability, and operational health of RapidSOS Kubernetes clusters, shared services, and core AWS infrastructure; handle upgrades, capacity planning, and node scaling, and test that multi-region failover actually works
  • Drive the IaC foundation in Terraform/Atlantis and champion infrastructure-as-code as a core engineering standard
  • Partner with Engineering Managers to set SLOs for their services, establish error budgets, and help teams build the habits to operate what they ship; the goal is for product teams to own their services, not to have SRE own everything on their behalf
  • Maintain proactive reliability work: capacity planning, failure mode analysis, runbook quality, and chaos engineering exercises; run reliability reviews before major launches and organize failure mode exercises with product teams
  • Drive blameless postmortem practice, ensuring every significant incident produces systemic improvements with clear ownership and closure
  • Run the Tier 1 on-call rotation: scheduling for primary and secondary engineers, coordination with the 3rd-party NOC, and keeping incident escalation processes smooth and manageable
  • Lead incident command on Sev-1s, escalate when needed, and keep engineering leadership informed throughout
  • Lead and grow a high-impact team by mentoring engineers, owning headcount, and thinking ahead about what the team needs as the function grows
  • Shape the team’s long-term AI strategy for infrastructure and operations by identifying opportunities for AI-driven automation and insight generation, evaluating tooling and workflows, and operationalizing best practices for scalable team-wide usage
  • Own reserved instance strategy, the team's AWS cost footprint, and error budgets and SLOs across production services, and communicate that picture clearly to engineering and product leadership
  • Work alongside Platform SRE on bigger infrastructure projects: Gateway API adoption, cross-region architecture, security changes

Qualifications

  • 7+ years in SRE, platform engineering, or DevOps, with at least two years where you were responsible for a team and not just your own work
  • You’ve been directly responsible for Kubernetes and AWS infrastructure in production environments where uptime and resilience are critical
  • Experience moving a team from reactive ops toward engineering-first reliability practices 
  • You’ve worked collaboratively with engineering teams to proactively improve reliability, scalability, and operational readiness before issues reach production
  • Ability to write and review production-quality Python scripts and tooling
  • You’ve applied SLOs, error budgets, and blameless postmortems in practice to improve reliability and drive better engineering decisions
  • Hands-on familiarity with: Terraform/Atlantis, Kubernetes/Helm/ArgoCD, Datadog, Concourse CI/GitHub Actions, RabbitMQ, and AWS (EKS, RDS/Aurora, ElastiCache, VPC networking, IAM, KMS, Route53)

What We Offer

  • The chance to work with a passionate team on solving one of the largest challenges globally
  • Competitive salary, benefits, and equity participation
  • A dynamic, flexible, and fun start-up work environment with a highly talented team

Starting pay for a successful applicant will depend on a variety of job-related factors, which may include experience, relevant skills, training, education, location, business needs, or market demands. The salary range for this role is $185,000 - $215,000. This role will also be eligible to receive equity options.

RapidSOS is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or Veteran status. 

Interested in the role but you don’t meet 100% of the requirements? We’d love to hear from you! We encourage you to apply; we’d be excited to see if your unique skill set and experience could be a match.

Location & Eligibility

Where is the job: Boston or New York (on-site at the office)
Who can apply: Same as job location

Listing Details

Posted: May 21, 2026
First seen: May 21, 2026
Last seen: May 21, 2026

Posting Health

Days active: 0
Repost count: 0
Trust level: 67%
Scored at: May 21, 2026

RapidSOS is an intelligent safety platform that securely links life-saving data from connected devices, apps, and sensors to 9-1-1 and first responders, empowering faster and more effective emergency response.

Employees: 350
Founded: 2012