Everbridge
USD 118,700–145,000/yr

Site Reliability Specialist (Observability & Kubernetes)

United States, Remote, Full Time, Mid
Other, Site Reliability

Quick Summary

Overview

At Everbridge, we build resilient, scalable, and secure cloud platforms that power critical services used by 6,000+ organizations worldwide, especially when it matters most.

We’re looking for a Platform Site Reliability Specialist to take ownership of our enterprise observability platform and help shape how our teams understand, monitor, and improve system reliability at scale.

This is a high-impact role where you’ll drive both technical excellence and strategic direction, ensuring our engineers have deep, real-time visibility into system health, performance, and reliability across a complex, cloud-native environment.

  • Lead the design, operation, and evolution of Everbridge’s observability stack
  • Build and maintain a highly available, scalable observability platform
  • Standardize instrumentation, dashboards, alerts, and SLOs
  • Support incident response, root cause analysis, and capacity planning
  • Operate and scale Grafana and related technologies:
      • Grafana Loki (logs)
      • Grafana Mimir (metrics)
      • Grafana Tempo (tracing)
      • Grafana Alerting
  • Maintain the reliability and security of the EKS clusters running the observability platform
  • Manage cluster lifecycle and upgrades
  • Terraform for infrastructure provisioning
  • HashiCorp Packer
  • GitLab CI/CD at scale

  • 6+ years of experience in Site Reliability Engineering or Platform Engineering
  • Strong hands-on experience with the Grafana ecosystem
  • Deep expertise in Kubernetes, especially Amazon EKS
  • Solid proficiency with Terraform and infrastructure as code
  • Experience with OpenTelemetry
  • Background in large-scale observability systems
  • Experience with cloud cost optimization

Location & Eligibility

Where is the job
United States
Remote within one country
Who can apply
Open to applicants worldwide

Listing Details

Posted
May 2, 2026
First seen
May 2, 2026
Last seen
May 5, 2026

Posting Health

Days active
3
Repost count
0
Trust Level
86%
Scored at
May 5, 2026

Everbridge

Everbridge, Inc. is a global software company that provides enterprise software applications to automate and accelerate an organization's operational response to critical events, helping to keep people safe and organizations running. Its Critical Event Management (CEM) platform is used by thousands of organizations worldwide to manage the full lifecycle of a critical event.

Employees
3k+
Founded
2002