Emburse

Site Reliability Engineer III (SRE III)

Toronto, Full-Time, Mid
Engineering, DevOps Engineer

Who We Are:

At Emburse, you’ll not just imagine the future – you’ll build it. As a leader in travel and expense solutions, we are creating a future where technology drives business value and inspires extraordinary results. Our AI-powered platform helps organizations modernize financial operations, increase visibility, and optimize spend across the enterprise.

The Site Reliability Engineer III (SRE III) plays a critical role in ensuring Emburse’s systems are highly available, scalable, and performant. This role blends deep technical expertise with strong collaboration and leadership skills to drive operational excellence across distributed systems. The ideal candidate is passionate about automation, cloud infrastructure, observability, and continuous improvement, while mentoring junior engineers and driving a reliability culture across the organization.

  • Proactively identify, evaluate, and implement preventative measures to reduce customer impact.
  • Ensure all services are designed and operated with 24/7 availability, scalability, and resilience in mind.
  • Monitor, troubleshoot, and provide visibility to improve site latency, performance, and uptime.
  • Design, develop, and automate reliable cloud infrastructure and platform services.
  • Apply Infrastructure-as-Code (IaC) principles to manage large-scale distributed systems.
  • Write and maintain scripts, tools, and automation frameworks to support operational efficiency.
  • Partner with engineering leadership to develop solutions that enable developer productivity and remove cross-functional dependencies.
  • Collaborate with Platform Engineering teams on project definitions, requirements, backlog grooming, and planning processes.
  • Align operational goals with product and engineering roadmaps to ensure reliability requirements are met early in the lifecycle.
  • Define non-functional requirements (NFRs) and influence standards for scalability, observability, and fault tolerance.
  • Lead cross-functional troubleshooting of complex issues spanning applications, infrastructure, databases, and networks.
  • Serve as a technical mentor to SRE I and II engineers, guiding them in best practices for reliability, automation, and incident management.
  • Lead root cause analysis and postmortem reviews, driving continuous improvement initiatives.
  • Support offshore and distributed teams, promoting effective collaboration and communication.
  • Participate in design and architecture reviews, offering technical recommendations and documentation for key stakeholders.
  • Required: Bachelor’s degree in Computer Science or a STEM field.
  • Minimum 6 years of experience in an engineering or operations role with a focus on reliability, scalability, and automation.
  • Preferred: Certified Kubernetes Administrator (CKA) and/or AWS Certification.

Requirements

  • Strong proficiency in Linux-based distributed environments (up to 70% hands-on work).
  • Deep experience with cloud platforms (AWS or Azure) and Infrastructure-as-Code (Terraform).
  • Excellent scripting skills (Python, Bash, PowerShell); object-oriented programming experience is a plus.
  • Demonstrated ability to develop and maintain internal tools and automation solutions.
  • Excellent written and verbal communication skills in English.
  • Strong project management and organizational abilities with a bias for action.
  • Experience collaborating with offshore or globally distributed teams.
  • Expertise in containerization and orchestration technologies (Docker, Kubernetes).
  • Experience with Kubernetes scaling tooling (Karpenter, KEDA).
  • Strong understanding of DevOps principles and modern CI/CD pipelines.
  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry).
  • Familiarity with self-healing systems and site reliability best practices.
  • Background in SaaS environments or large-scale distributed applications.
  • Analytical thinker with a focus on root-cause problem solving.
  • Self-starter with a strong ownership mentality and accountability.
  • Mentor and collaborator who uplifts teams and promotes a learning culture.
  • Committed to operational excellence and continuous improvement.

Location & Eligibility

Where is the job: Toronto, hybrid (some on-site time required)
Who can apply: Open to applicants worldwide

Listing Details

Posted: May 14, 2026
First seen: May 14, 2026
Last seen: May 14, 2026

Emburse

Emburse humanizes work by empowering business travelers, finance professionals and CFOs to eliminate manual, time-consuming tasks so they can focus on what matters most.

Employees: 750
Founded: 2020