Quick Summary
As a Senior Site Reliability Engineer, you will own critical incidents through to resolution, minimizing the impact on our key business applications and on our customers' business operations. You will also serve as a reliability advisor focused on improving the resilience, observability, and operability of critical platforms and services.
- Lead incident response for business-critical services, coordinating cross-functional teams and suggesting troubleshooting steps to restore service quickly and minimize business impact.
- Proactively notify internal stakeholders of potential issues impacting service performance, and provide regular status updates as required.
- Drive blameless postmortems and root cause analysis, turning incidents and recurring issues into systemic fixes, corrective actions, and long-term reliability improvements.
- Proactively identify and reduce sources of instability in systems by analyzing how our systems fail in production and driving architectural or operational improvements.
- Serve as a senior technical resource and reliability advisor to internal teams, sharing best practices and guiding teams toward sustainable operational excellence.
- Partner with engineering and product teams to shift reliability left through design and readiness reviews, resilience reviews, and operational acceptance for launches and changes.
- Champion a culture of reliability across business domains and act as a force multiplier by creating clear documentation that enables other teams to adopt reliability improvements at scale.
Requirements
- Experience with incident management and response: you can navigate ambiguous, high-pressure production issues, drive coordinated response, and follow through with durable improvements.
- Track record of proactively identifying reliability risks and gaps through metrics, incidents, architecture reviews, or resilience testing.
- Exceptional problem-solving and critical systems-thinking skills, plus familiarity with chaos engineering concepts.
- Strong collaboration and influence skills: you communicate clearly, build trust with partner teams, and can guide engineering teams toward better reliability practices.
- Growth mindset and curiosity: you are eager to learn, comfortable challenging assumptions (including your own), and motivated by continuous improvement of systems, processes, and yourself.
- Minimum of 4-6 years of relevant experience, or an equivalent combination of education and experience, in Senior Incident Management (with a tech focus), SRE, Production Engineering, Software Engineering, DevOps Engineering, or a similar role operating business-critical, high-traffic services in production.
- Good business English skills (written and spoken).
- Diploma or equivalent work experience required.
- Fluency in modern infrastructure: proven hands-on experience with public cloud technologies and understanding of containerized and orchestrated platforms such as Kubernetes.
- Practical experience or a strong demonstrated interest in operating LLM-based systems, RAG pipelines, or agentic workloads, and understanding the reliability challenges of non-deterministic systems.
- General knowledge of Diebold Nixdorf products and services is a plus.
Listing Details
- Posted: May 26, 2026
- First seen: May 26, 2026
- Last seen: May 26, 2026