ARA Brand

Senior Site Reliability Engineer

Engineering, DevOps Engineer

Technical Tools
ArgoCD, AWS, Azure, Confluence, Jira, Kubernetes, Python, Linux, system design

Essential Functions

  • Partner with software developers, platform engineers, and IT staff to improve system design, operability, deployment safety, and production support readiness.
  • Define and maintain operational standards, runbooks, support procedures, escalation paths, and service-level objectives.
  • Evaluate system architecture and changes to ensure they balance functional requirements, service quality, reliability, security, and compliance needs.
  • Drive continuous improvement in platform stability, maintenance, and availability.
  • Provide advanced technical support and troubleshooting for complex platform and service issues affecting internal users and stakeholders.

Qualifications

  • 8+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Systems Engineering, or related infrastructure roles supporting production services.
  • Strong experience with Linux systems administration and troubleshooting in enterprise environments.
  • Strong experience operating and maintaining on-prem Kubernetes platforms and all related components including CRI, CNI, and CSI plugins.
  • Experience deploying and maintaining applications on Kubernetes using Helm, Kustomize, and similar tooling.
  • Experience supporting DevOps tooling such as GitLab, Artifactory, Jira, and Confluence.
  • Experience with GitOps tools such as FluxCD or ArgoCD.
  • Proficiency scripting with at least one of Python, Go, or Bash.
  • Strong experience designing, maintaining, and maturing observability tooling including monitoring, dashboards, logging and tracing, and supporting SLOs.
  • Strong understanding of reliability engineering concepts:
    • Service health indicators
    • High availability design, failure reduction, and testing
    • Operational readiness practices, including developing documentation, runbooks, and architectural descriptions
    • Incident response, root cause analysis, remediation/recovery
  • Ability to obtain a security clearance, which includes U.S. citizenship.

Nice to Have

  • Experience with multiple Linux distributions including Ubuntu.
  • Experience with at least one of the following: Tanzu Kubernetes, Nutanix Kubernetes Platform, Canonical Kubernetes.
  • Experience with cloud platforms such as AWS and Azure.
  • Experience with infrastructure automation and configuration management.
  • Experience managing AI tooling on Kubernetes, including MCP servers, LLM platforms (vLLM, Ollama), and Kubeflow.
  • Experience with security and compliance considerations in regulated environments.
  • DoD experience.
  • Active or inactive Secret security clearance.
  • Bachelor’s degree in Computer Science, Software Engineering, or another IT-related field, or equivalent experience.

 

REMOTE WORK NOTICE: This position may be performed fully remotely, in a hybrid arrangement, or onsite at an ARA office. Preference will be given to candidates located onsite in the Albuquerque area.

Location & Eligibility

Where is the job: Albuquerque, United States (on-site at the office)
Who can apply: US

Listing Details

Posted: April 30, 2026
First seen: May 6, 2026
Last seen: May 8, 2026

Posting Health

Days active: 0
Repost count: 0
Trust Level: 26%
Scored at: May 6, 2026

Signal breakdown: freshness, source trust, content trust, employer trust