Systems Reliability Engineer
At Arkenstone Defense, we empower defense tech startups with the tools, infrastructure, and compliance solutions they need to become successful prime contractors. Our mission is to remove barriers and help innovators grow from day one all the way to becoming a trusted prime for the U.S. Government.
We're early, we're lean, and we're building something that actually matters. The people who do well here aren't waiting to be told what to do; they see a gap and fill it.
Responsibilities
Design, implement, and own the infrastructure reliability strategy across AWS, Azure, and GCP
Champion observability by developing and maintaining effective logging, monitoring, and alerting systems
Lead efforts in performance tuning, system hardening, capacity planning, and disaster recovery
Own the incident management lifecycle: from detection to postmortem and root cause analysis
Automate deployment, scaling, and recovery workflows to reduce manual toil
Contribute to infrastructure as code (Terraform, ARM templates, CloudFormation, etc.)
Act as a mentor and technical leader to junior engineers and cross-functional partners
Perform any other related duties as required or assigned
Drive a culture of accountability, ownership, and continuous improvement
You thrive on building meaningful relationships and helping others succeed
You understand the unique challenges that defense tech startups face and can speak their language
Requirements
5+ years of experience in SRE, DevOps, or infrastructure engineering roles
Proven track record of operating large-scale systems in multi-cloud environments
Strong knowledge of cloud-native architecture, container orchestration (e.g., Kubernetes), and CI/CD pipelines
Proficient in scripting (Python, Bash, etc.) and infrastructure automation tools
Experience with monitoring/observability platforms (e.g., Prometheus, Grafana, Datadog, ELK)
Excellent problem-solving skills and a bias toward ownership and action
Comfortable making decisions under pressure and leading through incidents
Working knowledge of FedRAMP or NIST 800-53 controls preferred
Comfortable participating in customer discussions
Clear communicator who can translate technical concepts to mixed audiences
What We Offer
We are an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), national origin, age, disability, genetic information, veteran status, or any other characteristic protected under applicable law.
Listing Details
- Posted: April 6, 2026
- First seen: May 5, 2026
- Last seen: May 8, 2026