DevOps / Site Reliability Engineer
Bespoke Labs is an AI research and data company building the datasets, benchmarks, and evaluation infrastructure that power frontier AI models. We're backed by leading investors, trusted by top AI labs, and have research accepted at venues like ICLR 2026. Our team is small, moves fast, and has an outsized impact on how the next generation of AI is built.
We're looking for a mid-level DevOps / Site Reliability Engineer to own and scale our cloud infrastructure. You'll work closely with engineering and ML teams to keep our systems reliable, observable, and fast — directly supporting the infrastructure that powers AI data pipelines at scale.
Responsibilities
- Own cloud infrastructure on AWS: EC2, EKS, RDS, S3, IAM, VPC
- Manage Kubernetes clusters and container orchestration end-to-end
- Build and maintain CI/CD pipelines using GitHub Actions or similar
- Implement monitoring, alerting, and observability stacks (Prometheus, Grafana, or Datadog)
- Improve reliability, performance, and security of production systems
- Automate infrastructure with Terraform or similar IaC tools
- Debug and resolve issues across complex, distributed systems
- Participate in design reviews and help raise the infrastructure bar
Requirements
- 3–5 years in DevOps, SRE, or infrastructure engineering
- Strong AWS experience: EKS, EC2, RDS, S3, IAM
- Kubernetes: deployment, scaling, troubleshooting in production
- CI/CD pipelines: GitHub Actions, ArgoCD, or similar
- Infrastructure as Code: Terraform, Pulumi, or CDK
- Python or Go scripting
- Experience working in production environments with real users
- Comfort with ambiguity and ability to operate autonomously
Nice to Have
- Experience supporting ML training workloads or GPU clusters
- Familiarity with distributed computing or large-scale data pipelines
- Prior work at an AI, ML, or data company
- Open-source contributions or published technical writing
What We Offer
Location & Eligibility
Listing Details
- Posted: May 5, 2026
- First seen: May 6, 2026
- Last seen: May 8, 2026