Senior DevOps AI Engineer
Quick Summary
Founded in 2017, Obsidian Security was created to close a critical gap: securing the SaaS applications where modern business happens—platforms like Microsoft 365, Salesforce, and hundreds more.
The DevOps team provides an end-to-end service that turns software into live services. We work closely with Engineering, QE, and Customer Support teams to continuously improve engineering productivity and service reliability. We are also building Sherlock, an AI-powered SRE agent that automates incident investigation, root cause analysis, and runbook execution, and we need engineers who can both keep the infrastructure running and push the frontier of what AI-driven operations can do.
About the Role
Based in Sydney, Australia, this is a hybrid role for someone who thrives in both worlds: a hands-on infrastructure engineer who can own GCP/AWS cloud operations at scale, and a backend engineer capable of building the AI agent layer that makes Sherlock intelligent and self-improving. You will own core DevOps responsibilities while also contributing to, and eventually leading, Sherlock’s knowledge capture pipeline, investigation state machine, accuracy benchmarking, and Phase 4 capability expansions.
Responsibilities
- Build and maintain infrastructure across GCP and AWS, including Compute Engine, GCS, GKE, Cloud SQL, Cloud DNS, VPC, Pub/Sub, Elasticsearch, ScyllaDB, Databricks, Kafka, Sentry, Dagster, Airflow, Vault, Consul, Kong, and more.
- Own infrastructure automation with Terraform/Terragrunt, Ansible, and Helm charts.
- Drive microservice delivery via Helm charts, GitLab CI/CD pipelines, and ArgoCD.
- Partner with Engineering on capacity planning, performance tuning, and production maintenance.
- Partner with InfoSec to address production security issues.
- Take on-call shifts and contribute to incident response.
- Address tough scalability, stability, and observability problems.
- Knowledge Capture agent: post-approval LLM summarisation, embedding generation, and structured writes to Jira, Notion, and pgvector.
- Investigation state machine application layer: status transitions, retry logic, and dead-letter handling.
- Accuracy metric (semantic diff) and speed metric: the signals that drive all prompt improvement decisions.
- Regression test framework: replay 50+ historical investigations and gate prompt changes.
- Phase 4 implementations: Customer Impact agent, Runbook Executor agent, and Zoom transcription ingestion into the Fact-Finding context.
Qualifications
- 5+ years of DevOps/SRE experience in GCP and/or AWS.
- Expert in Terraform/Terragrunt, Ansible, Kubernetes, Helm charts, and GitLab CI/CD.
- Proven ability to design deployment architecture and maintain high-scale, multi-layer web services on public cloud.
- Strong experience with k8s service mesh/ingress, autoscaling, and version upgrades.
- 4+ years of backend engineering in Python.
- LLM API experience: tool use, structured output, multi-turn conversations (Anthropic, OpenAI, Bedrock, or Vertex).
- Solid async Python: asyncio, task queues, worker patterns.
- Test-driven development — you write tests before or alongside code, not after.
- Comfort reading and writing SQL; PostgreSQL preferred.
- Computer science or related engineering degree.
- Full working rights in Australia.
- Multi-agent system design: coordinator-dispatcher patterns, registry-driven agent selection, tool-use orchestration across specialist agents.
- pgvector or other vector search experience.
- Slack API / Bolt framework for Python.
- Jira and Notion API integrations.
- Familiarity with Kafka, Elasticsearch, ScyllaDB, Databricks, Dagster, Sentry, and Kong.
- Prior work on internal DevOps or SRE tooling.
- Ability to diagnose system performance or functional issues from metrics and logs.
Listing Details
- Posted: June 1, 2026