astera

Site Reliability Engineer

Emeryville HQ, full-time, mid-level
Engineering, DevOps Engineer

Technical Tools
Ansible, Docker, Grafana, Kubernetes, Prometheus, Python, Linux, networking

Astera is a private foundation with a $2.5B endowment on a mission to steer science and technology toward an abundant future for all. Unlike traditional foundations, we operate like a high-velocity startup with unprecedented access to computational resources and complete freedom from funding pressures or profit motives. This allows us to focus on ambitious goals and attract incredibly creative scientists and engineers from leading academic institutions and from frontier AI labs.

Neuro-AI is our large-scale AI research program, pursuing a neuroscience-informed approach to engineering AGI. This is not yet another lab scaling LLMs in the hope of achieving general intelligence. We are integrating neuroscience, AI, and bioengineering to understand and digitally model the architecture of the human brain.

We are looking for a Site Reliability Engineer to own the digital infrastructure that powers our research.

This includes compute resources that we rent from third parties, container registries, and dashboards. The main objective is to make sharing these resources easy and efficient, ensuring the infrastructure is reliable and accessible to the right people.

This role spans a broad spectrum of activities:

  • Compute Access: Ensure easy and efficient access to compute resources for our researchers.

  • Resource Visibility: Provide clear visibility into resource utilization and cluster health.

  • Auto-Scaling: Enable automatic scaling of compute resources based on demand.

  • Access Management: Ensure the right people have access to the right resources.

  • Reproducibility: Drive towards deterministic deployments and reproducible research environments.

  • Process Automation: Automate operational processes where it makes sense to increase efficiency.

Current stack: Ansible, Kubernetes, Docker, Tailscale, Python, Grafana, Prometheus, and Talos Linux. We're not religious about any of it.

Requirements

  • Ownership: You are comfortable being the person accountable when the cluster is unhealthy or capacity is tight.

  • Systems Intuition: You understand how schedulers, containers, networking, storage, and hardware interact. You can reason about failure modes and design systems that degrade predictably.

  • Operational Rigor: You value observability, reproducibility, and clear operational boundaries. You leave systems in a state that other engineers can understand, operate, and debug without you.

  • Pragmatism: You can support experimental research workloads without forcing everything into a rigid "production" mold. You know when to stabilize and when to allow controlled chaos to speed up discovery.

  • This role is in-person in Emeryville, CA.

  • Visa sponsorship may be available for qualified candidates.

Location & Eligibility

Where is the job: Emeryville HQ (hybrid, some on-site time required)
Who can apply: Same as job location

Listing Details

Posted: January 29, 2026
First seen: May 5, 2026
Last seen: May 8, 2026

Posting Health

Days active: 0
Repost count: 0
Trust level: 42%
Scored at: May 6, 2026

Signal breakdown: freshness, source trust, content trust, employer trust