arena

Site Reliability Engineer, Platform

Bay Area, full-time, mid-level
Engineering, DevOps Engineer

Technical Tools
AWS, Datadog, Discord, GitHub Actions, Go, Grafana, JavaScript, Next.js, PostgreSQL, Prometheus, Pulumi, Python, Terraform, TypeScript, Vercel, CI/CD, OAuth

About Arena Intelligence

Arena Intelligence is the open platform for evaluating how AI models perform in the real world. Created by researchers from UC Berkeley’s SkyLab, our mission is to measure and advance the frontier of AI for real-world use.

Millions of people use Arena Intelligence each month to explore how frontier systems perform — and we use our community’s feedback to build transparent, rigorous, and human-centered model evaluations. Leading enterprises and AI labs rely on our evaluations to understand real-world reliability, alignment, and impact. Our leaderboards are the gold standard for AI performance — trusted by leaders across the AI community and shaping the global conversation on model reliability and progress.

We’re a team of researchers, engineers, academics, and builders from places like UC Berkeley, Google, Stanford, DeepMind, and Discord. We seek truth, move fast, and value craftsmanship, curiosity, and impact over hierarchy. We’re building a company where thoughtful, curious people from all backgrounds can do their best work. Everyone on our team is a deep expert in their field — our office radiates excellence, energy, and focus.

About the Role

Arena Intelligence is seeking a Site Reliability Engineer to own the reliability, performance, and operational security of the platform that millions of people depend on to evaluate frontier AI. This is the first dedicated SRE hire on the team — you'll build observability, incident response, and infrastructure hardening practices from scratch while also owning the CI/CD and developer tooling that keeps our engineering team moving fast.

Our stack runs on Vercel (Next.js, Hono API on Nitro), Supabase (Postgres, GoTrue auth), Cloudflare (Workers, R2, bot management), and AWS (CloudFront, Lambda). You'll work across the full request path — from edge-layer DDoS mitigation to auth hardening to production monitoring — partnering closely with security and product engineering to keep the platform fast, reliable, and resilient under adversarial traffic conditions.

  • Harden auth infrastructure against volumetric attacks — edge-layer rate limiting in front of Supabase GoTrue, connection pool tuning, token caching, and origin shielding so DDoS traffic is filtered before it reaches the database

  • Extend CloudFront WAF rules and Cloudflare Worker bot management to cover auth endpoints and close gaps in application-layer rate limiting

  • Define and implement SLOs/SLIs across the full request path — CDN edge through serverless functions to Supabase

  • Build monitoring, alerting, and dashboards on top of existing Datadog and PostHog instrumentation that surface degradations before users notice them

  • Collaborate with security engineering to ensure clean handoff between edge-layer defenses and application-layer anti-abuse systems

  • Own and improve CI/CD pipelines (GitHub Actions, Turborepo) and expand infrastructure-as-code (Terraform) across cloud environments

  • Proactively load-test and stress-test infrastructure, model capacity limits, and drive cost optimization across our multi-cloud footprint

  • Enhance developer workflows to make building, testing, and deploying faster and more reliable

  • Mentor engineers across the company on building reliable, performant, and observable systems

Qualifications

  • 6+ years of experience in SRE, platform engineering, or infrastructure engineering, including operating production systems at scale (millions of users / billions of requests)

  • Direct experience mitigating DDoS attacks and configuring edge security — WAF rules, CDN architecture, rate limiting, and traffic analysis

  • Hands-on experience building observability systems (Datadog, Grafana, Prometheus, or similar) and running incident response processes

  • Strong understanding of auth infrastructure under adversarial load — connection pooling, token caching, and rate limiting on login/signup endpoints

  • Experience with serverless architectures and managed platforms — you know how to make them reliable and observable at scale

  • Experience with infrastructure-as-code (Terraform, Pulumi) and CI/CD pipeline design

  • Track record of collaborating with security and product engineering to deliver both foundational systems and user-facing reliability improvements

Nice to Have

  • Experience with Vercel, Supabase (GoTrue, Supavisor), Cloudflare Workers, or CloudFront specifically.

  • Experience with Node.js, TypeScript, Python, or Go in production backend environments.

  • Background in platforms with voting, reputation, or community-driven systems.

  • Experience being the first or early infrastructure hire at a startup.

  • Experience hardening auth systems under load (OAuth, JWT, PKCE flows, connection pooling).

What We Offer

  • Competitive compensation and equity aligned to the markets where our team members are based. The base salary range will depend on the candidate’s permanent work location.

  • Comprehensive health and wellness benefits, including medical, dental, vision, and additional support programs.

  • The opportunity to work on cutting-edge AI with a small, mission-driven team.

  • A culture that values transparency, trust, and community impact.

Location & Eligibility

Location: Bay Area (hybrid, some on-site time required)
Who can apply: Same as job location

Listing Details

Posted: December 18, 2025
First seen: May 6, 2026
Last seen: May 8, 2026

Posting Health

Days active: 0
Repost count: 0
Trust level: 16%
Scored at: May 6, 2026
