Fal
Senior/Staff Site Reliability Engineer
Turkey · Senior
Engineering · DevOps & Infrastructure
You are a seasoned SRE who keeps production infrastructure running at scale. You own the reliability and availability of customer-facing systems — from Kubernetes clusters to deployment pipelines to the networking layer that connects it all. You think in SLOs, automate ruthlessly, and treat every incident as a chance to make the system better.
Responsibilities
- Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads
- Build and maintain CI/CD pipelines and deployment infrastructure
- Use AI extensively to automate the analysis and resolution of production issues, and to improve software development speed, reliability, and maintainability
- Build dashboards, alerting, and anomaly detection across our systems
- Define and enforce SLOs and build out incident response processes
- Manage and improve our networking, load balancing, and service mesh configurations
- Drive reliability improvements across the stack through automation, runbooks, and chaos engineering
Requirements
- 5+ years of experience managing critical production systems and software development workflows
- Strong production experience setting up and operating Kubernetes at scale, using infrastructure-as-code (Terraform, Ansible)
- Deep knowledge of Linux networking, container networking (CNI plugins, VXLAN, BGP), and DNS
- Experience building CI/CD systems and GitOps workflows (FluxCD, ArgoCD)
- Proficiency in Python and either Go or Bash for tooling and automation
- Strong experience with logging, monitoring and alerting (Prometheus, Grafana, Loki, Thanos, VictoriaMetrics, Datadog)
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement
Nice to Have
- Experience managing GPU and AI/ML workloads
- Experience with kernel-based monitoring and routing (eBPF, XDP)
- Experience with security tooling (Falco, Coroot, SIEM)
- Experience with bare metal Kubernetes networking (Calico, Cilium, MetalLB)
- Experience with distributed storage systems (Ceph, Longhorn, etc.)
What We Offer
✓ Interesting and challenging work
✓ A lot of learning and growth opportunities
✓ Regular team events and offsites
Listing Details
- Posted: March 13, 2026
- First seen: March 26, 2026
- Last seen: April 15, 2026
Posting Health
- Days active: 19
- Repost count: 0
- Trust Level: 28%
- Scored at: April 15, 2026