Docker

Software Engineer, Infrastructure Platform

United States · Remote · Full-time · Mid-level
Software Engineer · Software Engineering

Technical Tools
Argo CD, Docker, GitHub Actions, Go, Grafana, Kubernetes, Prometheus, Terraform, CI/CD, code review, distributed systems, Linux, networking

Docker has been one of the most loved brands in developer tooling, trusted by more than 20 million monthly users and over 20 billion container image pulls. From solo founders to the world's largest companies, developers rely on Docker to build, share, and run their applications across our suite of products including Docker Desktop, Docker Hub, and Docker Scout.

We are a globally distributed, remote-first team building the tools that define how software gets built and delivered. As AI agents redefine software development, Docker is at the center of that shift, providing the sandboxed environments, verified images, and secure infrastructure that make autonomous workflows trustworthy by default.

Our Infrastructure Engineering team builds and operates the cloud-native platform that powers Docker’s suite of products. We design resilient services, automate where it helps most, and measure what matters so hundreds of engineers can ship safely to millions of users every day.

A core focus is self-service. We build paved-road platform capabilities that let internal teams provision, deploy, observe, and operate services with minimal friction and strong guardrails. We treat the platform as a product with clear contracts, well-defined defaults, and great documentation. Success is measured by adoption and fewer support requests.

Responsibilities

  • Build and operate internal platform services and APIs in Go, including provisioning, quotas and policies, cost insights, and platform workflows.

  • Deliver golden paths for self-serve onboarding and day-2 operations, including access, deployment setup, observability defaults, and governance guardrails.

  • Partner with teams to drive adoption through clear docs, examples, and measurable outcomes.

  • Codify infrastructure with Terraform and GitOps practices, and contribute to platform tooling in Go.

  • Define and improve SLOs, alerting, and operational readiness. Participate in incident response and preventive follow-ups.

  • Help standardize safe delivery patterns, including testing gates, canaries, and rollback triggers, so deployments are routine and low-risk.

  • Operate and scale multi-tenant EKS clusters, as well as traffic and ingress systems, to deliver secure, reliable routing.

  • Evaluate and adopt improvements with a bias toward incremental rollout and measurable impact.

  • Build and iterate on agentic workflows that reduce operational toil, including triage support, context gathering, safe runbook execution, and remediation suggestions.

  • Integrate automation into delivery and operations in a way that is safe, observable, and auditable.

Operational ownership is part of this role.

  • You’ll join an on-call rotation after onboarding and shadowing, and participate in incident response during your shifts.

  • We aim for sustainable on-call through good alerting, automation, and blameless postmortems focused on prevention.

Requirements

  • 4+ years of backend software engineering experience building large-scale cloud or distributed systems.

  • Strong software development skills in Go or a similar language, including design, testing, debugging, and code review.

  • Experience shipping and operating cloud services in production, typically 3+ years. We hire for skill and impact, not years alone.

  • Solid foundation in Linux, networking fundamentals, and cloud security.

  • Experience building operational automation, including AI-assisted or agentic workflows, with an emphasis on safety, guardrails, and auditability.

  • Clear written and verbal communication in a remote environment, including RFCs, incident writeups, and async collaboration.

  • Kubernetes and EKS experience, plus ingress, CNI, service mesh, and familiarity with L4 and L7 load balancing.

  • Observability tooling such as OpenTelemetry, Prometheus, and Grafana, plus alerting and SLO practice.

  • CI/CD and progressive delivery, including GitHub Actions or Argo CD, canaries, and automated rollback.

  • Cost optimization at scale, including FinOps and capacity modeling.

  • Distributed systems, containers, and Go-based platform tooling.

We value depth in one area and curiosity across others, and we will help you grow in the rest.

  • Ship your first change to a Terraform module or internal service and learn how we operate.

  • Shadow on-call and build context on our platform and reliability priorities.

  • Own a component and deliver an improvement from design to production with measurable impact.

  • Join the on-call rotation and contribute effectively during your shifts.

  • Lead or co-lead a meaningful platform initiative, with scope that scales by level, and help reduce toil through automation.

  • Become a trusted contributor in one or more areas such as platform services, Kubernetes and networking foundations, or reliability automation.

We use Covey as part of our hiring and/or promotional process for jobs in NYC, and certain features may qualify it as an automated employment decision tool (AEDT). As part of the evaluation process, we provide Covey with job requirements and candidate-submitted applications. We began using Covey Scout for Inbound on April 13, 2024.

Please see the independent bias audit report covering our use of Covey.

What We Offer

Freedom & flexibility; fit your work around your life
Designated quarterly Whaleness Days plus an end-of-year Whaleness break
Home office setup; we want you comfortable while you work
16 weeks of paid parental leave
Technology stipend equivalent to $100 net/month
PTO plan that encourages you to take time to do the things you enjoy
Training stipend for conferences, courses and classes
Equity; we are a growing start-up and want all employees to have a share in the success of the company
Docker swag
Medical benefits, retirement and holidays vary by country
Remote-first culture, with offices in Seattle and Paris

Location & Eligibility

Where is the job
United States
Remote within one country
Who can apply
Open to applicants worldwide

Listing Details

Posted
April 3, 2026
First seen
May 6, 2026
Last seen
May 8, 2026
