Infrastructure Engineer, Distributed Compute

United States · Austin · Full-time · Mid

Engineering · DevOps Engineer

About Base

Base is America’s next-generation power company. We’re rebuilding electricity, the foundation of modern civilization, by deploying a vast network of distributed batteries that is transforming today’s fragile, centralized grid into a resilient and abundant system. We are engineers, operators, and creatives solving some of the most complex, interdisciplinary challenges of our time.

About the Role

Base is deploying thousands of computing nodes across the country, coordinating them as a single distributed system. We're looking for an Infrastructure Engineer to design, build, and operate the horizontal infrastructure that coordinates, orchestrates, and manages this distributed compute network — enabling device communication, task scheduling, state synchronization, and fleet management at scale.

You'll own the backend systems and APIs that allow thousands of devices to reliably communicate with central infrastructure, track their state, receive updates, and execute coordinated commands. This is systems-level work: designing for failure, scale, cost efficiency, and operational simplicity.

You'll work closely with device engineers who need reliable communication channels, product teams who need fleet management primitives, operations teams who need visibility and control, and hardware engineers who understand physical constraints. Your infrastructure is the nervous system of this product — it must be fast, reliable, and elegant.

Responsibilities

→ Design and build the core orchestration and coordination layer that manages device fleet operations (task distribution, state synchronization, health monitoring) with >99.9% availability.
→ Build backend systems that reliably handle device-to-cloud communication at scale, including message routing, acknowledgment, retry logic, and conflict resolution for concurrent updates.
→ Develop APIs and services that allow product teams to query device state, push updates, and execute commands on thousands of devices simultaneously without bottlenecks or data consistency issues.
→ Design architectures that scale horizontally from hundreds to millions of devices without re-architecture, while optimizing compute, storage, and network costs.
→ Implement monitoring, alerting, and operational runbooks that allow the team to understand and troubleshoot distributed system behavior in production.
→ Build reliable async communication patterns using message queues and event streaming, handling ordering guarantees, deduplication, and exactly-once semantics.
→ Own the database and storage layer decisions that support both operational and analytical workloads, knowing when to use relational databases, NoSQL stores, or specialized systems.
→ Partner with hardware and device teams to understand their needs and translate them into scalable, reliable backend services.
→ Write infrastructure-as-code that is maintainable, tested, and reproducible, enabling safe and rapid iteration.

Requirements:
5+ years building backend infrastructure or distributed systems, preferably at scale
Strong experience in Go, Python, Java, or equivalent backend languages
Deep understanding of distributed systems concepts: eventual consistency, state synchronization, failure handling
Experience building APIs and services that handle high scale and high concurrency
Familiarity with message queues or event streaming (Kafka, RabbitMQ, SQS, or similar)
Solid understanding of databases and data modeling — knowing when to use relational vs. NoSQL vs. specialized stores
Comfort with infrastructure-as-code and cloud platforms (AWS or GCP)
Proven ability to own complex systems end-to-end: design, implementation, deployment, and operational support
Nice-to-Haves:
Experience building device management or IoT backend systems
Familiarity with Kubernetes and container orchestration
Background in energy, utilities, or other operational technology (OT) domains
Experience with distributed tracing and observability at scale (Datadog, Honeycomb, etc.)
Knowledge of fleet management, device provisioning, or OTA update systems
Exposure to consensus algorithms (Raft, Paxos) or distributed coordination (etcd, Zookeeper)
Experience with stream processing frameworks (Kafka Streams, Flink, etc.)
Experience operating production systems with clear, well-maintained operational runbooks
Experience with data center orchestration systems and baseboard management controllers

Infrastructure for distributed systems is fundamentally hard: you're coordinating thousands of independent devices that may be offline, on poor networks, or running old software versions. One mistake cascades everywhere. This team solves that by building systems that are robust enough to handle the messiness of reality — devices rebooting, network failures, clock skew, partial failures — and simple enough that the rest of the company can build on them confidently. You'll see the impact immediately: infrastructure issues directly affect whether members stay powered or lose power. This is work that matters to grid resilience and American energy independence.

Please note: Base is a startup, which means priorities shift and evolve quickly. Your role may expand or change based on the needs of the business at any given time, so the responsibilities listed may not be exhaustive.

First Principles Thinking: Question assumptions. Principles > rules.
Operate at Base Pace: Focus on what matters, act quickly, and learn by doing.
Give & Get Feedback: Be direct, be humble, and maintain a growth mindset.
Everyone’s an Owner: Follow through on commitments and own results.
Strong Opinions, Loosely Held: Drive clarity and make calls with imperfect information.
Committed to the Mission: Rebuilding the grid is a big challenge. We work hard because we care deeply about the impact we’re creating. We work in-person. It’s not a 9-to-5. We are all-in.
Fun & Optimism Coexist with Grit: Collaboration and celebration coincide with the intensity of building real things.