Palantir

Site Reliability Engineer - US Government

Washington, D.C. | Full-time | Mid-level
Engineering | DevOps & Infrastructure | Site Reliability Engineer | DevOps Engineer | Infrastructure & Cloud

A World-Changing Company
 
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role
 
We’re looking for Site Reliability Engineers who can help us build, operate, and maintain high-performance, scalable, and reliable services for our production infrastructure, across both cloud & on-prem environments. Site Reliability Engineers combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. Our team strives to automate processes wherever possible, using whichever tools are best for the job. You’ll be the expert on the environments in which you operate infrastructure, helping partner teams build & configure their software to run reliably within them.
 
We strongly believe that engineering teams should be responsible for operating their services in production. In this role, you’ll work closely with engineers to advocate for and participate in sensible, scalable systems design, and you’ll share responsibility with them for diagnosing, resolving, and preventing production issues.

Responsibilities
  • Maintain the availability of cloud & physical Linux servers that power the Palantir platform in air-gapped production environments.
  • Design, deploy, and operate infrastructure to support customer & product requirements via modern orchestration & monitoring platforms.
  • Collaborate closely with product teams on requirements & SLOs for deploying software into air-gapped environments.
  • Identify, troubleshoot, and solve network & systems issues.
  • Write scripts to automate away routine operational tasks.
  • Provide technical troubleshooting support for production issues, ensuring timely resolution and minimal impact on operations.
  • Participate in a support on-call schedule.

Preferred Qualifications
  • Confidence in troubleshooting complex systems issues independently using stack traces and observability & systems tools
  • Comfort managing large-scale production systems and technologies, including configuration management, load balancing, monitoring & alerting infrastructure, and container orchestration
  • Demonstrated ability to continuously learn and work independently, making decisions with minimal supervision while working in secure facilities
  • Experience with containers (Docker/Podman) and orchestration (OpenShift/Kubernetes) at scale is a plus
  • Preferred certifications: DoD 8570 IAT Level II or greater (e.g., CISSP, Security+) and a Unix/Linux computing environment certification (e.g., Linux+, RHCE)

Requirements
  • Active security clearance
  • 4+ years of experience with Linux system administration (RHEL or equivalent preferred)
  • Experience with cloud-based hosting platforms like AWS, Azure, or GCP and/or experience with hardware-based environments
  • Familiarity with monitoring systems using tools like Prometheus, and with writing health checks
  • Proficiency with at least one programming language, such as Java, Go, Python, JavaScript, or Bash
  • Strong engineering background, preferably in a field such as Computer Science, Mathematics, Software Engineering, Physics, or Data Science

Location & Eligibility

    Where is the job
    Location terms not specified
    Who can apply
    Same as job location
    Listed under
    Worldwide

    Listing Details

    Posted
    April 22, 2020
    First seen
    April 2, 2026
    Last seen
    April 28, 2026

    Posting Health

    Days active
    25
    Repost count
    0
    Trust Level
    33%
    Scored at
    April 28, 2026

    Palantir

    We build software that empowers organizations to effectively integrate their data, decisions, and operations.

    Employees
    4k+
    Founded
    2003