Inference Software Engineer
Quick Summary
About Etched Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200.
Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
Etched is building the world’s first AI inference system purpose-built for transformers - delivering over 10x higher performance and dramatically lower cost and latency than a B200. With Etched ASICs, you can build products that would be impossible with GPUs, like real-time video generation models and extremely deep & parallel chain-of-thought reasoning agents. Backed by hundreds of millions from top-tier investors and staffed by leading engineers, Etched is redefining the infrastructure layer for the fastest growing industry in history.
Responsibilities
~1 min read- →
Support porting state-of-the-art models to our architecture. Help build programming abstractions and testing capabilities to rapidly iterate on model porting.
- →
Build, enhance, and scale Sohu’s runtime, including multi-node inference, intra-node execution, state management, and robust error handling.
- →
Optimize routing and communication layers using Sohu’s collectives.
- →
Utilize performance profiling and debugging tools to identify bottlenecks and correctness issues.
Proficiency in C++ or Rust.
Understanding of performance-sensitive or complex distributed software systems like Linux internals, accelerator architectures (e.g. GPUs, TPUs), Compilers, or high-speed interconnects (e.g. NVLink, InfiniBand).
Familiarity with PyTorch or JAX.
Ported applications to non-standard accelerator hardware or hardware platforms.
Requirements
~1 min readDeveloped low-latency, high-performance applications using both kernel-level and user-space networking stacks.
Deep understanding of distributed systems concepts, algorithms, and challenges, including consensus protocols, consistency models, and communication patterns.
Solid grasp of Transformer architectures, particularly Mixture-of-Experts (MoE).
Built applications with extensive SIMD (Single Instruction, Multiple Data) optimizations for performance-critical paths.
What We Offer
~1 min readEtched believes in the Bitter Lesson. We think most of the progress in the AI field has come from using more FLOPs to train and run models, and the best way to get more FLOPs is to build model-specific hardware. Larger and larger training runs encourage companies to consolidate around fewer model architectures, which creates a market for single-model ASICs.
We are a fully in-person team in San Jose (Santana Row), and greatly value engineering skills. We do not have boundaries between engineering and research, and we expect all of our technical staff to contribute to both as needed.
Location & Eligibility
Listing Details
- Posted
- June 17, 2025
- First seen
- May 6, 2026
- Last seen
- May 8, 2026
Posting Health
- Days active
- 0
- Repost count
- 0
- Trust Level
- 14%
- Scored at
- May 6, 2026
Signal breakdown
Please let etched know you found this job on Jobera.
Similar Software Engineer jobs
View all →Browse Similar Jobs
Stay ahead of the market
Get the latest job openings, salary trends, and hiring insights delivered to your inbox every week.
No spam. Unsubscribe at any time.