Ifm Us · 28d ago
USD 150,000–300,000/yr
High Performance Computing Software Engineer - Supercomputing
Software Engineer · Software Engineering
About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.
The Role
IFM is building the foundational compute infrastructure that will power tomorrow’s breakthroughs in AI and computational science. We’re looking for a High Performance Computing Software Engineer to help us design, develop, and operate the software systems that run our large-scale AI workloads.
In this role, you’ll work at the intersection of high-performance computing and machine learning. You’ll be part of a team responsible for crafting the software stack that enables training of cutting-edge ML models—spanning 1000+ GPUs—and ensuring our infrastructure is robust, performant, and developer-friendly.
Responsibilities
- Design and implement high-performance, distributed software solutions for large-scale AI/ML training.
- Optimize low-level system components, including the Linux kernel, GPU/accelerator kernels, and interconnects.
- Develop and tune communication libraries such as NCCL, MPI, UCX, and RCCL, as well as RDMA-based systems.
- Partner with ML researchers and engineers to support frameworks like PyTorch, Megatron-LM, and DeepSpeed in large-scale production environments.
- Contribute to our scheduling, orchestration, and job management systems, including Slurm and Kubernetes.
- Debug and resolve complex issues across the stack, from kernel to container to model.
- Work closely with hardware vendors, upstream open-source communities, and internal teams to drive performance and reliability improvements.
Requirements
- Proven experience developing and optimizing software for large-scale ML workloads (1000+ GPUs preferred).
- Deep understanding of Linux kernel internals and accelerator (GPU) kernel development.
- Proficiency with distributed communication libraries (e.g., NCCL, RCCL, MPI, UCX, SHARP, Libfabric).
- Experience with ML frameworks like PyTorch, TensorFlow, JAX, or Megatron-LM.
- Strong knowledge of HPC job scheduling and orchestration tools (e.g., Slurm, Kubernetes, Pyxis).
- Excellent debugging and systems performance tuning skills.
- A collaborative mindset with a focus on shared success and technical excellence.
Location & Eligibility
- Where is the job: Sunnyvale, United States (on-site at the office)
- Who can apply: US
- Listed under: United States
Listing Details
- Posted: April 3, 2026
- First seen: April 4, 2026
- Last seen: May 1, 2026
Posting Health
- Days active: 27
- Repost count: 0
- Trust Level: 42%
- Scored at: May 1, 2026