Aleph Alpha

Senior Performance Engineer - Pre-training (f/m/d)

Germany · Heidelberg · full-time · senior
Other · Performance Engineer

Technical Tools
python, pytorch, deep-learning, distributed-systems, machine-learning

Aleph Alpha is one of the few companies in Europe doing serious foundation model pre-training. Our customers - in finance, manufacturing, public administration - need models that understand German, meet European regulatory requirements, and work reliably in high-stakes settings. We're building that in Heidelberg.

We are hiring a Performance Engineer to grow our pre-training efficiency team. If you are excited about making models fast, this is the role for you!

At Aleph Alpha, we foster a culture built on ownership, autonomy, and empowerment. Teams and individual contributors are trusted to take responsibility for their work and drive meaningful impact. We maintain a flat organizational structure with efficient, supportive management that enables quick decision‑making, open communication, and a strong sense of shared purpose.

About the Role

You will engineer the systems required to train foundation models at scale. Your objective is to maximize hardware utilization and training throughput on our large-scale GPU clusters (thousands of NVIDIA Blackwell GPUs). You will work at the intersection of deep learning frameworks, distributed systems, and GPU microarchitecture, eliminating bottlenecks from the Python layer down to the GPU kernel.

This role is for Aleph Alpha Research GmbH.

Responsibilities

Requirements

  • Are proficient in Python and the PyTorch library.

  • Have a strong engineering background in parallel and/or distributed systems with a proven track record of excellence.

  • Have hands-on experience with modern machine learning techniques (especially large language models and their life cycle).

  • Deeply understand the CUDA programming model.

  • Have experience in distributed programming with APIs like NCCL or MPI.

  • Have experience analysing profiling traces with tools such as PyTorch Profiler and NVIDIA Nsight.

  • Please note this role requires regular on-site collaboration in Heidelberg as a member of the Training Efficiency Team.

  • Contributions to modern distributed training frameworks (e.g., TorchTitan, Megatron-LM, DeepSpeed).

  • Familiarity with low-precision training formats (MXFP4, MXFP8) and their impact on numerical stability and throughput.

  • A deep understanding of NCCL communication primitives, NVSHMEM, or CUDA IPC, and their performance characteristics.

  • A proven track record of implementing and optimising modern transformer-based model training.

  • A proven track record working on the NVIDIA Blackwell architecture.

What We Offer

Become part of an AI revolution!
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours for better work-life balance and hybrid working model
Virtual Stock Option Plan
JobRad® Bike Lease

Location & Eligibility

Where is the job: Heidelberg, Germany (hybrid, some on-site time required)
Who can apply: DE

Listing Details

Posted: February 26, 2026
First seen: May 6, 2026
Last seen: May 30, 2026

Posting Health

Days active: 24
Repost count: 0
Trust level: 18%
Scored at: May 30, 2026
