Senior AI R&D Engineer - Model Evaluation (f/m/d)

Germany · Heidelberg · full-time · senior

Technical Tools

Python, PyTorch, distributed systems, machine learning

Aleph Alpha Research’s mission is to deliver category-defining AI innovation that enables open, accessible, and trustworthy deployment of GenAI in industrial applications. Our organization develops foundational models and next-generation methods that make it easy and affordable for Aleph Alpha’s customers to increase productivity in development, engineering, logistics, and manufacturing processes.

At Aleph Alpha, we foster a culture built on ownership, autonomy, and empowerment. Teams and individual contributors are trusted to take responsibility for their work and drive meaningful impact. We maintain a flat organizational structure with efficient, supportive management that enables quick decision‑making, open communication, and a strong sense of shared purpose.

About the Role

As a Senior AI R&D Engineer - Model Evaluation (f/m/d), you will work in the pre-training evaluations team. Our mission is to provide meaningful signals during pre-training runs and to supply additional metrics that help other teams make informed decisions (e.g., in ablations).

Responsibilities

We are a mix of researchers and engineers, and you will support our engineering efforts. Key priorities include improving the testability of our code through design and architecture changes, and reducing the time it takes to integrate a new benchmark end to end. You drive these changes through incremental, hands-on modifications to our code. At the same time, you are expected to handle smaller day-to-day tasks, e.g., maintaining our repositories, investigating a spurious benchmark result, or ironing out an out-of-memory error.

No two days are the same. Things move fast, and your ability to focus and prioritize is what lets you unblock the team day-to-day while designing the tooling and automation that speeds us up long-term. You will have real influence on what gets built and how. Your work directly shapes how quickly we can experiment and improve our models.

Requirements

Capable, driven, and open individual who thrives in a dynamic environment: LLMs are evolving rapidly, we maintain flat hierarchies, and you can make an impact across a wide range of areas. Above all, we are looking for highly talented individuals who thrive in such an environment. You should add something unique that helps our efforts, but nobody needs to tick a long list of boxes.
Willingness to relocate to Germany. Our primary working locations are Heidelberg and Berlin. We foster an on-site culture with direct communication and collaboration. As such, you should be on-site at your main work location at least two days a week. If you choose Berlin, you should be willing to travel to Heidelberg (our headquarters) every one to two months for a few days.

Software engineer with the ability to write code that other strong engineers want to build on.
Ability to incrementally convert a codebase with accumulated complexity into a more testable and explainable state.
Explainer: We make many decisions together, so communicating your ideas and convincing the team of them is a pivotal skill.
Initiative to drive and deliver high-impact work.
Degree in computer science, engineering, or a related field.
Strong Python skills and experience with lower-level languages such as Rust or C++.
Experience with infrastructure tooling and container orchestration, such as Docker, Kubernetes, infrastructure as code, etc.
Deep interest in and willingness to learn about LLM training.

(We encourage you to apply even if you don't check every box!)

Experience working with distributed systems.

Experience with LLM evaluation, benchmark design or evaluation dataset curation.
Understanding of foundation model training: how data, scale, and architecture affect capabilities.
Familiarity with statistical methods.