Ifm Us, 13mo ago
USD 150000–450000/yr

Distributed Machine Learning Engineer

United States, Sunnyvale, Full-time, mid
Data Science, Machine Learning Engineer, Data, Data & AI

About the Institute of Foundation Models
We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.



The Role
The Distributed ML Engineer will work at the forefront of optimizing the performance of machine learning software stacks, especially for training and inference, and will support the team in developing new, cutting-edge systems. The ideal candidate has a strong background in parallel computing and hands-on experience in system-level coding, debugging methodologies, and large-scale machine learning.
  • Understand, analyze, profile, optimize, and provide guidance to the team on deep learning workloads on state-of-the-art hardware and software platforms to improve their efficiency with different levels of optimization
  • Design and implement performance benchmarks and testing methodologies to evaluate application performance
  • Build tools to automate workload analysis, workload optimization, and other critical workflows
  • Triage system issues and identify bottlenecks and inefficiencies by analyzing the sources of issues and their impact on hardware and network, and propose solutions to enhance GPU utilization
  • Support the team to develop appropriate kernels and systems for new model architectures and algorithms
  • Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
  • Review code developed by other developers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
  • Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
  • Represent MBZUAI at industry conferences and events, showcasing the institution’s cutting-edge HPC and deep learning capabilities and establishing MBZUAI as a global leader in AI research and innovation.
  • Perform all other duties as reasonably directed by the line manager that are commensurate with these functional objectives.

Qualifications
  • Ph.D. in CS, EE, or CSEE with 1+ years of working experience, OR
  • Master's in CS, EE, or CSEE, or equivalent experience, with 2+ years of working experience

Listing Details
Posted: March 17, 2025
First seen: March 26, 2026
Last seen: April 25, 2026
