Zoox
USD 189,000–258,000/yr

Machine Learning Engineer - Multi-Modality Foundation Model

Data Science, Machine Learning Engineer, Data, Data & AI

Technical Tools
C++, PyTorch, machine learning
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Multi-modality Foundation Model Engineer, you will focus on building highly efficient, production-ready multi-modality models. We are looking for experts who have hands-on experience building multi-modality foundation models, whether that involves AV-centric modalities (Vision, LiDAR, Radar) or broader domains (Vision, Language, Text, Audio). You will design, train, and deploy these models using Knowledge Distillation (KD) to transfer capabilities from large-scale proprietary teacher models to efficient student models capable of real-time, on-vehicle inference.

Responsibilities

  • Build, pre-train, and evaluate large-scale multi-modality foundation models from the ground up, successfully aligning diverse data streams (e.g., Vision, LiDAR, Radar, Language, Audio).

  • Define and execute the ML roadmap for deploying these multi-modality representations to the vehicle.

  • Architect and implement Knowledge Distillation pipelines to compress large-capacity multi-modal teacher models into highly efficient, production-ready student models.

  • Build high-quality training and evaluation datasets, applying advanced data-centric techniques to maximize cross-modal representation learning and student model convergence.

  • Collaborate with downstream perception teams to integrate and validate the performance, robustness, and latency of your models in on-board production systems.

Qualifications

  • MS or PhD in Computer Science, Machine Learning, or a related technical field with demonstrated professional experience.

  • Deep, proven expertise in building and training large-scale multi-modality foundation models (e.g., Vision-Language Models (VLMs), Vision-Audio-Text, or Vision-LiDAR-Radar architectures).

  • Strong understanding of cross-modal alignment, multi-modal attention mechanisms, and large-scale pre-training techniques.

  • Proven experience in Knowledge Distillation (KD), model compression, and training highly efficient student models for production environments.

  • Proficiency in ML frameworks (e.g., PyTorch) and experience building large-scale ML training and evaluation pipelines.

  • Experience in the Autonomous Driving or robotics industry.

  • Experience with model deployment, optimization, and hardware constraints (e.g., C++ for inference, TensorRT, quantization, pruning).

  • Publications in top-tier conferences (CVPR, ICCV, NeurIPS, ICLR, ACL) related to multi-modality foundation models, cross-modal learning, or model compression.

Location & Eligibility

Location: Foster City, United States
Work arrangement: Hybrid, some on-site time required
Who can apply: US

Listing Details

Posted: March 12, 2026
First seen: March 24, 2026
Last seen: May 5, 2026

Zoox

Zoox, a subsidiary of Amazon, designs fully autonomous vehicles, focusing on making urban transportation safer and more efficient.

Employees: 3k+
Founded: 2014
Domain: zoox.com