Cantina · 3mo ago

Machine Learning Engineer, TTS

Europe · Remote · Full-time · Mid-level

Other · Machine Learning Engineer · Data · ML Research Engineer

Quick Summary

Key Responsibilities

Model Building: Architect, implement, pre-train, fine-tune, and post-train/align (e.g., with GRPO/DPO) large-scale speech models.

Requirements Summary

Design automated objective and subjective evaluations: listening tests, SV/WER/ASR-based metrics, robustness and bias checks, and red-team studies.

Cantina is a new social platform founded by Sean Parker with the most advanced AI character creator. Our bots are lifelike, social creatures that can interact wherever people are online—across voice, video, and text. Create yourself, imagine someone new, or choose from thousands of characters to share infinitely scalable, personalized content and seamless group chat.

If you’re excited about how AI can shape creativity and social interaction, come help us build what’s next.

About the Role

We’re looking for a Research / ML Engineer to join our Speech Team to build state-of-the-art speech systems end-to-end—from data specs through production inference. You’ll drive the model ↔ data ↔ eval flywheel for TTS and adjacent tasks (voice cloning, controllable TTS, voice conversion and more), partnering closely with research, data, and infra to ship fast, reliable, and cost-aware models. In this role, you will work at the intersection of cutting-edge research and practical engineering, contributing to the development of safe, steerable, and trustworthy AI systems.

Requirements

- Exceptional research/development experience with large-scale audio models (>3B-parameter models and >500k hours of data).
- Exceptional understanding of, and hands-on experience with, transformer architectures and/or diffusion models (including distillation and streaming) and/or audio language modelling.
- Strong experience with multi-node and multi-GPU distributed model training.
- Strong software engineering skills with a proven track record of building complex systems.
- Strong proficiency in PyTorch and performance work (profiling; CUDA/Triton/C++ as needed), and in writing reliable, production-quality code.
- Shipped large-scale speech/audio models to production.
- Background in working with large-scale ML data.
- Ability to iterate on data and triangulate quality using subjective and objective signals.
- Notable publications and/or open-source contributions in speech/audio/ML.
- Experience with voice cloning, speech control, and voice generation.

Nice to Have

- Shipped large-scale speech/audio models (TTS/VC/ASR) to production.
- Experience working on large-scale ML systems.
- Experience with audio language modelling and transformer architectures.
- Experience with voice cloning, speech control, and voice generation.
- Background in processing large-scale ML data.
- Publications or notable open-source contributions in speech/audio/ML.

What We Offer

The anticipated annual base salary range for this role is $200,000 to $220,000 (€170,000 to €190,000). Compensation is determined by a number of factors, including skills, experience, job scope, location, and competitive market data.

✓ Competitive salary and generous company equity

✓ Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina

✓ 42 days of paid time off, including 15 PTO days, 10 sick days, 15 company holidays, and 2 floating holidays

✓ Generous parental leave & fertility support

✓ 401(k) retirement savings plan

✓ Lifestyle spending account – $500/month to use however you’d like

✓ Complimentary lunch and snacks for in-office employees

✓ One Medical membership, and more!