Cartesia · 4mo ago

Research Engineer, Data (India)

India · Bangalore · Full-time · Mid-level

Other · Research Engineer

Quick Summary

Overview

About Cartesia: Our mission is to architect AI that learns from and interacts with the world like humans do. We're pioneering the model architectures that will make this possible.

Key Responsibilities

Design and build large-scale datasets for model training, and run controlled modeling experiments to measure their impact on model performance and behavior.

Requirements Summary

Experience building or working with large multilingual datasets. Experience with generative models (speech, text, or multimodal). Ability to help guide human annotation and evaluation across multiple languages.

Our mission is to architect AI that learns from and interacts with the world like humans do.

We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team pairs deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.

We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.

About the Role

Data is one of the most critical inputs to frontier AI. In this role, you'll work across the full data stack: from infrastructure and web-scale data pipelines to modeling and evaluation. You'll develop a deep understanding of how data influences model performance and use that knowledge to push the frontier of multimodal datasets and create new capabilities for AI.

Design and build high-quality datasets for model training and run controlled modeling experiments to measure their impact on model performance and behavior.
Engineer web-scale data pipelines and build systems to annotate and ensure data quality at scale.
Develop techniques for post-training and synthetic data generation to improve model quality and intelligence.

Experience building or working with large multilingual datasets.
Experience with generative models (speech, text, or multimodal).
Ability to help guide human annotation and evaluation across multiple languages.
Strong applied ML background with a focus on data-centric approaches.
Excitement for building scalable systems that bridge research and production.

🏢 In-office policy: We’re an in-person team based out of offices in 🇺🇸 San Francisco, 🇬🇧 London and 🇮🇳 Bangalore. We love being in the office, hanging out together, and learning from each other every day.

What We Offer

🚆 Commuter Allowance: A monthly stipend to help you get to and from the office.

🏖️ Flexible PTO: Take as much time as you need to recharge your batteries.

🍲 Meals & Snacks: Lunch, dinner and plenty of snacks, provided daily.

🦖 Your own personal Yoshi

Cartesia is an equal opportunity employer. We consider qualified applicants without regard to race, color, religion, sex, national origin, age, disability, veteran status, genetic information, or any other legally protected status.