Machine Learning Engineer — Multilingual Data
We’re looking for a Machine Learning Engineer to own and scale our multilingual data pipeline, from sourcing and curation through evaluation and continuous improvement. You’ll work closely with researchers and infra engineers to ensure our models perform robustly across languages, scripts, and cultural contexts.
This role sits at the intersection of data, research, and production ML and is ideal for someone who cares deeply about data quality, linguistic diversity, and model generalization beyond English.
Responsibilities
- Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
- Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
- Implement quality filters using statistical, heuristic, and model-based methods
- Work with researchers to define language coverage, benchmarks, and evaluation metrics
- Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
- Support training, fine-tuning, and distillation workflows with high-quality multilingual data
- Continuously iterate on datasets based on model performance and real-world usage
Requirements
- 3+ years of experience as an ML Engineer, Applied Scientist, or in a similar role
- Strong experience working with multilingual or non-English datasets
- Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
- Experience building scalable data pipelines (Python, Spark, Ray, or similar)
- Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
- Comfort collaborating with researchers and translating research needs into production systems
Nice to Have
- Experience with low-resource languages or multilingual benchmarks (e.g. FLORES, XTREME)
- Exposure to LLM training, fine-tuning, or distillation
- Linguistics background or experience working with native language experts
- Contributions to open-source datasets or ML tooling
- Experience with data quality evaluation at scale
What We Offer
Location & Eligibility
Listing Details
- Posted: January 22, 2026
- First seen: May 6, 2026
- Last seen: May 8, 2026
Please let featherlessai know you found this job on Jobera.