We're looking for a Data Scientist with strong fundamentals in machine learning and a deep interest in voice technology. This role focuses on building and fine-tuning voice-related models from scratch, including speech-to-text, speaker diarization, audio classification, and LLM-integrated speech systems. You’ll work across the full stack of data science: from data collection and curation to model development, evaluation, and production deployment.

- Lead the development of custom models for speech recognition, transcription, audio segmentation, and speaker identification.
- Build robust data pipelines: collecting, preprocessing, cleaning, and labeling large audio datasets.
- Fine-tune and evaluate state-of-the-art open-source models (e.g., Whisper, wav2vec, HuBERT, Conformer) on proprietary datasets.
- Design experiments and benchmark models for quality, latency, and domain adaptability.
- Work closely with product teams to embed voice capabilities into real-time applications (e.g., live summarization, AI agents, call insights).
- Maintain scalable training, evaluation, and inference workflows using modern ML tooling (e.g., PyTorch, Hugging Face, Weights & Biases).
- Contribute to internal knowledge sharing and best practices around audio ML.

Requirements

- Strong experience with speech or audio ML: speech recognition, speaker diarization, voice activity detection, etc.
- Hands-on experience in building models from scratch and fine-tuning large models.
- Deep understanding of signal processing, feature extraction, and data augmentation for audio.
- Proficient in Python and common ML libraries: PyTorch, NumPy, Scikit-learn, Hugging Face.
- Familiarity with end-to-end ML pipelines: data cleaning, training, tuning, evaluation, and serving.
- Comfort with using cloud platforms (GCP, AWS) and containerized environments.
- High agency and comfort working in fast-paced, ambiguous environments.

Nice to Have

- Experience with LLM + speech integration (e.g., Whisper + GPT pipelines).
- Knowledge of real-time systems or streaming inference.
- Understanding of multilingual ASR challenges and dialect modeling (Arabic dialects a plus).
- Experience with tools like DVC, MLflow, or W&B for experiment tracking.

Benefits

- The chance to build domain-defining voice technology from scratch.
- Exposure to real-world deployments and rapid iteration cycles.
- Mentorship and collaboration with a team of high-agency engineers and researchers.
- Flexible, remote-friendly work culture centered around ownership and outcomes.

