clusterlab

Data Scientist

Tunisia · Tunis · Mid-level
Other

Quick Summary

Technical Tools
aws, gcp, huggingface, numpy, python, pytorch, scikit-learn, etl, machine-learning, mentoring

We're looking for a Data Scientist with strong fundamentals in machine learning and a deep interest in voice technology. This role focuses on building and fine-tuning voice-related models from scratch, including speech-to-text, speaker diarization, audio classification, and LLM-integrated speech systems. You’ll work across the full stack of data science: from data collection and curation to model development, evaluation, and production deployment.

- Lead the development of custom models for speech recognition, transcription, audio segmentation, and speaker identification.
- Build robust data pipelines: collecting, preprocessing, cleaning, and labeling large audio datasets.
- Fine-tune and evaluate state-of-the-art open-source models (e.g., Whisper, wav2vec, HuBERT, Conformer) on proprietary datasets.
- Design experiments and benchmark models for quality, latency, and domain adaptability.
- Work closely with product teams to embed voice capabilities into real-time applications (e.g., live summarization, AI agents, call insights).
- Maintain scalable training, evaluation, and inference workflows using modern ML tooling (e.g., PyTorch, Hugging Face, Weights & Biases).
- Contribute to internal knowledge sharing and best practices around audio ML.

Requirements

- Strong experience with speech or audio ML: speech recognition, speaker diarization, voice activity detection, etc.
- Hands-on experience in building models from scratch and fine-tuning large models.
- Deep understanding of signal processing, feature extraction, and data augmentation for audio.
- Proficient in Python and common ML libraries: PyTorch, NumPy, Scikit-learn, Hugging Face.
- Familiarity with end-to-end ML pipelines: data cleaning, training, tuning, evaluation, and serving.
- Comfort with using cloud platforms (GCP, AWS) and containerized environments.
- High agency and comfort working in fast-paced, ambiguous environments.

Nice to Have

- Experience with LLM + speech integration (e.g., Whisper + GPT pipelines).
- Knowledge of real-time systems or streaming inference.
- Understanding of multilingual ASR challenges and dialect modeling (Arabic dialects a plus).
- Experience with tools like DVC, MLflow, or W&B for experiment tracking.

Benefits

- The chance to build domain-defining voice technology from scratch.
- Exposure to real-world deployments and rapid iteration cycles.
- Mentorship and collaboration with a team of high-agency engineers and researchers.
- Flexible, remote-friendly work culture centered around ownership and outcomes.

Location & Eligibility

Where is the job
Tunis, Tunisia
On-site at the office

Listing Details

Posted: April 30, 2025
First seen: May 6, 2026
Last seen: May 8, 2026

Posting Health

Days active: 0
Repost count: 0
Trust Level: 14%
Scored at: May 6, 2026

Signal breakdown

freshness, source trust, content trust, employer trust