Data Manager — Multimodal Medical Foundation Models
About the Role
You will lead data operations for a cutting-edge research group developing 3D medical multimodal foundation models and agentic clinical AI systems. These models rely on extremely high-quality, well-structured, and compliant datasets, including 3D medical imaging volumes (MRI, CT, PET), clinical text corpora, annotations, and multimodal metadata.
Your job is to own the end-to-end data lifecycle: acquisition, ingestion, cleaning, versioning, labeling, quality control, governance, and delivery to researchers. You are the central node ensuring our foundation model teams and medical agent teams have clean, scalable, well-documented data pipelines.
This is a pivotal, foundational role: without great data, large models cannot be great.
Responsibilities
- Oversee ingestion and processing of 3D medical volumes (DICOM, NIfTI, MHA) and associated clinical texts.
- Build automated pipelines for metadata extraction, de-identification, slice/series validation, and cohort structuring.
- Manage large-scale internal datasets and external research datasets (BraTS, LiTS, MIMIC-CXR, CheXpert, MosMed, etc.).
- Implement scalable data storage, cataloging, and retrieval systems for multimodal training data.
- Own dataset version control, lineage tracking, reproducibility, and dataset documentation.
- Collaborate with ML systems engineers on high-throughput data loaders, sharding strategies, and caching mechanisms.
- Lead medical annotation workflows with radiologists, medical students, and labeling vendors.
- Create guidelines for ROI labeling, segmentation, captioning, report alignment, and case-level curation.
- Build semi-automated labeling pipelines using model-assisted tools.
- Enforce strict standards on data quality, completeness, consistency, and bias control.
- Ensure adherence to medical data privacy, HIPAA-equivalent frameworks, and institutional data-sharing rules.
- Manage PHI de-identification, audit logs, access control, and compliance approvals.
- Work closely with foundation-model researchers to understand data needs for model training.
- Partner with agentic system designers to supply structured datasets for clinical reasoning tasks.
- Collaborate with foundational engineers on data access layers, performance bottlenecks, and dataset optimization.
Why This Role Matters
- The foundation model relies on high-quality 3D and textual data at scale.
- You shape the data pipelines enabling next-generation medical AI agents.
- You ensure clinical-grade governance, safety, reproducibility, and trust.
- Your systems become the backbone for research, experiments, and deployments.
For candidates motivated by the intersection of data, healthcare, and machine learning, this is a high-impact opportunity.
Requirements
- Strong experience managing large multimodal or imaging datasets, ideally medical imaging.
- Proficiency with DICOM/DICOMweb, NIfTI, PACS systems, and medical imaging toolkits (dicompyler, pydicom, MONAI, ITK).
- Experience with ETL pipelines, distributed data systems, and cloud/on-prem storage.
- Knowledge of metadata standards, ontologies, and text–image linking strategies.
- Comfortable working with Python, SQL, and data tooling (Airflow, Prefect, Dagster, dbt, Delta Lake, etc.).
- Understanding of data privacy, de-identification, and compliance requirements in healthcare.
- Strong communication skills and the ability to coordinate between engineers, researchers, clinicians, and data partners.
Nice to Have
- Experience with vector databases, multimodal retrieval, or embedding store design.
- Familiarity with annotation tools (Labelbox, CVAT, iMerit, custom MONAI Label pipelines).
- Prior work with clinical NLP datasets or multilingual Indian medical corpora.
- Experience conducting bias audits, dataset characterization, or quality scoring at scale.
- Contributions to open datasets, benchmarks, or data documentation frameworks.
What We Offer
Listing Details
- Posted: April 15, 2026
- First seen: March 26, 2026
- Last seen: April 17, 2026
Please let Saigroup know you found this job on Jobera.
