Saigroup

Data Manager — Multimodal Medical Foundation Models

Bangalore, India · Mid-level
Data Science · Healthcare · Data Manager

About the Role

You will lead data operations for a cutting-edge research group developing 3D medical multimodal foundation models and agentic clinical AI systems. These models rely on extremely high-quality, well-structured, and compliant datasets, including 3D medical imaging volumes (MRI, CT, PET), clinical text corpora, annotations, and multimodal metadata.

Your job is to own the end-to-end data lifecycle: acquisition, ingestion, cleaning, versioning, labeling, quality control, governance, and delivery to researchers. You are the central node ensuring our foundation model teams and medical agent teams have clean, scalable, well-documented data pipelines.

This is a pivotal foundational role—without great data, large models cannot be great.

 

Responsibilities

  • Oversee ingestion and processing of 3D medical volumes (DICOM, NIfTI, MHA) and associated clinical texts.
  • Build automated pipelines for metadata extraction, de-identification, slice/series validation, and cohort structuring.
  • Manage large-scale internal datasets and external research datasets (BraTS, LiTS, MIMIC-CXR, CheXpert, MosMed, etc.).
  • Implement scalable data storage, cataloging, and retrieval systems for multimodal training data.
  • Own dataset version control, lineage tracking, reproducibility, and dataset documentation.
  • Collaborate with ML systems engineers on high-throughput data loaders, sharding strategies, and caching mechanisms.
  • Lead medical annotation workflows with radiologists, medical students, and labeling vendors.
  • Create guidelines for ROI labeling, segmentation, captioning, report alignment, and case-level curation.
  • Build semi-automated labeling pipelines using model-assisted tools.
  • Enforce strict standards on data quality, completeness, consistency, and bias control.
  • Ensure adherence to medical data privacy, HIPAA-equivalent frameworks, and institutional data-sharing rules.
  • Manage PHI de-identification, audit logs, access control, and compliance approvals.
  • Work closely with foundation-model researchers to understand data needs for model training.
  • Partner with agentic system designers to supply structured datasets for clinical reasoning tasks.
  • Collaborate with foundational engineers on data access layers, performance bottlenecks, and dataset optimization.

 

  • The foundation model relies on high-quality 3D and textual data at scale.
  • You shape the data pipelines enabling next-generation medical AI agents.
  • You ensure clinical-grade governance, safety, reproducibility, and trust.
  • Your systems become the backbone for research, experiments, and deployments.

For candidates motivated by the intersection of data, healthcare, and machine learning, this is a high-impact opportunity.

 

Requirements

  • Strong experience managing large multimodal or imaging datasets, ideally medical imaging.
  • Proficiency with DICOM/DICOMweb, NIfTI, PACS systems, and medical imaging toolkits (dicompyler, pydicom, MONAI, ITK).
  • Experience with ETL pipelines, distributed data systems, and cloud/on-prem storage.
  • Knowledge of metadata standards, ontologies, and text–image linking strategies.
  • Comfortable working with Python, SQL, and data tooling (Airflow, Prefect, Dagster, dbt, Delta Lake, etc.).
  • Understanding of data privacy, de-identification, and compliance requirements in healthcare.
  • Strong communication skills and the ability to coordinate between engineers, researchers, clinicians, and data partners.

 

Nice to Have

  • Experience with vector databases, multimodal retrieval, or embedding store design.
  • Familiarity with annotation tools (Labelbox, CVAT, iMerit, custom MONAI Label pipelines).
  • Prior work with clinical NLP datasets or multilingual Indian medical corpora.
  • Experience conducting bias audits, dataset characterization, or quality scoring at scale.
  • Contributions to open datasets, benchmarks, or data documentation frameworks.

 

What We Offer

  • Competitive compensation.
  • Access to one of the most ambitious medical multimodal datasets in the region.
  • Collaboration with scientists building India’s first 3D multimodal medical foundation model.
  • Autonomy to design data systems from the ground up.
  • A mission-driven team working to transform clinical care with agentic AI.

Listing Details

Posted: April 15, 2026
First seen: March 26, 2026
Last seen: April 17, 2026

Saigroup
greenhouse
Employees: 30
Founded: 2017