Data Scientist II

United States · Rockville · Mid-level

Data Scientist · Data

(ID: 2026-2574)

What We Offer

✓100% Medical, Dental & Vision Coverage for Employees

✓Paid Time Off and Paid Holidays

✓401(k) match up to 5%

✓Educational Benefits for Career Growth

✓Employee Referral Bonus

✓Flexible Spending Accounts: Healthcare (FSA)

✓Parking Reimbursement Account (PRK)

✓Dependent Care Assistance Program (DCAP)

✓Transportation Reimbursement Account (TRN)

Responsibilities

✓Bioinformatics Workflow and Data Pipeline Development: Design, build, and maintain reproducible pipelines for diverse biomedical data types, including genomic, transcriptomic, single-cell, spatial, proteomic, metagenomic, metabolomic, and clinical datasets. Develop reusable transformation logic and curated datasets supporting analytics, dashboards, APIs, notebooks, and downstream research workflows.

✓Multi-Omics Analysis: Support NCI CBIIT labs in their analysis workflows, including bulk RNA-seq (QC, DEG, GSEA), single-cell RNA-seq (clustering, UMAP/t-SNE, cell type annotation, DEG), and Digital Spatial Profiling (annotation, QC, normalization, spatial deconvolution, volcano plots, heatmaps).

✓Data Integration and Lifecycle Support: Enable reliable data movement from source systems into structured, analysis-ready formats. Support ingestion, curation, metadata capture, source-to-target mapping, schema management, provenance tracking, and long-term maintainability of data products.

✓Statistical Modeling and Machine Learning: Apply statistical and ML methods (including hypothesis testing, regression, clustering, PCA, UMAP, t-SNE, and classification) to biomedical datasets. Incorporate AI/LLM-based extraction where appropriate, with clear validation and communication to stakeholders.

✓Researcher-Facing Applications and Visualization: Build and support interactive dashboards (Shiny, Streamlit), notebooks, reports, and APIs enabling researchers to explore multi-omics and clinical data. Support figure generation for QC, differential expression, pathway, and spatial analyses.

✓Collaboration: Partner with data scientists, bioinformaticians, researchers, developers, and government stakeholders to translate scientific needs into technical specifications, data models, and reusable workflows that accelerate biomedical research.

Qualifications

✓Education & Background: Bachelor's degree in Data Science, Bioinformatics, Computer Science, Biological Sciences, or a related field (advanced degree preferred), or equivalent experience. Demonstrated experience in a data-intensive role supporting biomedical research or scientific computing.

✓Data Science and Bioinformatics Expertise: Strong proficiency in Python and R for analysis, scripting, and visualization. Hands-on experience with at least two omics data types (e.g., bulk RNA-seq, scRNA-seq, spatial transcriptomics, proteomics, metagenomics, GWAS).

✓Analytical Skills: Solid understanding of statistical modeling, dimensionality reduction, clustering, differential expression, and pathway analysis. Ability to work with structured, semi-structured, and unstructured data across relational and data lake environments.

✓Collaboration & Communication: Strong problem-solving skills with the ability to communicate effectively across technical and non-technical audiences. Able to translate scientific needs into technical solutions and clearly articulate risks, assumptions, and limitations.

✓Domain Alignment: Genuine interest in biomedical and translational research. Ability to quickly learn domain-specific terminology and workflows, with awareness of data governance, privacy, and compliance requirements for clinical and research data.

✓Data Platform Experience: Experience building analytics solutions in platforms such as Snowflake, Databricks, or cloud data warehouses, with integrations across databases, APIs, dashboards, and application environments.

✓Bioinformatics Workflow Tooling: Experience with workflow and reproducibility tools such as Galaxy, Terra, Nextflow/WDL, Snakemake, Singularity, or CWL. Familiarity with the scverse Python ecosystem (Scanpy, Squidpy, SCIMAP, AnnData) and spatial single-cell analysis methods, including PhenoGraph, Louvain/Leiden clustering, UMAP, and Ripley's L statistic, is a plus.

✓Research and Application Enablement: Experience preparing curated datasets for dashboards, APIs, and web applications. Familiarity with Posit Connect, R/Shiny, Streamlit, Jupyter, or similar platforms is a plus.

✓Cloud, HPC, Storage, and Automation: Experience with AWS (EC2, S3, Lambda), object storage, relational databases, scheduled jobs, API integrations, and secure data movement. Familiarity with HPC environments, SLURM/SGE, or NIH Biowulf is preferred.

✓Biomedical Domain Knowledge: Background in biomedical research, clinical research, or healthcare analytics. Familiarity with standards such as HL7/FHIR, CDISC, or OMOP, and experience with clinical, genomic, or biospecimen data is a plus.

✓Governance and Reproducibility: Experience with metadata management, data lineage, open-source code release, containerized analyses, and secure handling of de-identified or access-controlled research datasets.

✓Training and Scientific Enablement: Experience creating documentation, training materials, or workshops for researchers and non-coder audiences. Ability to support tool adoption and explain workflows and results clearly is strongly preferred.