Research Scientist - Frontier Data
About AfterQuery
AfterQuery builds the training data and evaluation infrastructure that frontier AI labs use to make their models better. We work with the world's leading labs to design high-signal datasets and run rigorous evaluations that go beyond static benchmarks. We are a small, early team (post-Series A) where individual contributors have a direct impact on how the next generation of models learns and improves.
You'll design the datasets and evaluation frameworks that shape how frontier models are trained and measured. Working directly with research teams at top AI labs, you'll experiment with data collection strategies, diagnose model failure modes, and develop the metrics that determine whether a model is actually getting better. This is hands-on, high-leverage work: you'll go from hypothesis to live experiment quickly, and your output will directly influence model training runs at scale.
Responsibilities
- Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows
- Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
- Model annotator behavior and run experiments to improve different model capabilities
- Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
- Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications
Great candidates have done undergraduate or master's research (but haven't done a PhD).
Major plus if they've worked or interned at an RL environment company, or at an AI safety or benchmarking org like METR or Artificial Analysis.
Genuine obsession with how data structure, selection, and quality drive model behavior
Ability to design lightweight experiments, move fast, and extract actionable insights from messy results
Comfort working across domains (you'll touch finance, software engineering, policy, and more)
Strong quantitative instincts and familiarity with LLM training pipelines, RLHF/RLVR, or evaluation methodology
A bias toward building over theorizing
What We Offer
$250k-450k total compensation + equity
Location & Eligibility
Listing Details
- Posted: April 14, 2026
- First seen: May 6, 2026
- Last seen: May 8, 2026