Research Scientist/Engineer (Evaluations)

London · Full-time · Mid-level
Data Science · Research Scientist · Data & AI

Application deadline: We are actively interviewing and aim to fill this role as soon as we find a suitable candidate.
 
ABOUT THE OPPORTUNITY
 
We develop and run evaluations that help assess the risks posed by scheming AIs. You will work with frontier labs like OpenAI, Anthropic, and Google DeepMind and be among the first to interact with new models. The ideal candidate loves rigorously testing frontier AI models and enjoys building and automating efficient pipelines.
 
YOU WILL HAVE THE OPPORTUNITY TO
 
- Run pre-deployment evaluation campaigns on the most capable AI systems in the world. We partner with multiple labs, giving you access to a breadth of models that no single AI lab could offer. You'll be among the first people to interact with new models.
- Dive deep into AI cognition. Scan through thousands of model transcripts to surface behavioral patterns that no one has observed before. These patterns are often deeply surprising and fascinating to study, for example the non-standard language and the reward-seeking reasoning described in our anti-scheming paper.
- Build new evaluations for frontier risks, from designing novel test environments to scaling them across hundreds of distinct scenarios.
- Work directly with frontier AI developers. Share your findings, engage with their feedback, and see your evaluations directly inform deployment decisions for the most capable AI systems in the world.
- Automate and improve the evaluation pipeline. We already use automation across building, running, and analyzing evals. Rapid progress in agent capabilities opens up radically new possibilities, and you'll have the freedom to rethink and reshape the pipeline as they emerge.
 
KEY REQUIREMENTS
 
- Software engineering skills: Our entire stack uses Python. We're looking for candidates with strong software engineering experience. Ideally, you have experience shipping and maintaining production Python code, and know how to factor messy problems into clean abstractions that others can use and extend.
- Process optimisation: You always look for ways to improve workflows. Pre-deployment evaluations are very fast-paced, so ideally you love removing friction wherever possible.
- Data Analysis & Pattern Recognition: You can extract signal from large, messy datasets. You're comfortable with quantitative analysis and know when qualitative assessment is more appropriate. You can identify anomalies and unexpected model behaviors.
- Writing and communication: You succinctly convey qualitative and quantitative findings to both technical and non-technical audiences.
- AI power-user: You are curious about the capabilities and propensities of frontier AI models. You have experience using different models, know which ones to use for which tasks and when not to use AI, and you always experiment with new AI workflows.
 
(Bonus) We use Inspect as our primary evals framework, and we value experience with it.
 
We want to emphasize that if you feel you don’t meet all of these requirements but think you would be a good fit for the position, we strongly encourage you to apply. We believe that excellent candidates can come from a variety of backgrounds and are excited to give you opportunities to shine. We don’t require a formal background or industry experience and welcome self-taught candidates.
  • This role offers a market-competitive salary, equity, and competitive benefits.
  • Salary: 100k - 200k GBP (~135k - 270k USD)
  • Flexible work hours and schedule
  • Unlimited vacation
  • Unlimited sick leave
  • Up to 6 months of paid parental leave
  • Comprehensive health, dental and vision insurance
  • Retirement savings with competitive employer matching (e.g. 401(k) for US employees)
  • Lunch, dinner, and snacks are provided for all employees on workdays
  • Paid work trips, including staff retreats, business trips, and relevant conferences
  • A yearly $1,000 (USD) professional development budget
  • Time Allocation: Full-time
  • Location: This is an in-person role working out of our London or San Francisco office.
  • Visa sponsorship: We sponsor visas in both the UK and US. Sponsorship isn't guaranteed for every role or candidate, but if we make you an offer, we'll work with you to find the right visa route.
  • Posted: February 13, 2026