Senior Applied Data Scientist (Retrieval and Semantic Systems)

Hungary · Budapest · Full-time · Senior

At Instructure, we believe in the power of people to grow and succeed throughout their lives. Our goal is to amplify that power by creating intuitive products that simplify learning and personal development, facilitate meaningful relationships, and inspire people to go further in their education and careers.
We do this by giving smart, creative, passionate people opportunities to create awesome. And that's where you come in:

Our team builds AI-native capabilities, reusable AI systems, and shared infrastructure that power multiple products and workflows across the platform.

We are looking for a Senior Applied Data Scientist to own retrieval and semantic systems end to end, as a core, reusable capability that multiple AI products depend on. You will own the full retrieval vertical: vector store selection and operation, indexing and refresh pipelines, semantic and hybrid retrieval, reranking, and the evaluation systems that prove relevance is good and stays good. You will own retrieval-specific architecture and its day-to-day operation, while our infrastructure owner provides the underlying cloud, cluster, and CI substrate and our AI Platform engineers provide the general MLOps and service scaffolding you build on.

You will work closely with product, engineering, and research partners to turn advanced AI ideas into reliable product capabilities used at scale.

Important note on scope: This is a deep individual-contributor specialist role. We are looking for someone who has owned a retrieval system in production, not someone who has only used a vector database in a prototype. Retrieval evaluation is central to this role: if you cannot measure relevance and catch regressions before they reach users, the system is not done.

What You'll Do

Design, build, and ship production retrieval systems that power AI product capabilities across multiple products
Own vector store selection and operation, including scalability, latency, reliability, cost, and multi-tenant design
Build and operate indexing and refresh pipelines: chunking, embedding generation, backfills, deletes, and versioned indices
Implement semantic and hybrid retrieval: embeddings, similarity search, lexical and vector combination, metadata filtering, and reranking
Own retrieval evaluation as a first-class system: gold sets, offline relevance metrics, slice analysis, drift detection, and regression gates that block bad changes from shipping
Make and defend the core tradeoffs of the domain: relevance against latency against cost against operational complexity
Partner with AI Platform and infrastructure engineers on deployment, observability, and reliability, and with product and research partners on relevance requirements

What You'll Need

6+ years of experience building and shipping production machine learning or applied AI systems
Proven ownership of a retrieval system in production, including vector store selection and operation
Strong Python skills and experience building services and APIs (for example, FastAPI or similar)
Solid grounding in embeddings, approximate nearest neighbor search, and retrieval and ranking systems
Experience designing indexing and refresh strategies, with data quality controls and safe backfills
Demonstrated ability to define and run retrieval evaluation: building gold sets, choosing relevance metrics, analyzing failures by slice, and preventing regressions
Strong tradeoff judgment across relevance, latency, cost, and operational complexity

It Would Be a Bonus If You Had

Experience with hybrid retrieval (lexical and vector), learning to rank, or domain-specific reranking
Experience integrating graph-structured context or knowledge graphs into retrieval
Experience with evaluation and observability for LLM and retrieval systems, including drift, failure analysis, and regression prevention
Experience with AWS-native retrieval and indexing architectures
Experience in edtech, content, curriculum, or skills modeling

Onsite Collaboration Requirement: This role requires working onsite on Tuesday and Wednesday, with Thursday strongly encouraged as part of our company’s in-person collaboration model.

Growth & Impact

In this role, you will own retrieval and semantic search as a core differentiator that many AI products build on. You will set how retrieval is architected, operated, and evaluated at Instructure, and you will be the person accountable for relevance being good, measurable, and durable as the system and its content evolve.

Why Join Us

Join us and help shape the future of education by turning cutting-edge AI into reliable product capabilities.

At Instructure, we're on a mission to help educators and students learn together, anytime, anywhere, and however works best. You'll join our research-driven team tackling education's biggest challenges with cutting-edge technology. Our projects have included making sense of unstructured feedback, applying large language models to save teachers' time and improve student experiences, classifying partner networks for smarter recommendations, and detecting fraud to protect resources for real learners.

We value diversity, creativity, and passion, and invest in our teams through mentorship, hack weeks, internal conferences, and a culture where innovation thrives. Here, you'll have the chance to build the next generation of LMS features that make a real impact on students and teachers, and do it in a collaborative, supportive environment that encourages experimentation and growth.

Get in on all the awesome at Instructure!

We offer competitive, meaningful benefits in every country where we operate. While they vary by location, here's a general idea of what you can expect:

Competitive compensation, plus participation in our ownership program for all full-time employees, because everyone should have a stake in our success.
Flexible work culture. Our remote, hybrid and in-office collaboration spaces vary by role, team and location.
Generous time off, including local holidays and our annual “Dim the Lights” period in late December, when teams are encouraged to step back and recharge based on departmental needs.
Comprehensive wellness programs and mental health support
Learning and development resources, including professional development tools and tuition reimbursement, to support your growth
The technology and tools you need to do your best work
Motivosity employee recognition program
A culture rooted in inclusivity, support, and meaningful connection

We believe in hiring great people and treating them right. The more diverse we are, the better our ideas and outcomes.

Instructure is an Equal Opportunity Employer. We comply with applicable employment and anti-discrimination laws in every country where we operate.

All employees must pass a background check as part of the hiring process. To help protect our teams and systems, we’ve implemented identity verification measures. Candidates may be asked to verify their legal name and current physical location, and to provide a valid contact number and residential address, in accordance with local data privacy laws.

Any attempt to misrepresent personal or professional information will result in disqualification.