Quick Summary
Join our India Tech Hub – Be among the first hires! Kobie, a 35-year veteran of the loyalty industry, a multi-year Forrester Leader,
About the team and what we will build together
We’re looking for a Lead AI QA Engineer with 6+ years of experience who thrives on designing test strategies and evaluation harnesses for production-grade, agentic AI systems and who also brings ETL experience. You have strong Python skills, hands-on experience testing LLM-powered features (prompt regression, tool/function-call validation, RAG correctness, and structured-output schema checks), and working knowledge of evaluation frameworks such as RAGAS, DeepEval, LangSmith or Langfuse. You are comfortable writing solid SQL, automating tests with PyTest, exercising APIs through Postman or REST clients, and shipping test pipelines using Git, Docker and CI tooling like Jenkins or GitHub Actions.
Kobie runs some of the largest loyalty programs in the world. We are building an internal agent platform on Dataiku that automates analyst workflows, surfaces insights from program data in Snowflake, and gives our teams an LLM-native way to work with complex loyalty logic. As a Lead AI QA Engineer on the India Tech Hub team, you will play a key role in protecting that platform by designing golden datasets, running LLM-as-judge and regression suites, and owning the quality bar for what goes to production. This is not a manual-only role: you will automate, instrument and monitor; build QA and automation strategies and roadmaps; and partner closely with our U.S. AI & Innovation team and cross-functional partners across Engineering, Data, AI and Product.
- Design and build evaluation harnesses for agentic systems in Python: golden datasets, LLM-as-judge graders, multi-turn regression suites and trace-based assertions. In addition, develop a framework to verify AI-generated output
- Author automated test suites for prompts, tools, structured outputs (Pydantic / JSON schema), retrieval pipelines (drawing on ETL experience) and end-to-end agent workflows
- Validate guardrails around tool execution: auth scoping, input/output validation, PII and prompt-injection protections, and hallucination mitigation
- Wire evaluations into CI using Dataiku Evaluations, GitHub Actions or Jenkins so every change is graded against quality, safety and cost SLOs before it ships
- Build observability into testing by instrumenting traces with LangSmith, Langfuse, MLflow or OpenTelemetry and triaging production drift back into the eval harness
- Own quality end-to-end: define release criteria, run pre-prod and shadow tests, and partner with engineering to root-cause and fix regressions quickly
- Partner with data engineers on Snowflake-backed retrieval testing patterns (Cortex Analyst and Cortex Search Services) and with platform teams on observability, security and cost
- Help shape internal QA standards for AI & Data engineering as the stack evolves, contributing to design reviews and sharing knowledge across the India and U.S. teams
- Participate in a collaborative DevOps environment, working closely with developers, AI engineers, data engineers, DBAs and product partners across environments
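For candidates wondering what the harness work above can look like day to day, here is a minimal sketch of a golden-dataset regression check with a structured-output schema check, using only the Python standard library. Everything in it is a hypothetical illustration rather than Kobie's actual tooling: the `fake_agent` stand-in, the `GOLDEN_CASES` entries, the tool names and the `REQUIRED_FIELDS` schema are all assumptions made for the example.

```python
import json

# Hypothetical golden cases: a prompt plus the exact JSON payload we expect back.
GOLDEN_CASES = [
    {"prompt": "How many points does member 42 have?",
     "expected": {"tool": "get_points_balance", "member_id": 42}},
    {"prompt": "Enroll member 7 in the gold tier",
     "expected": {"tool": "update_tier", "member_id": 7}},
]

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"tool": str, "member_id": int}


def fake_agent(prompt: str) -> str:
    """Stand-in for the system under test; a real harness would call the agent."""
    digits = "".join(ch for ch in prompt if ch.isdigit())
    tool = "get_points_balance" if "points" in prompt else "update_tier"
    return json.dumps({"tool": tool, "member_id": int(digits)})


def validate_schema(payload) -> list:
    """Return a list of schema errors; an empty list means the payload is valid."""
    if not isinstance(payload, dict):
        return ["output is not a JSON object"]
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    return errors


def run_golden_suite(agent) -> dict:
    """Run every golden case through the agent and collect failures."""
    failures = []
    for case in GOLDEN_CASES:
        try:
            payload = json.loads(agent(case["prompt"]))
        except json.JSONDecodeError:
            failures.append((case["prompt"], ["output is not valid JSON"]))
            continue
        errors = validate_schema(payload)
        # Only compare against the golden answer once the shape is valid.
        if not errors and payload != case["expected"]:
            errors.append(f"expected {case['expected']}, got {payload}")
        if errors:
            failures.append((case["prompt"], errors))
    return {
        "passed": len(GOLDEN_CASES) - len(failures),
        "failed": len(failures),
        "failures": failures,
    }


if __name__ == "__main__":
    print(run_golden_suite(fake_agent))
```

In a real suite, each golden case would more likely become a parametrized PyTest case, and the schema check would more likely rely on Pydantic or JSON Schema than on a hand-rolled type map. The sketch only shows the shape of the idea: check structure first, then compare against the golden answer.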
In your first 90 days
By the end of your first 90 days, you will have stood up at least one production-grade evaluation harness — golden dataset, LLM-as-judge graders and regression suite — wired into CI for an internal agent. You will have automated trace-based assertions running against staging traffic, a clear quality scorecard for at least one shipped agent, and a clear opinion about what our next testing investment should be.
- 3+ years of professional QA / SDET experience, with production experience automating tests for backend services or data pipelines
- 1+ years of hands-on experience testing LLM or AI features in production: prompt regression, tool / function-call validation, structured outputs and RAG correctness
- Working knowledge of evaluation frameworks such as RAGAS, DeepEval, LangSmith, Langfuse or comparable LLM-as-judge tooling
- Strong Python and PyTest skills; solid SQL skills and comfort with at least one cloud platform (AWS, Azure or GCP)
- Fluency with Git, Docker, REST APIs and at least one CI tool (GitHub Actions, Jenkins, GitLab CI or CircleCI)
- Solid understanding of data security and responsible AI practices, particularly in PCI-compliant or regulated environments
- Proven ability to work independently and within a team, managing priorities across concurrent projects and time zones
- Strong written and verbal communication skills; able to work effectively with both technical and non-technical stakeholders
- A bachelor’s degree is not required; equivalent practical experience (including bootcamps, self-taught work, career changes or non-CS technical degrees) counts
Bonus Skills:
- Hands-on experience with Dataiku DSS (Python / SQL recipes, scenarios, code environments, the dataiku and dataikuapi clients) or Dataiku Evaluations
- Experience with Dataiku LLM Mesh, Knowledge Banks, Prompt Studio, or Visual / Code Agents
- Experience with Snowflake, Snowpark, or Snowflake Cortex (Search, Analyst, Agents)
- Experience with red-teaming, prompt-injection testing or adversarial test generation for LLMs
- Familiarity with multi-agent patterns: supervisor / router, subagent / handoff, reflection, human-in-the-loop
- Experience with performance and load testing tools such as Locust, JMeter or k6
- ISTQB, AI Testing or comparable QA certification
- Experience in loyalty, martech, adtech or a comparable data-rich B2B domain
Listing Details
- Posted: June 1, 2026
- First seen: June 1, 2026
- Last seen: June 1, 2026
