Head of AI Evaluation & Reliability Engineering
Location: Flexible / Hybrid
Reports To: Head of Engineering
Role Mission
Build and scale Codvo’s AI Evaluation & Reliability Engineering capability as a core engineering function supporting the design, validation, and continuous improvement of enterprise AI systems in production.
You will architect the frameworks, tooling, benchmark assets, and operational processes required to ensure AI systems deployed by Codvo and its customers meet enterprise standards for reliability, safety, performance, and governance.
This role is deeply embedded within engineering and serves as the quality and reliability backbone for Codvo’s AI platform and delivery organization.
Why This Role Matters
As AI systems move from pilots to business-critical workflows, reliability and evaluation become core engineering disciplines—not optional afterthoughts.
Codvo is building the infrastructure and operational rigor required to ensure every AI deployment is measurable, governed, and production-ready.
Core Responsibilities
Engineering Ownership
- Build Codvo’s AI Evaluation & Reliability Engineering function as a core platform/engineering capability.
- Define engineering standards for AI evaluation, testing, release gating, and runtime monitoring.
- Integrate evaluation/reliability frameworks into Codvo’s engineering and delivery lifecycle.
Evaluation Architecture
- Design reusable evaluation frameworks for:
  - LLM / multimodal quality
  - RAG grounding / evidence fidelity
  - Agent reasoning / decision quality
  - Tool / workflow execution success
  - Safety / policy / compliance adherence
  - Cost / latency / production economics
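As an illustration of the kind of reusable check such a framework might standardize (here for RAG grounding / evidence fidelity), the minimal Python sketch below scores what fraction of an answer's sentences overlap with retrieved evidence. Every name, threshold, and the metric itself are assumptions for illustration; a production framework would layer in stronger techniques such as embedding similarity, LLM-as-judge scoring, and human review.

```python
# Illustrative sketch only (not Codvo's actual framework): a toy grounding
# check of the kind an evaluation framework in this area would standardize.
# Plain token overlap stands in for stronger techniques such as embedding
# similarity, LLM-as-judge scoring, or human review.
from dataclasses import dataclass


@dataclass
class EvalResult:
    grounded_fraction: float  # share of answer sentences supported by evidence
    passed: bool              # did this sample clear the (assumed) release bar?


def grounding_score(answer: str, evidence: list[str],
                    overlap_threshold: float = 0.5,
                    pass_bar: float = 0.8) -> EvalResult:
    """Score how much of an answer is supported by the retrieved evidence."""
    evidence_tokens = {tok.lower() for doc in evidence for tok in doc.split()}
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    supported = 0
    for sentence in sentences:
        tokens = [tok.lower() for tok in sentence.split()]
        overlap = sum(tok in evidence_tokens for tok in tokens) / max(len(tokens), 1)
        if overlap >= overlap_threshold:
            supported += 1
    fraction = supported / max(len(sentences), 1)
    return EvalResult(grounded_fraction=fraction, passed=fraction >= pass_bar)


if __name__ == "__main__":
    print(grounding_score(
        answer="The invoice total is 1200 USD. Payment is due in 30 days.",
        evidence=["Invoice INV-042: total 1200 USD, payment due within 30 days."],
    ))
```

The same shape generalizes to the other dimensions listed above: a typed result, a scoring function per dimension, and an explicit pass bar that downstream release gating can consume.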
Benchmark Infrastructure
- Build benchmark packs, golden datasets, and regression suites for priority enterprise workflows.
- Define benchmark coverage and versioning standards.
- Establish processes for edge-case capture and benchmark expansion.
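For illustration only, the sketch below shows one possible shape for a golden-dataset regression gate of the kind described above. The JSONL schema, pass-rate threshold, and exact-match scoring are assumptions, standing in for whatever workflow-specific scoring a real benchmark pack would define.

```python
# Illustrative sketch only: one possible shape for a golden-dataset regression
# gate. The JSONL schema ({"input": ..., "expected": ...}), the pass-rate
# threshold, and exact-match scoring are all assumptions for this example.
import json
from pathlib import Path
from typing import Callable


def run_regression(golden_path: Path,
                   predict: Callable[[str], str],
                   min_pass_rate: float = 0.95) -> bool:
    """Replay a versioned golden dataset through the current system and gate the release."""
    cases = [json.loads(line) for line in golden_path.read_text().splitlines() if line.strip()]
    passed = sum(predict(case["input"]).strip() == case["expected"].strip() for case in cases)
    pass_rate = passed / max(len(cases), 1)
    print(f"{golden_path.name}: {passed}/{len(cases)} passed ({pass_rate:.1%})")
    return pass_rate >= min_pass_rate  # CI blocks the release when this is False
```

In CI, a thin wrapper would run this per benchmark pack and fail the build whenever the gate returns False; versioning the golden files alongside the benchmark definition keeps coverage auditable as edge cases are added.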
Runtime Reliability Systems
- Design systems/processes for:
  - Runtime drift / degradation monitoring
  - Failure mode analysis / incident diagnostics
  - Human review / escalation pathways
  - Continuous evaluation and improvement loops
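As a hedged sketch of the drift-monitoring idea above, the toy monitor below compares a rolling window of a per-request quality score against a fixed baseline. The window size, tolerance, and response to a detection are assumptions; a production system would add statistical tests and wire detections into the escalation and incident-diagnostics pathways listed above.

```python
# Illustrative sketch only: a toy drift monitor that compares a rolling window
# of a per-request quality score against a fixed baseline. Window size,
# tolerance, and what to do on detection are assumptions for this example.
from collections import deque
from statistics import mean


class DriftMonitor:
    def __init__(self, baseline_mean: float, window: int = 200, max_drop: float = 0.05):
        self.baseline = baseline_mean
        self.scores: deque[float] = deque(maxlen=window)
        self.max_drop = max_drop

    def record(self, score: float) -> bool:
        """Record one per-request quality score; return True when drift is detected."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False                 # not enough data for a verdict yet
        drop = self.baseline - mean(self.scores)
        return drop > self.max_drop      # quality fell beyond tolerance
```

A caller would feed record() from production telemetry and route detections into the human-review or incident-diagnostics path.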
Technical Leadership
- Partner closely with platform, product, and solution engineering teams.
- Serve as internal SME on AI reliability, benchmark design, and evaluation methodology.
- Help shape architecture standards for AI-native product and workflow delivery.
Team Leadership
- Build and lead a team of:
  - Evaluation Engineers
  - Benchmark / QA Engineers
  - Reliability / Observability Engineers
  - Domain Review / Feedback Ops Specialists
Required Qualifications
- 10+ years in engineering / AI / ML leadership roles.
- 5+ years building or operating production AI / ML systems.
- Proven experience designing or operating:
  - AI/LLM evaluation frameworks
  - Benchmark / regression systems
  - AI QA / testing / validation infrastructure
  - Production ML / observability / monitoring systems
  - Reliability engineering / quality engineering organizations
Technical Expertise
- LLM / multimodal evaluation methodologies
- Benchmark / golden dataset design
- Agent / tool-use / workflow evaluation
- RAG evaluation / grounding analysis
- AI observability / telemetry / tracing
- Human-in-the-loop feedback systems
- AI safety / governance / policy testing
- Release gating / CI/CD / engineering quality systems
Preferred Backgrounds
- AI Infrastructure / Evaluation Platforms
- AI Observability / MLOps Companies
- Enterprise AI Platform Teams
- Applied AI Product / Platform Organizations
- Reliability / QA Engineering Leadership in Complex Systems
Success Metrics
- Establish Codvo-wide AI evaluation/reliability standards
- Integrate evaluation frameworks into engineering lifecycle
- Launch reusable benchmark packs for target workflows
- Reduce AI production failure / exception rates across deployments
- Improve release confidence and deployment velocity for AI systems
- Increase benchmark/evaluation asset reuse across customers
Ideal Candidate Profile
- Systems/reliability engineer mindset with strong AI depth
- Product-minded builder who can create reusable engineering frameworks
- Obsessed with operational excellence and measurable quality
- Comfortable driving standards across engineering organizations
Note: Please apply via our official careers portal only, as applications sent directly to executives may not be considered.
Location & Eligibility
- Where is the job: Pune, India (on-site at the office)
- Who can apply: IN