Designation: Applied AI Engineer, GenAI and ML Prototyping
Experience: 4-6 years
Location: Remote
Role Overview
We are looking for an Engineer/Data Scientist to lead the identification and rapid prototyping of AI solutions across our business — spanning both internal operations and customer-facing products.
This role sits at the earliest and most critical stage of our AI delivery lifecycle: Discovery and Proof of Concept. You will partner with Senior and Principal engineers and work directly with department heads and product owners to uncover where AI can create meaningful impact, then design and build working prototypes that demonstrate clear, measurable value. You will own the process from problem framing through to a validated, decision-ready POC — determining whether the right solution is a rule-based system, a traditional machine learning model, or an LLM-based agentic workflow.
Once a prototype is approved, you will work in close collaboration with the rest of the AI Platform Engineering team to turn your work into a production-grade application that can scale. You will not co-own productionisation, but you will be a critical partner in making it successful.
This is a role for someone who is energised by ambiguity, moves fast without cutting corners, and knows how to make a compelling case for (or against) a technical approach based on evidence rather than enthusiasm.
Core Responsibilities
Business Discovery: Run structured discovery sessions with department heads and product owners to identify and scope AI opportunities. Define a clear problem statement, including data availability and constraints, before any prototyping begins.
Rapid Prototyping: Build functional POCs using the most appropriate approach for the problem: RAG pipelines, agentic workflows, predictive ML models, or rule-based systems. Prototypes must be credible enough to support a genuine build-or-not decision.
Stakeholder Management: Act as the primary technical point of contact for business stakeholders throughout discovery and POC. Communicate trade-offs around accuracy, cost, and latency in plain terms, and be willing to recommend against building when the evidence calls for it.
Evaluation & Validation: Define success criteria before building begins. Design and run evaluations appropriate to the POC type, and present findings clearly enough for a non-technical sponsor to make a confident go/no-go decision.
Technical Handoff: Produce handoff documentation covering system design, prompt strategies, data requirements, known failure modes, and evaluation benchmarks, giving the AI Engineering team everything it needs to take a validated POC into production.
Tech Stack & Technical Requirements
Core Languages & Frameworks
Proficiency in Python as the primary language for data science and ML development (pandas, NumPy, scikit-learn)
Familiarity with SQL for data querying and manipulation across modern data warehouses and databases (e.g., BigQuery, Snowflake, PostgreSQL)
(Nice to have) Working knowledge of deep learning frameworks such as PyTorch or TensorFlow for model experimentation
LLM & Generative AI Tooling
Hands-on experience working with large language model APIs, including providers such as OpenAI, Anthropic, or Google
Strong command of prompt engineering techniques, including few-shot prompting, chain-of-thought reasoning, and structured output design
Experience with open-source LLMs (e.g., Mistral, LLaMA) and an understanding of when to apply open vs. proprietary models
Agentic Orchestration & RAG
Practical experience building RAG (Retrieval-Augmented Generation) pipelines, including chunking strategies, embedding models, and retrieval tuning
Familiarity with agentic orchestration frameworks such as LangChain, LangGraph, LlamaIndex, CrewAI, or AutoGen
Experience integrating vector databases (e.g., pgvector, Pinecone, Weaviate, ChromaDB) into search and retrieval workflows
Understanding of tool/function calling patterns for LLM-driven automation
Evaluation & Experimentation
Ability to define and implement "good enough" metrics and evaluation frameworks for POC validation
Experience with LLM evaluation libraries such as RAGAS, TruLens, or DeepEval
Familiarity with experiment tracking tools such as MLflow or Weights & Biases
Comfort with cost and latency profiling of LLM-based systems to inform feasibility decisions
Data & Infrastructure
Comfortable working within cloud environments (AWS, GCP, or Azure) for data access, compute, and API integration
Ability to integrate with REST APIs and third-party data sources during prototyping
Proficiency with standard development tools: Git, Jupyter notebooks, VS Code
Basic familiarity with Docker for packaging and sharing POC environments with engineering teams
Qualifications
Required Experience
4+ years of experience in data science, machine learning, or a closely related field, with a demonstrated track record of delivering end-to-end projects
2+ years of hands-on experience working with large language models or Generative AI solutions in a professional setting
Proven experience taking projects from business problem discovery through to a working prototype or proof of concept
Experience engaging directly with non-technical business stakeholders to gather requirements, set expectations, and communicate results clearly
Strong background in traditional ML approaches (classification, regression, clustering, NLP) alongside modern LLM-based methods
Education
Bachelor's degree in Computer Science, Statistics, Mathematics, Engineering, or a related quantitative field
A Master's or PhD is a plus, though equivalent industry experience is equally valued
Soft Skills & Ways of Working
Ability to translate complex technical outputs into clear business value — you are as comfortable in a boardroom as you are in a notebook
Strong stakeholder management skills, including the ability to set realistic expectations around LLM capabilities, limitations, and cost trade-offs
Excellent written communication skills for documenting prompt strategies, data requirements, and POC logic to enable clean technical handoffs
Self-directed with a high tolerance for ambiguity — you are energised by open-ended discovery, not slowed down by it
Structured thinker who can design evaluation criteria and define what "success" looks like before building begins
Nice to Have
Experience with fine-tuning or instruction-tuning LLMs on domain-specific datasets
Familiarity with responsible AI principles, including bias detection, fairness evaluation, and model transparency
Prior experience in a consulting, pre-sales engineering, or business-facing technical role
Knowledge of business process mapping (e.g., BPMN) to support structured discovery sessions