Research Engineer, Model Inference & Serving - London
About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential. H is hiring the world's best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning, and collaboration, where everyone has something to contribute.
About the Team: The Inference team builds and operates the systems that serve H's foundational models in production. We focus on multimodal inference and serving for Computer Use Agents, optimizing across both the inference engine layer (e.g., vLLM, SGLang) and the model serving layer (e.g., disaggregated inference, intelligent routing). Agentic inference brings constraints around context length, multimodality, and tool calls, which we address by co-designing with the Models team on training-time choices and with the agent teams on how models are deployed. We operate at the intersection of research and production, translating cutting-edge inference techniques into the systems that power H's next generation of agents. We are looking for strong engineers excited about inference to join the team and help shape the systems behind superintelligent AI.
Responsibilities
- Build and operate the inference stack that serves H's multimodal agentic models
- Improve latency, throughput, and cost of model serving across the stack
- Research and implement inference techniques tailored to agent workloads
- Co-design with the Models team on training-time decisions that affect inference
- Collaborate with cross-functional teams to integrate inference into agentic AI products
- Evaluate inference, serving, and hardware platforms, and communicate findings to stakeholders
- Stay current with advancements in inference, model serving, and accelerator technology
Requirements
Technical skills:
Strong software engineering track record
Proficient in Python and at least one systems language (Rust, C++, or Go)
Hands-on experience with deep learning frameworks (PyTorch, JAX), preferably in an industry setting
Solid distributed systems fundamentals
Experience working in a modern cloud environment and with production ML infrastructure (Kubernetes, etc.)
Working knowledge of modern ML, including transformers and multimodal architectures
Research skills:
Research engagement, demonstrated by any of the following: an advanced degree with research output; publications at top-tier AI or systems venues (e.g., NeurIPS, ICML, MLSys, OSDI); research internships; or substantive open-source contributions
Soft skills:
Excellent communication and presentation skills
Strong collaboration and teamwork skills
Passion for inference and AI
Preferred qualifications:
Startup experience
Hands-on experience with inference frameworks (vLLM, SGLang, TensorRT-LLM)
Writing or modifying GPU kernels (CUDA, Triton, etc.)
Edge or on-device inference experience (llama.cpp, MLX, ONNX Runtime, etc.)
Experience with quantization, speculative decoding, disaggregated inference, or KV-cache compression
Experience with multimodal models and/or agentic systems
Location: Paris or London.
This role is hybrid, and you are expected to be in the office 3 days a week on average.
Please expect some travel between offices on a reasonable cadence (e.g., every 4-6 weeks).
Listing Details
- Posted: April 10, 2026
- First seen: May 6, 2026
- Last seen: June 4, 2026