Member of Technical Staff (Inference) - Paris
About H

H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential.
Responsibilities
- Develop scalable, low-latency, and cost-effective inference pipelines
- Optimize model performance (memory usage, throughput, and latency) using advanced techniques such as distributed computing, model compression, quantization, and caching mechanisms
- Develop specialized GPU kernels for performance-critical operations such as attention mechanisms and matrix multiplications
- Collaborate with H research teams on model architectures to enhance efficiency during inference
- Review state-of-the-art papers to improve memory usage, throughput, and latency (FlashAttention, PagedAttention, continuous batching, etc.)
- Prioritize and implement state-of-the-art inference techniques
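As a flavor of the quantization work this role involves, here is a minimal sketch of symmetric per-tensor int8 weight quantization. This is an illustrative example of the general technique, not H's actual pipeline; the function names and shapes are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# int8 storage is 4x smaller than fp32; the price is a bounded rounding error.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, s = quantize_int8(w)
max_err = float(np.abs(dequantize(q, s) - w).max())
```

Per-channel scales, asymmetric zero-points, and calibration on real activations are the usual next steps beyond this per-tensor sketch.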
Requirements
Technical skills:
- MS or PhD in Computer Science, Machine Learning, or a related field
- Proficiency in at least one of the following programming languages: Python, Rust, or C/C++
- Experience in GPU programming (CUDA, OpenAI Triton, Metal, etc.)
- Experience with model compression and quantization techniques
Soft skills:
- Collaborative mindset, thriving in dynamic, multidisciplinary teams
- Strong communication and presentation skills
- Eagerness to explore new challenges
Bonuses:
- Experience with LLM serving frameworks such as vLLM, TensorRT-LLM, SGLang, llama.cpp, etc.
- Experience with CUDA kernel programming and NCCL
- Experience with deep learning inference frameworks (PyTorch/ExecuTorch, ONNX Runtime, GGML, etc.)
Location & Eligibility

Paris or London.
This role is hybrid, and you are expected to be in the office three days a week on average. The final decision on this lies with the hiring manager for each individual role.

What We Offer
Listing Details
- Posted: April 14, 2026
- First seen: May 6, 2026
- Last seen: May 9, 2026