featherlessai

AI Researcher — Inference Optimization

Worldwide · Remote · Full-time · Mid-level
AI Researcher

Quick Summary

Overview

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models.

Key Responsibilities

Research and develop techniques to optimize inference performance for large neural networks. Improve latency, throughput, memory efficiency, and cost per inference.

Requirements Summary

Strong background in machine learning, deep learning, or AI systems. Hands-on experience optimizing inference for large-scale models. Proficiency in Python and modern ML frameworks (e.g., PyTorch).

Technical Tools
Python · PyTorch · Deep Learning · Machine Learning

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.

Responsibilities

  • Research and develop techniques to optimize inference performance for large neural networks.

  • Improve latency, throughput, memory efficiency, and cost per inference.

  • Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).

  • Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs decode optimization).

  • Benchmark inference workloads across hardware accelerators.

  • Collaborate with engineering teams to deploy optimized inference pipelines.

  • Translate research insights into production-ready improvements.

Requirements

  • Strong background in machine learning, deep learning, or AI systems.

  • Hands-on experience optimizing inference for large-scale models.

  • Proficiency in Python and modern ML frameworks (e.g., PyTorch).

  • Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).

  • Ability to design experiments and communicate results clearly.

Preferred Qualifications
  • Experience deploying production inference systems at scale.

  • Familiarity with distributed and multi-GPU inference.

  • Experience contributing to open-source ML or inference frameworks.

  • Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.

  • Experience working close to hardware (CUDA, ROCm, profiling tools).

What Success Looks Like

  • Measurable gains in latency, throughput, and cost efficiency.

  • Optimized inference systems running reliably in production.

  • Research ideas successfully translated into deployable systems.

  • Clear benchmarks and documentation that inform product decisions.

Nice to Have

  • Long-context inference optimization

  • Speculative decoding

  • KV-cache compression and paging

  • Efficient decoding strategies

  • Hardware-aware inference design
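
Speculative decoding, listed above, can be illustrated with a greedy toy sketch: a cheap draft model proposes a block of tokens and the target model keeps the longest agreeing prefix, so each round still yields at least one target-verified token. Both model interfaces here are hypothetical stand-ins (integer "tokens", greedy next-token callables), not any specific framework's API.

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=16):
    """Greedy speculative decoding sketch. A real implementation verifies
    all k proposals in one batched target pass; this loop calls the
    target once per token for clarity."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies: keep the longest agreeing prefix.
        for t in proposal:
            if target_next(seq) == t:
                seq.append(t)
            else:
                break
        seq.append(target_next(seq))  # always gain >= 1 verified token
    return seq[:len(prompt) + max_new]

# Toy models: target counts up by 1; draft agrees except when the
# next token would be a multiple of 5.
target = lambda s: s[-1] + 1
draft = lambda s: s[-1] + 1 if (s[-1] + 1) % 5 else s[-1] + 2
out = speculative_decode(target, draft, [0], k=4, max_new=8)
```

The speed-up in practice comes from the target model scoring the whole proposed block in a single forward pass, so accepted tokens cost roughly one target pass per block instead of one per token.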

Location & Eligibility

Where is the job?
Worldwide — fully remote, anywhere in the world
Who can apply?
Same as the job location

Listing Details

Posted
January 23, 2026
First seen
May 6, 2026
Last seen
May 8, 2026

