LLM Inference Kernel Engineer MLA

Remote · Mid-level
Software Engineering · Kernel Engineer


Overview

Location: Remote, United States

A high-growth, venture-backed AI innovator is pushing the boundaries of large-scale model performance, focusing on next-generation inference systems that operate at the intersection of model architecture and GPU execution.


Technical Tools
cpp · machine-learning · performance-optimization

Responsibilities

  • Design and implement high-performance GPU kernels tailored for large language model inference workloads
  • Optimize CUDA kernels with a focus on memory efficiency, execution speed, and latency reduction
  • Enhance token generation performance, KV cache utilization, and decoding efficiency in large-scale models
  • Collaborate on integrating optimized kernels into modern inference serving frameworks such as vLLM or similar systems
  • Work closely with a small, highly technical team to rapidly prototype, test, and deploy performance improvements
  • Apply advanced techniques such as kernel fusion, tiling strategies, and warp-level optimization to improve throughput
  • Translate complex attention mechanisms into production-ready, scalable GPU implementations

Requirements

  • Strong experience developing GPU kernels using CUDA C or C++ in performance-critical environments
  • Hands-on experience optimizing inference workloads for large language models rather than purely research-based modeling
  • Solid understanding of attention mechanisms, with exposure to advanced implementations such as fused attention or similar approaches
  • Familiarity with modern inference stacks and serving frameworks
  • Deep knowledge of GPU architecture, including memory hierarchy, bandwidth constraints, and latency tradeoffs
  • Ability to operate in a fast-paced, highly iterative environment with minimal oversight
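The KV cache utilization mentioned in the responsibilities can be sketched in miniature: during autoregressive decoding, each step's key/value vectors are appended to a cache so past tokens are never re-encoded, and attention at every step runs over the whole cache. The following NumPy toy (single head, unbatched; all function names and shapes are our own illustration, not this employer's stack) shows the bookkeeping:

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention of one query over all cached keys/values.
    scores = K @ q / np.sqrt(q.shape[-1])   # one score per cached token
    w = np.exp(scores - scores.max())       # numerically stable softmax
    w /= w.sum()
    return w @ V                            # weighted sum of cached values

def decode_step(q, k, v, K_cache, V_cache):
    # Append this step's key/value to the cache, then attend over
    # everything generated so far -- past K/V are reused, not recomputed.
    K_cache.append(k)
    V_cache.append(v)
    return attend(q, np.stack(K_cache), np.stack(V_cache))

# Toy decode loop: 4 steps, head dimension 8.
rng = np.random.default_rng(0)
d = 8
K_cache, V_cache = [], []
for _ in range(4):
    q, k, v = rng.normal(size=(3, d))
    out = decode_step(q, k, v, K_cache, V_cache)
```

Production kernels fuse this attention with the cache reads and lay the cache out for coalesced GPU access (e.g. paged layouts in serving frameworks such as vLLM); the sketch only shows the data flow the role would be optimizing.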

Preferred Qualifications

  • Experience working with advanced attention techniques such as latent attention or similar architectures
  • Exposure to large-scale or distributed model inference environments, including mixture-of-experts systems
  • Contributions to performance optimization projects, open-source kernels, or inference tooling
  • Familiarity with GPU profiling and performance analysis tools
  • Background that bridges model architecture, systems engineering, and deployment layers

This is not a traditional machine learning engineering position. The work sits at one of the most performance-critical layers in the AI stack, where low-level optimization directly impacts real-world model capability. You will have the opportunity to shape how advanced models operate at scale, contributing to meaningful innovations in inference performance and system efficiency.

Blue Signal is an award-winning executive search firm. Our recruiters have a proven track record of placing top-tier talent across industry verticals. Learn more at bit.ly/46Gs4yS



Location & Eligibility

Where is the job
Worldwide
Fully remote, anywhere in the world
Who can apply
Same as job location

Listing Details

First seen
May 6, 2026
Last seen
May 8, 2026

Posting Health

Days active
0
Repost count
0
Trust Level
46%
Scored at
May 6, 2026

Signal breakdown

freshness · source trust · content trust · employer trust
