future-fit
Posted ~3 days ago · New

AI Inference Engineer


Quick Summary

Overview

Job Description: AI Inference Engineer (vLLM and Kubernetes)

1. Role Overview & Strategic Context

1.1 Company Overview & Mission

Technical Tools
ansible, aws, azure, bash, docker, fastapi, flask, gcp, gitlab-ci, grafana, jenkins, kubernetes, prometheus, python, a11y, ci-cd, concurrency, cybersecurity, documentation, linux, machine-learning, mentoring, networking, performance-optimization

We are architecting the future of intelligent enterprise solutions globally. As we move into 2026, our mission has evolved to not only lead in digital transformation but to pioneer the deployment of sovereign, high-performance Artificial Intelligence. We believe that the true power of AI lies in its accessibility and operational efficiency. By leveraging cutting-edge open-source innovation and enterprise-grade infrastructure, we are building the platforms that will power the next generation of automated intelligence for our clients and our internal operations.

Our engineering culture is rooted in the principles of DevOps, SRE, and radical automation. We value engineers who are not merely "operators" but "architects of efficiency" who take immense pride in the stability, security, and performance of the systems they build. In an era where GPU resources are the new gold, our goal is to achieve world-class inference density and latency through meticulous systems engineering.

AI Inference Engineer (vLLM and Kubernetes)

Location: Remote

Seniority: Senior (5+ years in Systems/DevOps engineering)

Requirements


Languages: Portuguese and English

Core stack: RHEL, vLLM, NVIDIA GPU Operator, Kubernetes/OpenShift

The AI Inference Engineer (vLLM and Kubernetes) is a critical, highly specialized role sitting at the vanguard of the modern MLOps landscape. This position is designed for a senior engineer who possesses a unique hybrid of deep Red Hat Enterprise Linux (RHEL) systems administration expertise and modern AI infrastructure knowledge. Unlike traditional AI roles that focus on model training, your focus will be the "last mile" of AI: engineering the high-performance inference platforms that serve Large Language Models (LLMs) to end-users at scale.

Your primary objective is to build, secure, and automate an enterprise-grade inference environment. This involves orchestrating vLLM (or equivalent high-throughput engines) within Kubernetes clusters, ensuring that GPU resources are utilized at peak efficiency. You will be expected to treat infrastructure as code, using Ansible to manage the foundational RHEL layer and Python to glue together complex AI workflows.
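The "Python glue" described above can be sketched as manifest-rendering code: build a Kubernetes Deployment for a vLLM server that requests GPUs. This is a minimal illustration, not the company's actual tooling; the image tag and model name are assumptions, while `nvidia.com/gpu` is the extended resource name the NVIDIA GPU Operator's device plugin exposes.

```python
# Hypothetical sketch: render a Kubernetes Deployment spec for a vLLM server.
# The image tag and model name are illustrative, not prescribed by the posting.

def vllm_deployment(model: str, replicas: int = 1, gpus_per_pod: int = 1) -> dict:
    """Build a minimal Deployment manifest that requests NVIDIA GPUs
    (surfaced by the NVIDIA GPU Operator as 'nvidia.com/gpu')."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "vllm-server"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "vllm"}},
            "template": {
                "metadata": {"labels": {"app": "vllm"}},
                "spec": {
                    "containers": [{
                        "name": "vllm",
                        "image": "vllm/vllm-openai:latest",  # illustrative tag
                        "args": ["--model", model],
                        "resources": {
                            "limits": {"nvidia.com/gpu": str(gpus_per_pod)}
                        },
                    }]
                },
            },
        },
    }

manifest = vllm_deployment("meta-llama/Llama-3.1-8B-Instruct", replicas=2)
```

In practice this dict would be serialized to YAML or applied via a Kubernetes client; generating manifests from code keeps GPU counts and model names in one reviewable place.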

In 2026, the competitive advantage of an organization is defined by its "Inference Velocity." Our goal is to reduce the cost of intelligence while maximizing output quality. We are looking for a proactive engineer who does not wait for a system alert to identify inefficiencies but actively hunts for bottlenecks in KV cache management, continuous batching performance, and GPU scheduling to refine our competitive edge.
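Hunting KV-cache bottlenecks starts with back-of-envelope sizing. A minimal sketch, assuming illustrative 8B-class model dimensions (32 layers, 8 KV heads, head dim 128, fp16) — none of these figures come from the posting:

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """KV cache footprint of one token: a K and a V tensor (factor of 2)
    per layer, each kv_heads * head_dim elements wide."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

def max_cached_tokens(free_gpu_bytes: int, per_token_bytes: int) -> int:
    """How many tokens of KV cache fit in the given memory budget."""
    return free_gpu_bytes // per_token_bytes

# Illustrative 8B-class model: 32 layers, 8 KV heads, head dim 128, fp16.
per_tok = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)  # 131072 B

# With, say, 20 GiB left for cache after weights, how many tokens fit?
tokens = max_cached_tokens(20 * 1024**3, per_tok)
```

Numbers like these bound how many concurrent sequences continuous batching can hold, which is exactly where throughput bottlenecks tend to hide.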

Responsibilities


A primary accountability of this role is the elimination of technical toil. You will be expected to proactively identify manual processes in the model deployment lifecycle and engineer automated, self-healing solutions that reduce the "Time-to-Inference" for new model versions.
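The self-healing idea can be sketched as a toy reconcile step: compare desired and observed deployment state and emit corrective actions. All field names here are hypothetical illustrations, not a real controller API:

```python
def reconcile(desired: dict, observed: dict) -> list[str]:
    """Toy stand-in for self-healing logic: return the actions needed to
    converge the observed deployment state to the desired state."""
    actions = []
    if observed.get("image") != desired["image"]:
        actions.append(f"roll out image {desired['image']}")
    if observed.get("ready_replicas", 0) < desired["replicas"]:
        actions.append("scale up / restart unready pods")
    if not observed.get("healthy", True):
        actions.append("restart failing server")
    return actions
```

Running a loop like this on a schedule, instead of waiting for a human to react to alerts, is what turns a manual deployment step into automation that shortens "Time-to-Inference".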

Requirements


The 2026 landscape necessitates specialized inference engines. Candidates must have hands-on, production-level experience deploying and tuning vLLM. You should be able to explain and implement PagedAttention, continuous batching configurations, and KV cache sizing to maximize tokens-per-second throughput.
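KV cache sizing of the kind described can be sketched with PagedAttention-style block arithmetic. The 16-token block size and 0.9 memory-utilization fraction mirror vLLM's documented defaults, but the card size, weight footprint, and per-token figure below are illustrative assumptions:

```python
def paged_cache_capacity(gpu_mem_bytes: int, weights_bytes: int,
                         per_token_kv_bytes: int, block_size: int = 16,
                         gpu_memory_utilization: float = 0.9) -> int:
    """Tokens the paged KV cache can hold once model weights are resident.
    Memory is carved into fixed-size blocks of block_size tokens each,
    in the spirit of PagedAttention."""
    budget = int(gpu_mem_bytes * gpu_memory_utilization) - weights_bytes
    block_bytes = block_size * per_token_kv_bytes
    num_blocks = budget // block_bytes
    return num_blocks * block_size

# Illustrative: 80 GiB card, 16 GiB of fp16 weights, 128 KiB of KV per token.
cap = paged_cache_capacity(80 * 1024**3, 16 * 1024**3, 128 * 1024)
```

The resulting token capacity caps the total batched context the engine can serve at once, so it feeds directly into tokens-per-second tuning.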

Nice to Have


Linux Systems: Red Hat Certified Engineer (RHCE) or Red Hat Certified Architect (RHCA)

Cloud Orchestration: Certified Kubernetes Administrator (CKA) or Red Hat Certified Specialist in OpenShift

Cloud Provider: AWS Certified Solutions Architect (Professional) or Azure Solutions Architect Expert

Technical mastery is the foundation of this role, but professional excellence at [Company Name] is defined by the non-technical attributes that enable a Senior Engineer to drive organizational change. In the high-stakes, rapidly evolving 2026 AI landscape, we seek a leader who combines technical depth with the cognitive agility and interpersonal skills required to navigate complex infrastructure challenges.

Requirements


Success in this role requires treating the AI Inference Platform not just as a set of servers, but as a product served to our internal Data Science and Machine Learning teams. You must demonstrate the empathy to understand their needs and the professional rigor to deliver a platform that is reliable, performant, and easy to use.

Team & Reporting

Reports to: Director of AI Platforms / Head of MLOps

Level: Senior Individual Contributor (Technical Lead Track)

Growth path: Principal Systems Engineer (AI Infrastructure)

Rather than working in a centralized silo, the AI Inference Engineer operates within a matrixed "Squad" model. This structure facilitates high-velocity innovation by embedding the engineer directly into cross-functional delivery streams while maintaining strong ties to the core platform governance teams.

As a Subject Matter Expert, the AI Inference Engineer is expected to lead "Communities of Practice" within the engineering organization. You will be responsible for defining the "Golden Path" for inference deployment—setting the standards that other teams will follow when bringing intelligent services to market.

The AI Inference Engineer is empowered to make critical technical decisions regarding:

The selection and configuration of inference engines (e.g., vLLM vs. TGI).

The automation logic for RHEL system hardening and GPU driver lifecycles.

GPU resource allocation strategies within the Kubernetes scheduler.

Establishing the benchmarking protocols and performance KPIs for production AI services.
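A benchmarking protocol ultimately reduces to a few KPIs. A minimal sketch that turns per-request timings into tokens-per-second throughput and p95 latency; the input tuples are hypothetical sample data, not a prescribed schema:

```python
def bench_kpis(requests: list[tuple[float, float, int]]) -> dict:
    """requests: (start_s, end_s, tokens_generated) per request.
    Throughput uses wall-clock span (requests overlap under continuous
    batching), not the sum of individual latencies."""
    latencies = sorted(end - start for start, end, _ in requests)
    wall = max(e for _, e, _ in requests) - min(s for s, _, _ in requests)
    tokens = sum(t for _, _, t in requests)
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return {"tokens_per_s": tokens / wall, "p95_latency_s": p95}

kpis = bench_kpis([(0.0, 1.0, 100), (0.0, 2.0, 200), (0.5, 2.5, 150)])
```

Tracking both numbers matters: batching more aggressively raises tokens/sec but can push tail latency past an SLO, so the two KPIs are tuned together.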

Location & Eligibility

Remote; no further location or eligibility terms specified in the listing.

Listing Details

First seen: May 5, 2026
Last seen: May 9, 2026

