htecgroup

AI Inference Engineer

Other · Engineer

Technical Tools
cpp, python, pytorch, tensorflow, distributed-systems, linux, machine-learning, performance-optimization

Be part of the team building the software foundation for next-generation AI compute platforms. In this role, you’ll work across the full stack, from low-level kernels and hardware-optimized operators to large-scale ML deployment frameworks, in close collaboration with compiler developers, ML scientists, and hardware specialists. The position offers the chance to contribute to state-of-the-art AI infrastructure, tune software for custom hardware, and deepen your expertise in system software and machine learning.

How You’ll Contribute:

  • Build and optimize inference pipelines for large-scale model serving (LLMs and beyond)  
  • Work with frameworks like PyTorch, TensorRT, and vLLM to deploy models efficiently  
  • Implement and optimize ML models using techniques such as quantization (INT8/FP8), kernel fusion, and efficient batching 
  • Optimize and implement core ML operators (e.g., GEMMs, convolutions, activations)
  • Investigate and resolve issues through system-level debugging and performance analysis 
  • Define and apply practices for testing, deploying, and scaling AI systems

Required skills: 

  • BSc/MSc in Computer Science, Engineering, Mathematics, or related discipline 
  • Strong programming skills in C/C++ or Python, working in Linux environments with common development tools
  • Solid knowledge of computer architecture, system software, and data structures
  • Hands-on experience implementing algorithms in high-level languages (C/C++/Python) 
  • Exposure to specialized hardware (GPUs, FPGAs, DSPs, AI accelerators) and frameworks such as OpenCL or CUDA 
  • Experience designing or working with high-performance software systems 
  • Solid knowledge of ML fundamentals 
  • Motivated team player with a strong sense of responsibility

 

You are a great fit if you have experience in at least one of the following areas: 

  • Model serving frameworks (e.g., Triton Inference Server, DeepSpeed Inference, vLLM) 
  • ML runtimes (e.g., ONNX Runtime, TVM, IREE, XLA) 
  • Deploying ML workloads (LLMs, VLMs, NLP, etc.) across distributed systems 
  • Implementing and optimizing ML operators and kernels with a focus on vectorization and efficient execution (e.g., activation, pooling, quantization)
  • Hardware-aware optimizations and performance tuning 
  • Developing software targeting AI hardware (2+ years of experience)


Contribution to open-source projects (e.g., LLVM/MLIR, PyTorch, TensorFlow, ONNX Runtime, xDSL, IREE) is a big plus.

Location & Eligibility

Location: Serbia (on-site within the country)
Who can apply: RS (Serbia)

Listing Details

Posted: April 30, 2026
First seen: May 5, 2026
Last seen: May 28, 2026
