Quick Summary
Be part of the team creating the software foundation for next-generation AI compute platforms. In this role, you’ll work across the full stack — from low-level kernels and hardware-optimized operators to large-scale ML deployment frameworks — in close collaboration with compiler developers, ML scientists, and hardware specialists. This position offers the chance to contribute to state-of-the-art AI infrastructure, fine-tune software for custom hardware, and deepen your expertise in system software and machine learning.
Responsibilities
- Design, develop, and maintain components of the deployment stack and software kernels for AI compute platforms
- Optimize and implement core ML operators (e.g., GEMMs, convolutions, BLAS routines, SIMD kernels)
- Translate computational graphs from ML frameworks onto the underlying hardware
- Contribute to compiler infrastructure in collaboration with compiler and hardware teams
- Investigate and resolve issues through system-level debugging and performance analysis
- Deliver scalable software solutions under ambitious development schedules
- Define and apply practices for testing, deploying, and scaling AI systems
Requirements
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related discipline, with 3+ years of professional software development experience
- Solid knowledge of computer architecture, system software, and data structures
- Strong programming skills in C/C++ or Python in Linux environments using common development tools
- Hands-on experience implementing algorithms in high-level languages (C/C++/Python)
- Exposure to specialized hardware (GPUs, FPGAs, DSPs, AI accelerators) and frameworks such as OpenCL or CUDA
- Experience designing or working with high-performance software systems
- Solid knowledge of ML fundamentals
- Motivated team player with a strong sense of responsibility
Preferred Qualifications
- Model serving frameworks (e.g., Triton Inference Server, DeepSpeed Inference, vLLM)
- Deep learning frameworks (e.g., PyTorch, TensorFlow)
- ML runtimes (e.g., ONNX Runtime, TVM, IREE, XLA)
- Distributed collectives (e.g., Gloo, MPI)
- Software testing and validation methodologies
- Deploying ML workloads (LLMs, VLMs, NLP, etc.) across distributed systems
- Implementation of ML operators and kernels (e.g., SIMD routines, activation functions, pooling layers, quantization layers)
- Hardware-aware optimizations and performance tuning
- 2+ years of experience developing software targeting AI hardware
Contributions to open-source projects (e.g., LLVM, PyTorch, TensorFlow, ONNX Runtime, xDSL, IREE) are a big plus.
Location & Eligibility
Listing Details
- Posted: April 30, 2026