LLM Pre-training & Distributed Engineer (AI Infrastructure)
Quick Summary
Orchestrate distributed training runs across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM, and optimize networking (InfiniBand/RDMA) and memory management to prevent out-of-memory errors.
We are seeking a highly skilled LLM Pre-training & Distributed Systems Engineer to orchestrate large-scale machine learning training runs and optimize distributed infrastructure. The ideal candidate has a deep understanding of GPU clusters and extensive systems engineering experience to keep training processes efficient and reliable.
Responsibilities
- Orchestrate distributed training runs across 1,000+ GPUs using PyTorch, DeepSpeed, or Megatron-LM.
- Optimize networking (InfiniBand/RDMA) and memory management to prevent out-of-memory errors.
- Automate checkpointing and failure recovery during month-long training runs (see the sketch after this list).
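To make the checkpointing bullet concrete, here is a minimal sketch of crash-safe checkpoint-and-resume logic for a plain PyTorch training loop. The directory path, save interval, and function names are illustrative assumptions, not details from the posting:

```python
import os
import torch
import torch.distributed as dist

CKPT_DIR = "/checkpoints/run-001"  # hypothetical path, not from the posting
SAVE_EVERY = 1_000                 # illustrative save interval, in steps

def save_checkpoint(model, optimizer, step):
    # Only rank 0 writes under plain data parallelism; sharded schemes
    # (ZeRO, tensor/pipeline parallelism) need coordinated per-rank saves.
    if dist.get_rank() != 0:
        return
    tmp = os.path.join(CKPT_DIR, f"step-{step}.pt.tmp")
    final = os.path.join(CKPT_DIR, f"step-{step}.pt")
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "step": step},
        tmp,
    )
    os.replace(tmp, final)  # atomic rename: a crash mid-write never corrupts 'final'

def load_latest_checkpoint(model, optimizer):
    # Resume from the newest checkpoint, or start at step 0 if none exists.
    ckpts = [f for f in os.listdir(CKPT_DIR) if f.endswith(".pt")]
    if not ckpts:
        return 0
    latest = max(ckpts, key=lambda f: int(f.removeprefix("step-").removesuffix(".pt")))
    state = torch.load(os.path.join(CKPT_DIR, latest), map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1
```

In the training loop this would be driven by something like `if step % SAVE_EVERY == 0: save_checkpoint(model, optimizer, step)`, with `load_latest_checkpoint` called once at startup so a preempted job resumes where it left off.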
Requirements
- Deep expertise in 3D parallelism (data, tensor, pipeline).
- Experience managing SLURM- or Kubernetes-based GPU clusters (see the launch sketch after this list).
- Strong systems engineering background (C++, CUDA, Python).
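As a rough illustration of the SLURM-managed launches this list refers to, the sketch below maps SLURM's standard environment variables onto torch.distributed process-group initialization. The environment variables are real SLURM exports, but the script itself, including the one-task-per-GPU layout and the MASTER_ADDR/MASTER_PORT setup, is an assumption for illustration:

```python
import os
import torch
import torch.distributed as dist

def init_distributed_from_slurm():
    """Initialize a NCCL process group from the rank info SLURM exports.

    Assumes one task per GPU (e.g. srun --ntasks-per-node=<gpus_per_node>)
    and that the batch script exports MASTER_ADDR / MASTER_PORT; both are
    illustrative choices, not requirements stated in the posting.
    """
    rank = int(os.environ["SLURM_PROCID"])         # global rank across all nodes
    world_size = int(os.environ["SLURM_NTASKS"])   # total number of processes
    local_rank = int(os.environ["SLURM_LOCALID"])  # rank within this node

    torch.cuda.set_device(local_rank)  # pin each process to one GPU
    dist.init_process_group(
        backend="nccl",        # RDMA-capable backend, runs over InfiniBand
        init_method="env://",  # reads MASTER_ADDR / MASTER_PORT from the env
        rank=rank,
        world_size=world_size,
    )
    return rank, world_size, local_rank

if __name__ == "__main__":
    rank, world_size, _ = init_distributed_from_slurm()
    if rank == 0:
        print(f"process group ready: {world_size} ranks")
    dist.destroy_process_group()
```

Reading the rank layout from SLURM rather than hard-coding it lets the same script scale from a single node to a 1,000+ GPU allocation without changes.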
Listing Details
- Posted: April 24, 2026
- First seen: April 24, 2026
- Last seen: May 4, 2026
About Hyphenconnect
Hyphenconnect is a Web3 and AI talent recruitment agency based in Hong Kong with 700+ placements globally.