GPU Cluster Architect

United States, Remote, Mid-level
Other, DevOps & Infrastructure, Architect, Construction & Real Estate

Responsibilities

  • Cluster Design: Architect scalable GPU cluster topologies including compute nodes, interconnect (InfiniBand, Ethernet), storage, and control planes.
  • Performance Modeling: Analyze AI/ML workloads (e.g., LLM training and inference) to inform design tradeoffs across latency, bandwidth, and GPU density.
  • Network Architecture: Align with network architects on the relevant design, and validate low-latency, high-throughput interconnects (e.g., InfiniBand HDR/NDR, RoCEv2) at POD and data center (DC) scale.
  • Storage Integration: Work with storage teams to optimize storage performance for training datasets, checkpointing, and other workloads.
  • Reliability & Monitoring: Understand and analyze signals from monitoring systems to detect flaws in the design.
  • Collaboration: Partner with site reliability, networking, storage, and DC engineering teams to operationalize and scale your architecture.

Requirements

  • 5+ years of experience designing clusters.
  • Deep understanding of modern GPU architecture (NVIDIA, AMD, etc.).
  • Experience with HPC interconnects (InfiniBand & RoCE).
  • Solid background in systems architecture, networking, and hardware reliability.
  • Experience with scripting for automation and telemetry pipelines (Python, Go, etc.).

What We Offer

Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
401(k) plan: Up to 4% company match with immediate vesting.
Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
Remote work reimbursement: Up to $85/month for mobile and internet.
Disability & life insurance: Company-paid short-term disability, long-term disability, and life insurance coverage.

We offer competitive salaries ranging from $184K to $318K OTE (on-target earnings), which includes base salary and a performance bonus. Equity in the form of RSUs may be available at certain salary grades.

Competitive salary and comprehensive benefits package.
Opportunities for professional growth within Nebius.
Hybrid working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.

Listing Details

First seen: April 3, 2026
Last seen: April 26, 2026

Posting Health

Days active: 23
Repost count: 0
Trust level: 51%
Scored at: April 26, 2026

Nebius
Source: Greenhouse

Nebius is a cutting-edge AI cloud platform that offers scalable infrastructure for developing and deploying AI solutions.

Employees: 350
Founded: 2022