$290,000 – $310,000/yr

Director, Support Engineering

San Francisco
Executive
Director

About the Role

We’re hiring a Support Leader to own and scale Together AI’s customer support function across two distinct, technically demanding domains: API Support (billing, serverless inference, and dedicated inference) and GPU Support (large-scale GPU infrastructure for model training workloads). You’ll work closely with Together AI’s VP of Customer Experience and partner tightly with SRE, Inference Platform, and Engineering to represent customers internally and drive resolution at speed. This is a player-coach role: you’ll be hands-on in escalations. 

Our support operation runs 24/7. Our GPU infrastructure customers hold us to high-stakes SLAs on training workloads. Our API customer base spans thousands of PLG and enterprise accounts relying on our serverless and dedicated inference endpoints. Both domains need a leader who can keep pace technically and build the operational muscle to scale.

Responsibilities

  • Directly manage and develop a team of support engineers and technical account specialists across API Support and GPU Support functions.
  • Establish clear performance expectations, career growth paths, and a coaching culture. Identify skill gaps and build training programs to close them.
  • Run structured 1:1s, team reviews, and escalation retrospectives.
  • Assess and overhaul support workflows, SLA frameworks, and escalation playbooks.
  • Build triage, prioritization, and handoff protocols that allow the team to scale with customer growth without proportional headcount growth.
  • Define and own support KPIs: SLA attainment, time-to-resolution, escalation rate, and CSAT.
  • Jump into complex, active GPU infrastructure issues alongside your team. Investigate NCCL and InfiniBand failures, SSH connection stalls, Kubelet TLS misconfigurations, GPU/RDMA provisioning timeouts, NFS RDMA mount failures, VAST storage failures, network fabric degradation, etc.
  • Manage high-stakes SLA obligations with GPU cloud customers running multi-thousand-GPU training workloads.
  • Coordinate closely with SRE and infrastructure engineering on hardware-level issues and cluster bring-up.
  • Own the support surface for Together AI’s API platform: serverless inference, dedicated inference endpoints (self-serve and managed), billing, rate limits, model upload (BYOM), and API authentication.
  • Represent the team on complex cases: dedicated endpoint startup failures, safetensors validation errors, NFS/storage performance issues on inference clusters, billing disputes and negative-balance enforcement, and rate limit escalations.
  • Work with the Inference Platform, Commerce, and Product teams to surface patterns and drive fixes upstream.
  • Be the escalation point for your team’s highest-severity customer issues — triage fast, communicate clearly to customers and internal stakeholders, and drive to resolution.
  • Partner with SRE, Engineering, and Sales on shared priorities. Represent the support team’s perspective in cross-functional planning.
  • Own the relationship with support tooling vendors and drive improvements to alerting, SLA tracking, and ticket routing.
  • Systematically analyze ticket patterns and surface product and infrastructure gaps to Engineering and Product. Turn support signal into actionable roadmap input.
  • Build documentation and self-service resources that reduce inbound volume over time.

Requirements

  • 10+ years of support engineering or technical support leadership experience, with at least 3 years managing a team.
  • Demonstrated experience leading infrastructure support or cloud operations. You understand how large-scale workloads behave on distributed systems. 
  • Working knowledge of AI infrastructure. You know how APIs work, can reason about latency and throughput issues, and understand the operational surface of a managed inference platform.
  • Technical depth to be a credible player-coach. You can guide engineers through root cause analysis and bring credibility to customer-facing escalations.
  • Experience running SLA-driven support operations with real accountability. Familiarity with Pylon or equivalent support ticketing platforms (Zendesk, etc.) and PagerDuty-style alerting systems.
  • Strong communication skills, especially under pressure. You can write a clear, concise customer-facing update in the middle of a live incident and distill a complex infrastructure issue into a crisp internal escalation.
  • Startup mindset. You’re comfortable building process where none exists, and you thrive in environments where priorities shift fast.

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society, and together we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI, and our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers on our journey to build the next generation of AI infrastructure.

What We Offer

We offer competitive compensation, startup equity, health insurance, and other benefits, as well as flexibility in terms of remote work. The US base salary range for this full-time position is $290,000 – $310,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. Please see our Privacy Policy at https://www.together.ai/privacy

 

Location & Eligibility

Where is the job
San Francisco
On-site at the office
Who can apply
Same as job location

Listing Details

Posted
April 28, 2026
First seen
April 28, 2026
Last seen
May 3, 2026

Posting Health

Days active
5
Repost count
0
Trust Level
56%
Scored at
May 3, 2026

Togetherai
greenhouse
Employees
30
Founded
2021