embedding-vc

Member of Technical Staff - ML Infrastructure & Performance

United States · San Mateo · Full-time · Lead
Other · Member of Technical Staff

Quick Summary

Overview

Introducing Moonlake, AI for creating real-time interactive content.

Mission: Improve throughput, latency, and cost, deploying our models 2–10× faster and cheaper without quality regressions.

Scope of Work: GPU performance (CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs).

Technical Tools
Grafana, Kubernetes

Introducing Moonlake, AI for creating real-time interactive content

- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.

- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing.

- Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.

- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.

- Systems: Ray/k8s/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.

Previous experience at infra-heavy startups such as Databricks or Roblox.

We are committed to being an on-site, in-person team, currently based in San Mateo.

Location & Eligibility

Where is the job
San Mateo, United States
On-site at the office
Who can apply
US

Listing Details

Posted
December 12, 2025
First seen
May 6, 2026
Last seen
May 8, 2026

Posting Health

Days active
0
Repost count
0
Trust Level
12%
Scored at
May 6, 2026

Signal breakdown

Freshness · Source trust · Content trust · Employer trust
