embedding-vc · 7mo ago

Member of Technical Staff - ML Infrastructure & Performance

United States · San Mateo · full-time · lead

Other · Member Of Technical Staff


Quick Summary

Overview

Introducing Moonlake, AI for creating real-time interactive content.

Mission: Improve throughput, latency, and cost, deploying our models 2–10× faster and cheaper without quality regressions.

Technical Tools

Grafana, Kubernetes

Scope of Work:

- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.

- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV cache reuse; speculative decoding/Medusa; mixture-of-agents routing.

- Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.

- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.

- Systems: Ray/Kubernetes/Argo; observability (Prometheus/Grafana/OpenTelemetry); autoscaling; A/B infra; canary releases and rollback.

Previous experience at infra-heavy startups such as Databricks or Roblox.

We are committed to being an on-site, in-person team, currently based in San Mateo.

Location & Eligibility

Where is the job

San Mateo, United States

On-site at the office

Who can apply

US

Listing Details

Posted: December 12, 2025
First seen: May 6, 2026
Last seen: July 20, 2026

Posting Health

Days active: 75
Repost count: 0
Trust Level: 13%
Scored at: July 21, 2026

Signal breakdown

freshness · source trust · content trust · employer trust

Apply for this position

Ashby

Domain: jobs.ashbyhq.com


External application · ~5 min on embedding-vc's site


4 other jobs at embedding-vc

Explore open roles at embedding-vc.

AI Data Platform Product Manager | Annotation / Evaluation Focus

Product Marketing Lead

Similar Member Of Technical Staff jobs

Member of Technical Staff, Principal Biostatistician

Member of Technical Staff — ML Research, Interpretability

Member of Technical Staff — Product Engineering

Member of Technical Staff — ML Research, Multimodal

Member of Technical Staff — Inference Infrastructure

Member of Technical Staff — Compute Cluster
