Member of Technical Staff - ML Infrastructure & Performance
Quick Summary
Introducing Moonlake, AI for creating real-time interactive content.
Mission: Improve throughput, latency, and cost, deploying our models 2–10× faster and cheaper without quality regressions.
Scope of Work:
- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.
- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV reuse; speculative decoding/Medusa; mixture-of-agents routing.
- Parallelism: FSDP/ZeRO, TP/PP/expert parallel; NCCL tuning.
- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.
- Systems: Ray/Kubernetes/Argo, observability (Prometheus/Grafana/OpenTelemetry), autoscaling, A/B infra, canary + rollback.
Previous experience at infra-heavy startups such as Databricks or Roblox.
We are committed to being an on-site, in-person team, currently based in San Mateo.
Listing Details
- Posted: December 12, 2025
- First seen: May 6, 2026
- Last seen: May 8, 2026
Posting Health
- Days active: 0
- Repost count: 0
- Trust Level: 12%
- Scored at: May 6, 2026