Applied Scientist (LLM)
Quick Summary
Retrieva
Our distributed team is looking for an experienced Applied Scientist with a strong background in Large Language Models (LLMs) to develop high-performance Generative AI features across Cloud and Edge environments.
In this role you will drive the transition from research to production by optimizing local inference through model compression and quantization for private, real-time Edge performance, while also engineering scalable RAG architectures and multi-agent systems for Cloud deployment. Your daily responsibilities encompass the full research lifecycle, including formulating hypotheses, generating synthetic datasets, fine-tuning LLMs, and validating safety and alignment, ultimately culminating in technical reports.
Responsibilities
- Design and implement advanced methods in prompt orchestration, fine-tuning (SFT/RLHF/DPO), and autonomous agentic workflows
- Curate high-quality training data from large-scale text and multi-modal sources
- Identify patterns in model hallucinations and visualize evaluation metrics for clear interpretation
- Tune hyperparameters and improve inference speed and accuracy through PEFT (LoRA/QLoRA) and advanced prompt engineering
- Collaborate with Product and Data Engineering teams to integrate LLM features seamlessly into the broader ecosystem
- Track and report progress using industry-standard benchmarks (MMLU, HumanEval, etc.) and custom internal KPIs
- Stay at the forefront of the field (e.g., State Space Models, new Transformer variants) and evaluate cutting-edge techniques for production readiness
- Pursue continuous technical growth and mentor junior colleagues to elevate the team's expertise
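For context on the PEFT (LoRA/QLoRA) work mentioned above: LoRA freezes the base weight matrix and trains only two small low-rank factors whose product is added to it. A toy, dependency-free sketch of that idea (plain-Python matrices; the dimensions and values are illustrative, not from any real model):

```python
# Toy sketch of LoRA: instead of updating a full d x d weight matrix W,
# train two low-rank factors B (d x r) and A (r x d) and add B @ A to W.
# Plain-Python matrices; dimensions are illustrative only.

def matmul(X, Y):
    """Multiply two matrices represented as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_update(W, B, A, alpha=1.0):
    """Return W + alpha * (B @ A), the LoRA-adapted weight matrix."""
    delta = matmul(B, A)
    return [[w + alpha * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

d, r = 4, 1                          # hidden size and LoRA rank (toy values)
W = [[0.0] * d for _ in range(d)]    # frozen base weight
B = [[1.0] for _ in range(d)]        # d x r trainable factor
A = [[0.5] * d]                      # r x d trainable factor

W_adapted = lora_update(W, B, A)
full_params = d * d                  # parameters a full update would train
lora_params = d * r + r * d          # parameters LoRA actually trains
```

The payoff is the last two lines: for rank r much smaller than d, LoRA trains 2*d*r parameters instead of d*d, which is what makes fine-tuning large models tractable on modest hardware.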
Requirements
- 3+ years of commercial experience in Machine Learning, with a specific focus on the NLP or LLM domain
- Strong knowledge of Python 3, NumPy, pandas, and modern text-processing libraries, plus PyTorch and the Hugging Face ecosystem (Transformers, PEFT, Accelerate)
- Proficiency in PEFT/LoRA and Reinforcement Learning techniques
- Deep understanding of attention mechanisms, tokenization, context window management, and embedding spaces
- Practical experience in at least one of the following: Retrieval-Augmented Generation (RAG), Fine-tuning, or Agentic frameworks
- Proven ability to manage and analyze massive datasets (>100GB) across text, image, and audio formats
- Hands-on experience crafting high-fidelity datasets and building robust data pipelines
- Expertise in prompt engineering, agentic framework design, and LLM pipeline orchestration
- Experience deploying LLMs to production environments using Triton Inference Server, vLLM, TGI, or ONNX
- Good written and spoken English
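The Retrieval-Augmented Generation experience listed above reduces to a retrieve-then-generate loop. A minimal, dependency-free sketch of the retrieval step, using toy bag-of-words vectors and cosine similarity (real systems use learned embeddings and a vector database; all documents and the query here are made up for illustration):

```python
import math
from collections import Counter

# Toy RAG retrieval step: rank documents by cosine similarity of
# bag-of-words count vectors, then prepend the best match to the prompt.

def embed(text):
    """Bag-of-words 'embedding': token -> count (stand-in for a real encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the top-k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "LoRA adds low-rank adapters to frozen weights",
    "vLLM serves transformer models with paged attention",
    "RAG grounds generation in retrieved documents",
]
context = retrieve("how does RAG ground its answers", docs)[0]
prompt = f"Context: {context}\n\nQuestion: how does RAG ground its answers?"
```

Swapping `embed` for a real sentence encoder and the list scan for an approximate-nearest-neighbor index (Pinecone, Milvus, etc.) turns this sketch into the retrieval half of a production RAG pipeline.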
Nice to Have
- Practical experience with Pinecone, Weaviate, Milvus, or Chroma
- Advanced quantization (GGUF, AWQ, EXL2), pruning, and knowledge distillation
- Experience with LangChain, LlamaIndex, or AutoGen
- Basic understanding of web/client-server architecture and streaming API responses (Asyncio, aiohttp)
- Familiarity with RAGAS, DeepEval, or G-Eval
- Experience using Docker, Kubernetes, and cloud GPU orchestration (e.g., Run:ai, Lambda Labs)
- Knowledge of C++, Triton, or CUDA for custom kernel development
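The quantization formats mentioned above (GGUF, AWQ, EXL2) all build on the same core idea: mapping float weights to low-bit integers with a shared scale. A toy symmetric int8 round-trip, purely illustrative of that principle:

```python
# Toy symmetric int8 quantization: map floats to integers in [-127, 127]
# using a single per-tensor scale. Real formats (GGUF, AWQ, EXL2) add
# per-group scales, activation awareness, and lower bit widths on top.

def quantize(weights):
    """Return (int values in [-127, 127], scale) for a list of floats."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate floats from quantized values and the scale."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.0]
q, s = quantize(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))  # bounded by scale / 2
```

The round-trip error is bounded by half the scale, which is why shrinking the scale via per-group quantization (as GGUF and AWQ do) preserves accuracy at lower bit widths.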
What We Offer
Location & Eligibility
Listing Details
- Posted: April 22, 2026
- First seen: April 26, 2026
- Last seen: May 4, 2026
Posting Health
- Days active: 8
- Repost count: 0
- Trust level: 30%
- Scored at: May 4, 2026
Please let squad care know you found this job on Jobera.
