Principal AI Researcher (Agentic Systems & AI Infrastructure)
Quick Summary
Red Cell Partners is an incubation firm building and investing in rapidly scalable technology-led companies that are bringing revolutionary advancements to market in three distinct practice areas: healthcare, cyber, and national security. United by a shared sense of duty and deep belief in the power of innovation, Red Cell is developing powerful tools and solutions to address our Nation’s most pressing problems.
Co-founded in 2023 by Joe Laws and Grant Verstandig, Trase Systems is AI, Uncomplicated. Trase empowers enterprise leaders to harness the full potential of AI without the associated complexity and risks. We are an end-to-end solution for deploying, managing, and optimizing AI in the enterprise. Our platform specializes in bridging the “last mile” of AI adoption, unlocking AI's full potential while driving efficiency and significant cost savings. Trase is at the forefront of AI Agent innovation, topping the Hugging Face GAIA Leaderboard for Generalized AI Assistants, ahead of industry giants such as Google, Meta, Microsoft, and OpenAI. We are leveraging our cutting-edge technologies to develop mission-critical agentic applications in complex industries such as Healthcare, Oil & Gas, and National Security.
About the Role
As a Principal AI Researcher, you will define and drive the long-term research direction for the Trase operating system, the agentic execution platform powering autonomous systems in regulated environments. This role sits at the intersection of frontier AI research, agentic systems, orchestration infrastructure, and production deployment, with a focus on how models behave inside real-world execution environments rather than solely on offline benchmark performance.
You will lead research across areas such as agent workflows, tool use, long-lived execution, orchestration, and autonomous system reliability, while conducting large-scale experimentation and advancing novel approaches in applied AI systems.
This is a hands-on technical leadership role operating across research, systems, and product. You will drive technical breakthroughs in agentic infrastructure and applied AI systems, own the end-to-end research-to-production lifecycle, and work closely with engineering and product teams to translate frontier research into scalable, production-grade systems deployed across Trase.
Trase OS coordinates long-lived agents, tool-augmented LLMs, multi-agent workflows, and execution in regulated enterprise environments. As these systems scale, the core challenge shifts from raw model capability to system correctness, orchestration reliability, infrastructure governance, and safe autonomous execution.
We are particularly interested in candidates with expertise or research interest in areas such as:
- agent-to-agent learning,
- orchestration and harness engineering,
- infrastructure governance for AI operating systems,
- long-lived execution and memory systems,
- SLMs (small language models), model optimization, and fine-tuning recipes,
- post-training adaptation techniques and model behavior shaping,
- and evaluation frameworks for autonomous agents.
This role will help define how next-generation AI systems are researched, evaluated, and safely operated in production.
Responsibilities
- Define and evolve the long-term AI/ML research strategy and technical roadmap for Trase OS in alignment with product and platform direction.
- Lead large-scale experimentation and prototyping efforts requiring significant compute infrastructure, translating frontier AI research into scalable, production-grade systems with measurable impact.
- Drive original research and technical breakthroughs in agentic systems, autonomous execution, multi-agent orchestration, post-training and fine-tuning systems, SLM/LLM-based architectures, and applied AI infrastructure.
- Design how models operate within long-lived execution environments, including agent workflows, tool use, planning, memory systems, reasoning, and human-in-the-loop controls.
- Establish evaluation methodologies and reliability frameworks for autonomous systems, including benchmarking, regression testing, safety, controllability, and production behavior analysis.
- Drive architecture decisions across orchestration, model serving, routing, inference, and infrastructure governance, including latency, reliability, and cost optimization.
- Partner closely with engineering and product teams to operationalize research outcomes into deployable systems and enterprise workflows.
- Build AI systems that operate reliably in regulated and constrained environments, including secure cloud, on-premise, and air-gapped deployments.
- Contribute to the broader AI research community through technical papers, publications, conference participation, architecture proposals, and thought leadership.
- Serve as a senior technical authority and mentor across the organization, influencing technical direction, research rigor, experimentation practices, and best practices across research, engineering, and product teams.
Requirements
- 12–15+ years of experience in machine learning, AI systems, or applied AI research, including experience operating at a Principal, Distinguished, or equivalent technical level.
- Strong research and publication track record, including authored papers, major technical contributions, or active participation in frontier AI research.
- Experience publishing at top-tier conferences or contributing influential open-source, research, or AI infrastructure systems.
- Experience conducting large-scale experimentation requiring significant compute infrastructure, evaluation workflows, and iterative model/system analysis.
- Deep expertise in one or more areas including agentic systems, LLMs and generative AI, multi-agent systems, reasoning systems, reinforcement learning, orchestration infrastructure, AI systems reliability, NLP, multimodal systems, or deep learning.
- Hands-on experience with agent-based systems, prompt engineering, RAG, RLHF, SLMs, fine-tuning/post-training techniques, tool integration, memory systems, and human-in-the-loop orchestration.
- Proven experience building, deploying, and operating enterprise-grade AI systems, including GenAI, LLM, or agent-based applications at scale.
- Strong understanding of ML system behavior in production, including reliability, latency, cost tradeoffs, observability, evaluation frameworks, regression testing, and failure modes.
- Strong systems thinking and demonstrated ability to partner cross-functionally with engineering and product organizations to move research into production systems.
- Strong programming and prototyping skills in Python and modern ML infrastructure stacks, with experience in Java or related systems languages preferred.
- Experience deploying AI/ML systems in regulated, constrained, or enterprise environments, and demonstrated ability to lead technical direction from research through production impact.
- PhD in Computer Science, Machine Learning, AI, Systems, or a related field.
- Experience building and operating AI/ML platforms supporting the full model lifecycle, including training, evaluation, deployment, and monitoring.
- Experience optimizing ML inference or orchestration systems in real-time, distributed, or resource-constrained environments.
What We Offer
Location & Eligibility
Listing Details
- Posted: May 21, 2026
- First seen: May 21, 2026
- Last seen: May 22, 2026