Senior AI Engineer — Inference & Agent Systems

Bangalore · Hybrid · Senior
Data Science · Machine Learning Engineer · AI Engineer · Data · Data & AI


Arcana is building AI agents that synthesize information across heterogeneous sources and deliver structured, reasoned answers in real time. The product only works if the agents are fast, reliable, and correct, not approximately correct.

Our stack: Go + Temporal for orchestration, a Plan-Execute-Synthesize agent architecture, and an evaluation harness we use to measure every regression. The problems are hard. The latency bar is aggressive. The accuracy requirements are unforgiving.

- Drive time-to-first-token (TTFT) below 400ms for multi-step agent pipelines

- Streaming optimization: first token to user while sub-agents are still running

- KV cache strategy, prompt compression, dynamic context window management

- Multi-provider routing: model selection by latency, cost, and task type across OpenAI, Anthropic, Gemini, and open-weight models

- Design and implement Plan-Execute-Synthesize pipelines that run sub-agents in parallel DAGs, not sequential chains

- Build reliable orchestration on top of Temporal: retries, timeouts, partial failure recovery, idempotency

- Structured output enforcement: JSON schema validation, retry loops on malformed LLM output, graceful degradation

- Tool call design: schema design that LLMs actually follow reliably across providers

- Own the eval framework end to end: ground truth datasets, automated scoring pipelines, regression detection on every PR

- LLM-as-judge pipelines for qualitative output assessment

- Latency regression testing: p50/p95/p99 tracked across every deployment

- Adversarial test case design: ambiguous queries, missing data, conflicting sources, malformed tool responses

- Model serving and cold start optimization

- Async worker architecture for parallel sub-agent execution

- Observability: trace every token, every tool call, every synthesis step
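
The structured-output bullet above (JSON schema validation, retry loops on malformed LLM output, graceful degradation) reduces to a small validate-and-retry loop. A minimal Go sketch; the `Answer` type and all function names are hypothetical illustrations, not Arcana's actual code:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Answer is a hypothetical structured-output schema a sub-agent must return.
type Answer struct {
	Summary    string   `json:"summary"`
	Sources    []string `json:"sources"`
	Confidence float64  `json:"confidence"`
}

// parseStrict rejects malformed JSON, unknown fields, and schema violations.
func parseStrict(raw string) (Answer, error) {
	var a Answer
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&a); err != nil {
		return a, err
	}
	if a.Summary == "" || a.Confidence < 0 || a.Confidence > 1 {
		return a, errors.New("schema violation")
	}
	return a, nil
}

// withRetry re-invokes the caller-supplied generator on bad output, then
// degrades gracefully to a low-confidence fallback after maxAttempts.
func withRetry(gen func(attempt int) string, maxAttempts int) Answer {
	for i := 0; i < maxAttempts; i++ {
		if a, err := parseStrict(gen(i)); err == nil {
			return a
		}
	}
	return Answer{Summary: "unable to produce a validated answer", Confidence: 0}
}

func main() {
	a := withRetry(func(attempt int) string {
		if attempt == 0 {
			return "not-json" // first attempt: malformed output
		}
		return `{"summary":"ok","sources":["s1"],"confidence":0.9}`
	}, 3)
	fmt.Println(a.Summary) // prints "ok"
}
```

In a real pipeline the retry prompt would feed the validation error back to the model; here the generator stands in for that call.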

You've built something that runs in production at meaningful scale, and you understand why it's fast (or why it isn't). This is probably not the right fit if:

- You've fine-tuned models but haven't shipped inference systems

- You've used LangChain/LlamaIndex but haven't built the layer underneath

- Strong ML research background without systems exposure

The problems here don't have blog posts about them yet. Parallel agent DAG execution under hard latency budgets, streaming synthesis across partial sub-agent results, eval harnesses for non-deterministic multi-step systems: these are genuinely unsolved at production quality. Small team. High ownership. Every engineer's decisions ship to production.
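
Streaming synthesis across partial sub-agent results, as described above, is at bottom a fan-out that hands each result to the synthesizer the moment it completes rather than waiting for the slowest agent. A minimal Go sketch, assuming sub-agents are independent functions; all names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// result is a hypothetical sub-agent output.
type result struct {
	agent string
	text  string
}

// fanOut runs independent sub-agents concurrently and delivers each result
// as soon as it is ready, so synthesis (and streaming to the user) can start
// before the slowest agent finishes.
func fanOut(agents map[string]func() string) <-chan result {
	out := make(chan result)
	var wg sync.WaitGroup
	for name, run := range agents {
		wg.Add(1)
		go func(name string, run func() string) {
			defer wg.Done()
			out <- result{agent: name, text: run()}
		}(name, run)
	}
	go func() { wg.Wait(); close(out) }()
	return out
}

func main() {
	agents := map[string]func() string{
		"fast": func() string { return "partial answer" },
		"slow": func() string { time.Sleep(50 * time.Millisecond); return "full context" },
	}
	for r := range fanOut(agents) {
		// In a real pipeline this would feed an incremental synthesizer;
		// here we print in completion order ("fast" ordinarily arrives first).
		fmt.Println(r.agent, "→", r.text)
	}
}
```

A production DAG would layer dependency edges, deadlines, and cancellation on top (in this stack, via Temporal); the channel shape is the core idea.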

You've shipped inference systems at:

- A real-time AI product (search, coding assistant, chat at scale)

- A model serving infrastructure company

- An agent platform (any domain)

Or you've built eval/harness infrastructure that a team of 10+ engineers actually trusted to catch regressions.
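
Latency regression detection of the kind described here boils down to comparing tail latency against a stored baseline on every deployment. A minimal Go sketch using a simple nearest-rank percentile; the tolerance and function names are illustrative assumptions, not the actual harness:

```go
package main

import (
	"fmt"
	"sort"
)

// p95 returns an approximate 95th-percentile latency (simple rank method).
func p95(samples []float64) float64 {
	s := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	idx := int(0.95 * float64(len(s))) // index at the 95% position
	if idx >= len(s) {
		idx = len(s) - 1
	}
	return s[idx]
}

// regressed flags a deployment whose p95 exceeds the baseline by more than tol.
func regressed(baseline, candidate []float64, tol float64) bool {
	return p95(candidate) > p95(baseline)*(1+tol)
}

func main() {
	base := make([]float64, 100)
	cand := make([]float64, 100)
	for i := range base {
		base[i], cand[i] = 100, 100 // baseline: flat 100ms
	}
	cand[99] = 500 // a single outlier beyond p95 does not trip the check
	fmt.Println(regressed(base, cand, 0.10)) // prints false
	for i := 90; i < 100; i++ {
		cand[i] = 200 // the whole top decile doubling does
	}
	fmt.Println(regressed(base, cand, 0.10)) // prints true
}
```

The same comparison gates p50 and p99; gating on the tail rather than the mean is what keeps a few slow outliers from hiding inside an unchanged average.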

Send to: careers@arcana.io

  1. One system you built where latency was the primary constraint: what you measured, what you changed, what moved
  2. Link to anything public (code, writing, talks)
  3. No cover letter required

We respond to every application.

Location & Eligibility

Where is the job: Location terms not specified
Who can apply: Same as job location
Listed under: Worldwide

Listing Details

Posted: March 16, 2026
First seen: March 26, 2026
Last seen: May 4, 2026

Posting Health

Days active: 39
Repost count: 0
Trust level: 24%
Scored at: May 5, 2026

Signal breakdown: freshness · source trust · content trust · employer trust