Senior Software Engineer, Agentic Systems
Horizon3.ai is a fast-growing, remote cybersecurity company dedicated to the mission of enabling organizations to proactively find, fix, and verify exploitable attack vectors before criminals exploit them. Our flagship product, the NodeZero™ platform, delivers production-safe autonomous pentests and other key assessment operations that scale across the largest internal, external, cloud, and hybrid cloud environments. NodeZero has been adopted by organizations of all sizes, from small educational institutions to government agencies and Global 100 enterprises. It is used by ITOps/SecOps teams, consulting pentesters, and MSSPs and MSPs.
We are a fusion of former U.S. Special Operations cyber operators, startup engineers, and formerly frustrated cybersecurity practitioners. We're committed to helping solve our common security problems: ineffective security tools, false positives resulting in alert fatigue, blind spots, "checkbox" security culture, the cybersecurity skills shortage, and the long lead time and expense of hiring outside consultants. Collectively, we are a team of learn-it-alls, committed to a culture of respect, collaboration, ownership, and results.
We're building an autonomous, black-box web application penetration tester. It crawls and attacks real production websites the way a skilled human pentester would, finding broken access control, injection, XSS, SSRF, SSTI, and more, under a strict production-safe, no-false-positives mandate.
We have deep offensive expertise on this team: people who know exactly how to find and exploit these vulnerabilities by hand. What we need is an engineer who can turn that expertise into autonomous agent capability: the reasoning, orchestration, tooling, and evaluation that let an LLM-driven agent do this work reliably, at scale, and unattended. You'd own and evolve the attack-agent layer: the part of the system that decides what to probe, forms and tests hypotheses, exploits, and verifies, without false positives and without touching anything it shouldn't.
This is a build role, not a research role. We use models surgically (deterministic-first, LLM-as-scalpel), and the hard problems are in engineering reliability, not chasing benchmarks.
Build and evolve the agent harness and orchestration that turn an LLM into a reliable autonomous pentester: the loop that reasons over an application, forms attack hypotheses, acts, and verifies results.
Design the tools and tool-shaped feedback the agent uses to probe and exploit, and the structured-output and validation layers that keep it reliable (e.g., hook-enforced mandatory validation, schema-constrained outputs).
Translate the team's offensive expertise into repeatable agent capabilities — partnering directly with our attackers to encode how they think into something the agent can do consistently.
Own and grow our evaluation infrastructure: benchmark suites, a failure-mode taxonomy across the pipeline (discovery → hypothesis → exploitation → verification), and regression detection, so we actually know whether the agent is getting better.
Manage LLM inference in production: model selection, prompt and context engineering, and keeping cost and latency under control (we run on AWS Bedrock with centralized cost tracking).
Hold the line on production safety and no false positives: every finding the agent reports has to be real and reproducible.
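To make "schema-constrained outputs" and "mandatory validation" concrete, here is a minimal Python sketch of the kind of gate this role would build. It is an illustration under stated assumptions, not a description of NodeZero's implementation: the `Finding` fields, the `report` function, and the scope check are all hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"vuln_class": str, "url": str, "evidence": str, "reproduced": bool}


@dataclass(frozen=True)
class Finding:
    vuln_class: str
    url: str
    evidence: str
    reproduced: bool


class ValidationError(Exception):
    """Raised when model output fails the schema or the reporting gate."""


def parse_finding(raw: str) -> Finding:
    """Parse raw model output and reject anything outside the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        raise ValidationError("fields do not match the schema")
    for name, expected in REQUIRED_FIELDS.items():
        if type(data[name]) is not expected:
            raise ValidationError(f"{name} must be {expected.__name__}")
    return Finding(**data)


def report(raw: str, in_scope_hosts: set[str]) -> Finding:
    """Mandatory gate: only reproduced, in-scope findings get reported."""
    finding = parse_finding(raw)
    host = urlparse(finding.url).hostname
    if host not in in_scope_hosts:
        raise ValidationError(f"{host} is out of scope")
    if not finding.reproduced:
        raise ValidationError("finding was not reproduced")
    return finding
```

The design choice worth noting is that the gate is deterministic code, not another model call: the model proposes a finding, and plain Python decides whether it is well-formed, in scope, and reproduced before anything is reported.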
Requirements
5+ years building production software, with strong Python.
Hands-on experience building LLM-powered applications or agents: tool use / function calling, structured outputs, multi-step orchestration, and the glue that makes it all hold together.
A track record of making LLMs reliable in production: you've wrestled with nondeterminism, designed around model limitations, and shipped something that worked when it mattered.
Real experience with evaluation: you've built or owned the harness that tells you whether a model or agent change is an improvement, not just a vibe.
Strong instincts for prompt and context engineering, and the judgment to keep the model's job small and well-scoped.
Solid software fundamentals — testing, observability, and the discipline to keep a complex agent debuggable.
Ownership mentality: you're comfortable owning a critical, fast-moving subsystem end to end.
Horizon3 is not just an equal opportunity employer - we are a community that values diversity, equity, and inclusion as fundamental principles of our culture and success. We are dedicated to fostering a workplace where everyone feels welcome and respected, regardless of race, color, religion, sex, national origin, age, disability, veteran status, sexual orientation, gender identity or expression, genetic information, marital status, hair length, or any other status protected by law.
Our commitment to diversity and inclusion means we strive to attract, develop, and retain a workforce that reflects the varied communities we serve. We believe that diverse perspectives drive innovation and strengthen our ability to create cutting-edge cybersecurity solutions. At Horizon3, every team member is valued and supported in an environment that encourages personal and professional growth.
We welcome candidates from all backgrounds and experiences, and we encourage all qualified individuals to apply. Come be a part of Horizon3, where your unique contributions are recognized, and your potential is limitless.
Nice to Have
Working knowledge of web application security (broken access control, IDOR/BOLA, SQLi, XSS, SSRF, SSTI), enough to collaborate fluently with offensive engineers.
Experience building eval harnesses or benchmarks specifically for agents (synthetic environments, CVE-based test targets, capture-the-flag-style scoring).
Experience with agent frameworks, and strong opinions about when not to reach for one.
Familiarity with graph data models (e.g., Neo4j) for representing application state and attack context.
You've shipped an autonomous agent that did real, valuable work unattended in production, and you have scar tissue from making it trustworthy.
You've designed evaluation systems that actually drove improvement: you closed the loop between "we changed something" and "it measurably got better."
You pair an offensive-security mindset (CTF, bug bounty, pentesting, or research background) with the engineering chops to turn that intuition into a reliable system.
You have hands-on experience with agent fine-tuning or RL (SFT, GRPO, reward design for tool-using agents) and a grounded view of when it's worth it versus improving the harness.
You've published or spoken on agent reliability, evaluation, or autonomous security tooling.
What We Offer
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee. Duties, responsibilities, and activities may change at any time with or without notice.
In any materials you submit, you may redact or remove age-identifying information such as age, date of birth, or dates of school attendance or graduation. You will not be penalized for redacting or removing this information.
Listing Details
- Posted: June 23, 2026
- First seen: June 23, 2026
- Last seen: June 23, 2026