A Stonepath Labs product
TAQ
Release control for AI agents.
Replay known failures before shipping new agent versions.
TAQ turns real failed runs into release gates. Teams can replay known failures before shipping prompt, model, tool, or retrieval changes, and block releases that repeat old mistakes.
18 failures replayed · agent_v1.3 · model: production-small
TAQ release loop · repeatable per version change
The problem
Agents are moving into production. The release process hasn't caught up.
AI agents are no longer just chatbots. They call tools, update records, issue refunds, create quotes, and touch customer or financial data. The blast radius of a bad release is real.
Normal software ships with regression tests. Agent deployments mostly don't. A prompt fix that resolves a failure today can silently reintroduce it when the model, retrieval, or tool definitions change tomorrow.
The release gate is the product surface. Known failures are how it forms.
How TAQ works
From failure to release decision
Capture the failure
Log the full run context: input, output, tool calls, retrieved context, prompt version, model, and what went wrong.
Define expected behavior
Mark what should have happened and what must not. This becomes the test case TAQ replays against every future version.
Replay against the next version
Run the saved case against a new prompt, model, retrieval, or tool configuration before shipping to production.
Approve or block the release
TAQ returns a gate decision. The release is approved, blocked, or escalated. Every failure shows exactly what broke.
Release control
What a TAQ release gate looks like
18 test cases replayed against agent_v1.3 · model: production-small
Release decision
TAQ surfaces the exact test case that blocked the release: the failing input, what was expected, and what the agent actually did. Not a score. A specific failure.
Direction
The release layer for agents starts with known failures.
Known failures become release gates. Release gates become the process teams trust before agents touch customers, tools, data, or money.
Known failures
The clearest signal of what should never happen again.
Release gates
A decision point before a new prompt, model, tool, or retrieval change ships.
Production control
A path toward safer agent changes before they reach customers, tools, or data.
For teams shipping agents
Shipping agents that call tools?
If old failures come back after prompt, model, tool, or retrieval changes, TAQ is being built for that release moment.
Prototype stage · Talking to teams shipping agents