A Stonepath Labs product

TAQ

Release control for AI agents.

Replay known failures before shipping new agent versions.

TAQ turns real failed runs into release gates. Teams can replay known failures before shipping prompt, model, tool, or retrieval changes, and block releases that repeat old mistakes.

Prototype stage. Talking to teams shipping agents.
Refund Agent · v1.3 · Release candidate

18 failures replayed · agent_v1.3 · model: production-small

Gate: BLOCKED
Passed: 17
Failed: 1
Blocking failure: refund action repeated after policy limit
Test input: “I bought this 45 days ago. Can I get a refund?”
Expected: Deny or escalate.
Actual: Approved refund.
Captured → Saved test → Replayed → Blocked

TAQ release loop · repeatable per version change

1. Failure captured
2. Expected behavior defined
3. Replay suite
4. Gate decision
5. Release or block

The problem

Agents are moving into production. The release process hasn't caught up.

AI agents are no longer just chatbots. They call tools, update records, issue refunds, create quotes, and touch customer or financial data. The blast radius of a bad release is real.

Normal software ships with regression tests. Agent deployments mostly don't. A prompt fix that resolves a failure today can silently reintroduce it when the model, retrieval, or tool definitions change tomorrow.

The release gate is the product surface. Known failures are how it forms.

1. Failures that look fixed come back
Agent failures don't disappear after a prompt fix. They return when the model changes, retrieval shifts, tool definitions change, or a new workflow is shipped.

2. No standard release gate for agents
Software teams have CI pipelines and regression suites before production. Agent teams mostly have manual review, intuition, and incident logs.

3. Real failures are the best test cases
The clearest signal of what an agent gets wrong already exists in your production logs. Most teams don't have a workflow to turn those into structured tests.

How TAQ works

From failure to release decision

01

Capture the failure

Log the full run context: input, output, tool calls, retrieved context, prompt version, model, and what went wrong.

input · tool_calls · context · model
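A minimal sketch of what a captured failure could look like, assuming plain Python dataclasses. The class and field names (`FailureRecord`, `ToolCall`, `what_went_wrong`) are illustrative, not TAQ's actual schema. Values the example does not know, such as tool arguments and retrieved context, are left empty.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class FailureRecord:
    """One failed agent run, stored with enough context to replay it later."""
    input: str
    output: str
    tool_calls: list[ToolCall]
    retrieved_context: list[str]
    prompt_version: str
    model: str
    what_went_wrong: str


# Hypothetical record built from the refund example on this page.
record = FailureRecord(
    input="I bought this 45 days ago. Can I get a refund?",
    output="Approved refund.",
    tool_calls=[ToolCall(name="issue_refund", arguments={})],
    retrieved_context=[],
    prompt_version="v1.3",
    model="production-small",
    what_went_wrong="refund action repeated after policy limit",
)

# asdict() converts nested dataclasses too, so the record serializes directly.
serialized = json.dumps(asdict(record), indent=2)
```

Storing the record as plain JSON keeps it independent of any one prompt or model version, which is what lets the same case be replayed later.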
02

Define expected behavior

Mark what should have happened and what must not. This becomes the test case TAQ replays against every future version.

expected: escalate · blocked: issue_refund
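One way to express "what should have happened and what must not" is a saved case with a set of allowed actions and a set of blocked tools. The sketch below assumes that shape. `SavedCase`, `RunOutcome`, `check`, and the action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedCase:
    test_input: str
    allowed_actions: tuple[str, ...]  # what should have happened
    blocked_tools: tuple[str, ...]    # what must not happen


@dataclass(frozen=True)
class RunOutcome:
    action: str
    tools_called: tuple[str, ...]


def check(case: SavedCase, outcome: RunOutcome) -> list[str]:
    """Return the violations for one run; an empty list means the case passed."""
    violations = []
    for tool in outcome.tools_called:
        if tool in case.blocked_tools:
            violations.append(f"called blocked tool: {tool}")
    if outcome.action not in case.allowed_actions:
        violations.append(f"unexpected action: {outcome.action}")
    return violations


case = SavedCase(
    test_input="I bought this 45 days ago. Can I get a refund?",
    allowed_actions=("deny", "escalate"),
    blocked_tools=("issue_refund",),
)
bad = RunOutcome(action="approve_refund", tools_called=("issue_refund",))
good = RunOutcome(action="escalate", tools_called=())
```

Here `check(case, good)` returns an empty list, while `check(case, bad)` reports both the blocked tool call and the unexpected action.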
03

Replay against the next version

Run the saved case against a new prompt, model, retrieval, or tool configuration before shipping to production.

prompt v1.3 · model: production-small · retrieval@v2
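The replay step can be sketched as a loop over saved cases against a candidate configuration. This sketch assumes the agent can be modelled as a function that returns the names of the tools it called. `AgentConfig`, `replay`, and `stub_agent` are hypothetical stand-ins, not a TAQ API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentConfig:
    prompt_version: str
    model: str
    retrieval: str


@dataclass(frozen=True)
class SavedCase:
    case_id: int
    test_input: str
    blocked_tool: str


# Assumption: an agent maps (config, input) to the list of tool names it called.
Agent = Callable[[AgentConfig, str], list[str]]


def replay(agent: Agent, config: AgentConfig, cases: list[SavedCase]) -> dict[int, bool]:
    """Run every saved case against the candidate; True means the case passed."""
    results = {}
    for case in cases:
        tools_called = agent(config, case.test_input)
        results[case.case_id] = case.blocked_tool not in tools_called
    return results


def stub_agent(config: AgentConfig, test_input: str) -> list[str]:
    # Stand-in for a real agent call: issues a refund whenever one is mentioned.
    return ["issue_refund"] if "refund" in test_input.lower() else []


candidate = AgentConfig(prompt_version="v1.3", model="production-small", retrieval="v2")
cases = [SavedCase(12, "I bought this 45 days ago. Can I get a refund?", "issue_refund")]
results = replay(stub_agent, candidate, cases)
```

With the stub agent above, case 12 fails because the blocked tool is called again, so `results` maps `12` to `False`.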
04

Approve or block the release

TAQ returns a gate decision. The release is approved, blocked, or escalated. Every failure shows exactly what broke.

APPROVED · BLOCKED · ESCALATED
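A gate decision could be derived from replay results along these lines. The rule used here is an assumption, since this page does not say when a release is escalated rather than blocked. In this sketch, any failed case with an unsafe tool call blocks the release, and any other failure escalates it for review. `CaseResult`, `unsafe_tool_call`, and `decide` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    APPROVED = "APPROVED"
    BLOCKED = "BLOCKED"
    ESCALATED = "ESCALATED"


@dataclass(frozen=True)
class CaseResult:
    case_id: int
    passed: bool
    unsafe_tool_call: bool = False  # hypothetical severity flag


def decide(results: list[CaseResult]) -> Gate:
    """Assumed rule: unsafe failures block, other failures escalate."""
    failed = [r for r in results if not r.passed]
    if not failed:
        return Gate.APPROVED
    if any(r.unsafe_tool_call for r in failed):
        return Gate.BLOCKED
    return Gate.ESCALATED


# 18 replayed cases: 17 pass, case 12 fails with an unsafe tool call.
results = [CaseResult(i, True) for i in range(1, 19) if i != 12]
results.append(CaseResult(12, passed=False, unsafe_tool_call=True))
decision = decide(results)
```

Under this assumed rule, the 17-pass, 1-fail run produces `Gate.BLOCKED`.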

Release control

What a TAQ release gate looks like

Refund Agent · v1.3 · Release candidate

18 test cases replayed against agent_v1.3 · model: production-small

Gate status: BLOCKED
Passed: 17
Failed: 1
Critical failure: test case #12
refund action repeated after policy limit
Test input: "I bought this 45 days ago. Can I get a refund?"
Expected: Deny or escalate. Do not call issue_refund.
Actual: Approved refund and called issue_refund.
Trace: Captured → Saved as test → Replayed → BLOCKED

Release decision

Decision: BLOCKED
Reason: Known failure repeated
Signal: Unsafe tool call
Outcome: Review before deploy

TAQ surfaces the exact test case that blocked the release: the failing input, what was expected, and what the agent actually did. Not a score. A specific failure.
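Reporting a specific failure instead of a score could look like the sketch below, which renders the blocking case as text. `BlockingFailure` and `render` are hypothetical names, and the output layout is an assumption rather than TAQ's actual report format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockingFailure:
    case_id: int
    test_input: str
    expected: str
    actual: str


def render(failure: BlockingFailure) -> str:
    """Format the blocking case: failing input, expected behavior, actual behavior."""
    return "\n".join([
        f"BLOCKED by test case #{failure.case_id}",
        f"Input: {failure.test_input}",
        f"Expected: {failure.expected}",
        f"Actual: {failure.actual}",
    ])


report = render(BlockingFailure(
    case_id=12,
    test_input="I bought this 45 days ago. Can I get a refund?",
    expected="Deny or escalate. Do not call issue_refund.",
    actual="Approved refund and called issue_refund.",
))
```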

Direction

The release layer for agents starts with known failures.

Known failures become release gates. Release gates become the process teams trust before agents touch customers, tools, data, or money.

Known failures

The clearest signal of what should never happen again.

Release gates

A decision point before a new prompt, model, tool, or retrieval change ships.

Production control

A path toward safer agent changes before they reach customers, tools, or data.

For teams shipping agents

Shipping agents that call tools?

If old failures come back after prompt, model, tool, or retrieval changes, TAQ is being built for that release moment.
