TAQ · RELEASE CONTROL FOR AI AGENTS

TAQ

Turn failed AI agent runs into release gates.

Save real agent failures as regression tests, replay them before prompt, model, tool, or workflow changes ship, and block releases that repeat old mistakes.

View demo · View GitHub

Open-source capture layer · replayd

Commerce Agent v1.3 · Release candidate
18 known failures replaying
release gate · live

Gate: BLOCKED
Critical failure: image request routed to text response
Input: "Can I see a picture?"
Expected: send_product_image
Actual: ai_agent_text_only
Trace: Captured → Saved as test → Replayed → Blocked

Decision: BLOCKED
release stopped before deploy

TAQ RELEASE LOOP · REPEATABLE PER AGENT VERSION

every failed run gets a path to the gate

1. failed run: real behavior captured
2. saved regression test: failure becomes a case
3. replay before release: known failures replay
4. approve/block: gate makes the call

The tension

Agents are moving into production. Their release process has not caught up.

AI agents no longer just answer questions. They call tools, update records, issue refunds, retrieve documents, route workflows, and touch customer or financial data.

Normal software ships with regression tests. Agent deployments often ship with manual checks, screenshots, eval spreadsheets, intuition, and incident memory.

A prompt fix that solves one failure today can silently reintroduce another failure tomorrow.

The gate starts with the failures a team already knows.

incident memory

old failures waiting for a gate

repeated case

failure returned after model change

Agent failures do not disappear after one prompt fix. They return when models change, retrieval shifts, tool definitions change, or new workflow logic ships.

no standard release gate

Software teams have CI pipelines and regression suites. Agent teams often rely on manual review, intuition, and production incidents.

real failures are the test cases

The clearest signal of what an agent must not do already exists in failed runs, logs, tool traces, and user complaints.

Text evals are not enough

The model can sound right while the system does the wrong thing.

A customer asks to see a product image. The agent replies with a reasonable text description. But the workflow should have sent an image.

A text evaluator might pass the answer. A release gate should block the behavior.

TAQ checks the system action, not just the language.

intent · route · tool call · output payload · policy decision

Text eval

Checks whether the answer sounds correct.
Example: Product description is relevant and fluent.
Result: May pass.

TAQ release gate

Checks whether the workflow did the right thing.
Example: Expected route: send_product_image. Actual route: ai_agent_text_only.
Result: BLOCKED
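To make the contrast concrete, here is a minimal sketch in plain Python. The run record, its field names, and both check functions are hypothetical illustrations, not the TAQ or replayd API. It shows how a text-level check can pass on the same run that a behavior-level check rejects.

```python
# Hypothetical record of one agent run; the field names are assumptions.
run = {
    "input": "Can I see a picture?",
    "route_taken": "ai_agent_text_only",
    "final_output": {
        "type": "text",
        "text": "Here is a description of the product you asked about.",
    },
}


def text_check(run: dict) -> bool:
    """Naive text-level check: did the agent produce a non-empty answer?"""
    return bool(run["final_output"].get("text", "").strip())


def behavior_check(run: dict, expected_route: str) -> bool:
    """Behavior-level check: did the workflow take the expected route?"""
    return run["route_taken"] == expected_route


text_passed = text_check(run)
behavior_passed = behavior_check(run, "send_product_image")

print(f"text check passed: {text_passed}")
print(f"behavior check passed: {behavior_passed}")
```

A real text evaluator would score relevance and fluency rather than test for emptiness, but the gap is the same: it never looks at the route.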

WHAT TAQ CHECKS

TAQ checks behavior, not just answers.

Replay cases can assert the route taken, tool called, output shape, retrieved context, policy decision, or semantic expectation.

Select assertion surface

Behavior inspector · routes

commerce_image_request_001 · replay case
BLOCKED

Failed old case: image request routed to text response
Gate verdict · Release decision: BLOCKED
Expected: send_product_image
Actual: ai_agent_text_only

replay path: captured → asserted → gated
failed run → replay case → blocked
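As a rough illustration of checking several assertion surfaces on one replayed run, the sketch below maps each surface to a small predicate. The trace layout, the surface names, and the idea that the image route calls a tool of the same name are all assumptions made for this example, not replayd's actual schema.

```python
from typing import Any, Callable

# Hypothetical trace of one replayed run; every key here is an assumption.
trace: dict[str, Any] = {
    "route_taken": "ai_agent_text_only",
    "tool_calls": [],  # no tool was called in the failed run
    "final_output": {"type": "text"},
    "policy_decision": "allow",
}

# One predicate per assertion surface.
checks: dict[str, Callable[[dict], bool]] = {
    "route": lambda t: t["route_taken"] == "send_product_image",
    "tool": lambda t: any(
        call["name"] == "send_product_image" for call in t["tool_calls"]
    ),
    "output_shape": lambda t: t["final_output"]["type"] == "media",
    "policy": lambda t: t["policy_decision"] == "allow",
}

results = {surface: check(trace) for surface, check in checks.items()}
failed_surfaces = [surface for surface, ok in results.items() if not ok]

print("failed surfaces:", failed_surfaces)
```

Reporting per surface shows whether a failure is a routing problem, a missing tool call, or a malformed payload, which a single pass/fail flag would hide.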

OPEN-SOURCE CORE

replayd turns failed runs into replayable fixtures.

Use replayd to capture the failure, define what should have happened, and replay it before future prompt, model, tool, retrieval, or workflow changes ship.

View GitHub · View fixture

pip install replayd

replayd test runner

replayd run commerce_image_request_001
FAIL

Fixture
case_id: commerce_image_request_001
input: "Can I see a picture?"
expected_route: send_product_image
actual_route: ai_agent_text_only

Assertions
route_taken == send_product_image
final_output.type == media
image_url exists

Replay output
FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only

Gate decision
release_decision: BLOCK

The failed route becomes a replayable case. TAQ can use the same decision path as a release gate.
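The three assertions listed above can be sketched as data plus a small evaluator. This is an illustrative sketch in plain Python, not replayd's fixture format: the tuple layout, the `resolve` helper, the placement of `image_url` under `final_output`, and the `APPROVE` counterpart to `BLOCK` are assumptions.

```python
from typing import Any

MISSING = object()  # sentinel for "path not present in the output"


def resolve(data: dict, path: str) -> Any:
    """Follow a dotted path such as 'final_output.type' through nested dicts."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return MISSING
        current = current[key]
    return current


# Hypothetical fixture layout: (path, operator, expected value).
fixture = {
    "case_id": "commerce_image_request_001",
    "input": "Can I see a picture?",
    "assertions": [
        ("route_taken", "equals", "send_product_image"),
        ("final_output.type", "equals", "media"),
        ("final_output.image_url", "exists", None),
    ],
}

# Replayed output that repeats the original failure (assumed shape).
replay_output = {
    "route_taken": "ai_agent_text_only",
    "final_output": {"type": "text", "text": "..."},
}


def evaluate(fixture: dict, output: dict) -> list[str]:
    """Return one message per failed assertion; an empty list means all passed."""
    failures = []
    for path, op, expected in fixture["assertions"]:
        actual = resolve(output, path)
        if op == "equals" and actual != expected:
            shown = "<missing>" if actual is MISSING else actual
            failures.append(f"{path}: expected {expected}, actual {shown}")
        elif op == "exists" and actual is MISSING:
            failures.append(f"{path}: expected to exist")
    return failures


failures = evaluate(fixture, replay_output)
release_decision = "BLOCK" if failures else "APPROVE"

for message in failures:
    print("FAIL", message)
print("release_decision:", release_decision)
```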

HOW TAQ WORKS

From failure to release decision.

The same known failure moves through capture, expectation, replay, and a release gate. Each stage removes ambiguity from the next release.

failed run

Capture the failure

Log the full run context: input, output, tool calls, retrieved context, prompt version, model, and what went wrong.

Stage artifact

input · tool_calls · context · model
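One way to picture the captured artifact is a small record that serializes to JSON and can be committed alongside code. The `CapturedRun` class and its field names below are hypothetical, chosen to mirror the list above. The placeholder strings mark values the page does not state.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CapturedRun:
    """Hypothetical capture record; not replayd's actual schema."""

    case_id: str
    input: str
    output: str
    tool_calls: list = field(default_factory=list)
    retrieved_context: list = field(default_factory=list)
    prompt_version: str = ""
    model: str = ""
    what_went_wrong: str = ""


captured = CapturedRun(
    case_id="commerce_image_request_001",
    input="Can I see a picture?",
    output="(text description of the product)",
    prompt_version="<prompt version at capture time>",
    model="<model at capture time>",
    what_went_wrong="image request routed to text response",
)

# Serialize to JSON so the case can live in the repo, then load it back.
serialized = json.dumps(asdict(captured), indent=2, sort_keys=True)
restored = CapturedRun(**json.loads(serialized))

print(serialized)
```

Storing the prompt version and model with the case makes it possible to tell later which configuration produced the failure.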

replay fixture

Define expected behavior

Mark what should have happened and what must not happen again.

Stage artifact

expected: send_product_image · blocked: ai_agent_text_only
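The expected/blocked pair can be read as a three-way classification of any replayed route. The sketch below is an assumption about how such an expectation might be modeled. The `Expectation` class and the outcome labels are illustrative names, not TAQ terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    """Hypothetical expectation: what should happen, and what must not recur."""

    expected_route: str
    blocked_route: str


def classify(expectation: Expectation, actual_route: str) -> str:
    """Classify a replayed route against the expectation."""
    if actual_route == expectation.expected_route:
        return "pass"
    if actual_route == expectation.blocked_route:
        return "known_failure_repeated"
    return "unexpected_route"


expectation = Expectation(
    expected_route="send_product_image",
    blocked_route="ai_agent_text_only",
)

outcome = classify(expectation, "ai_agent_text_only")
print(outcome)
```

Keeping the blocked route explicit distinguishes the old mistake returning from a new, different failure that may need its own case.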

replay suite

Replay against the next version

Run the saved case against a new prompt, model, retrieval, or tool configuration before production.

Stage artifact

prompt v1.3 · model: production-small · retrieval@v2
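A replay step can be sketched as a loop that feeds each saved input to the candidate and compares the route it takes. Everything here is a stand-in: the two candidate functions are toy agents written for this example, and the case layout is an assumption rather than replayd's format.

```python
from typing import Callable

# Saved cases (assumed layout); a real suite would load these from the repo.
cases = [
    {
        "case_id": "commerce_image_request_001",
        "input": "Can I see a picture?",
        "expected_route": "send_product_image",
    },
]

# Candidate configuration taken from the stage artifact above.
candidate_config = {"prompt": "v1.3", "model": "production-small", "retrieval": "v2"}


def candidate_v1_3(user_input: str) -> dict:
    """Toy stand-in for the release candidate: it always answers with text."""
    return {"route_taken": "ai_agent_text_only"}


def fixed_candidate(user_input: str) -> dict:
    """Toy stand-in for a fixed candidate, using a crude keyword check."""
    wants_image = "picture" in user_input.lower()
    route = "send_product_image" if wants_image else "ai_agent_text_only"
    return {"route_taken": route}


def replay(cases: list[dict], agent: Callable[[str], dict]) -> list[dict]:
    """Run every saved case against the agent and record pass or fail."""
    results = []
    for case in cases:
        trace = agent(case["input"])
        results.append(
            {
                "case_id": case["case_id"],
                "expected": case["expected_route"],
                "actual": trace["route_taken"],
                "passed": trace["route_taken"] == case["expected_route"],
            }
        )
    return results


blocked_results = replay(cases, candidate_v1_3)
fixed_results = replay(cases, fixed_candidate)

print(blocked_results)
print(fixed_results)
```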

release gate

Approve or block the release

TAQ returns a gate decision. Every failure shows exactly what broke.

Stage artifact

APPROVED · BLOCKED · ESCALATED
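The page names three gate outcomes but does not define when each applies. The sketch below assumes one plausible rule: any critical failure blocks, non-critical failures escalate to a human reviewer, and a clean replay approves. That rule, the `critical` flag, and the second case ID are assumptions for illustration only.

```python
def gate_decision(results: list[dict]) -> str:
    """Map replay results to a gate outcome under the assumed rule."""
    failed = [r for r in results if not r["passed"]]
    if not failed:
        return "APPROVED"
    if any(r.get("critical", False) for r in failed):
        return "BLOCKED"
    return "ESCALATED"


replay_results = [
    {"case_id": "commerce_image_request_001", "passed": False, "critical": True},
    {"case_id": "hypothetical_case_002", "passed": True},
]

decision = gate_decision(replay_results)
print("Decision:", decision)

# Each failed case reports what broke.
for result in replay_results:
    if not result["passed"]:
        print("failed:", result["case_id"])
```

In a CI pipeline the decision would typically map to a process exit code, so that a BLOCKED result stops the deploy job.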

RELEASE GATE

Watch TAQ block a repeated agent failure.

A release candidate enters the gate. Known failures replay. If one repeats, TAQ blocks the release before it reaches users, data, tools, or money.

Commerce Agent v1.3 · Release candidate
18 known failures replayed

Gate: BLOCKED
Critical failure: image request routed to text response
Input: "Can I see a picture?"
Expected: send_product_image
Actual: ai_agent_text_only
Trace: Captured → Saved as test → Replayed → Blocked

Release decision
Decision: BLOCKED
Reason: Known failure repeated
Outcome: Review before deploy

The old failure is caught before release. The candidate does not ship until the behavior is reviewed.

Have a failure this should catch?

Send a failed run · View GitHub

PRICING

Start locally with replayd. Add TAQ release gates when agents touch users, tools, data, or money.

Developer

Free / Open source

Start with replayd locally.

Capture failed runs and turn them into replayable fixtures in your repo.

local replay fixtures
deterministic assertions
tool, route, and output checks
commit cases to your repo

View GitHub

serious team path

Team Beta

Private beta

Release gates for real agents.

For teams shipping AI agents or client automations that need replay suites and release-blocking decisions before changes go live.

shared replay suites
release-blocking assertions
prompt/model/tool regression checks
workflow review before deploy
early product support

Send a failed run

Enterprise

Future / Contact

Future governance and control.

For organizations that need governance around agents touching customers, data, tools, or money.

approvals and audit trails
team permissions
release history
compliance-ready reporting
policy gates

CONTACT

Show us one failure you never want users to see again.

Send us one real failure: what the user asked, what the agent or workflow did, and what should have happened instead. We'll help turn it into a replay test.

failed run → replay case → release gate

hello@stonepathlab.net

Send a failed run · View demo