Stonepath Labs

DOCS

Start with one failed run.

Use replayd to turn a failed agent or workflow run into a replayable fixture. Run it again before prompt, model, tool, or workflow changes ship.

Open-source capture layer · replayd
replayd run: BLOCK
case_id: commerce_image_request_001
input: "Can I see a picture?"
expected_route: send_product_image
actual_route: ai_agent_text_only
route_taken == send_product_image
actual: ai_agent_text_only
FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only

release_decision: BLOCK

01

What replayd does

replayd helps you model agent failures as repeatable cases. A case stores the input, the expected behavior, and the checks that decide whether the failure has come back.

It does not need to judge everything. Start with the failure you already know.
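As a rough illustration of what a case holds, here is a minimal Python sketch. The `ReplayCase` class, its fields, and the `failed_checks` helper are assumptions made for this example, not replayd's actual schema or API; only the case values come from the example on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical structure for illustration; not replayd's actual schema or API.
Check = Callable[[dict[str, Any]], bool]


@dataclass
class ReplayCase:
    case_id: str
    input: str
    expected_route: str
    # Named checks that each take a run trace and return True when it passes.
    checks: dict[str, Check] = field(default_factory=dict)

    def failed_checks(self, trace: dict[str, Any]) -> list[str]:
        """Return the names of checks that do not hold for this trace."""
        return [name for name, check in self.checks.items() if not check(trace)]


case = ReplayCase(
    case_id="commerce_image_request_001",
    input="Can I see a picture?",
    expected_route="send_product_image",
)
case.checks["route_taken"] = lambda t: t.get("route_taken") == case.expected_route

# The trace shape below is assumed; it mirrors the failed run in the example.
failing_trace = {"route_taken": "ai_agent_text_only"}
print(case.failed_checks(failing_trace))
```

The point of the sketch is the shape: one known failure, one input, and a small set of named checks, rather than a general-purpose judge.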

02

Quickstart

Step 1

Install

The simplest way to start is to clone the GitHub repo.

git clone https://github.com/TaimoorKhan10/replayd

Step 2

Create a replay case

case_id: commerce_image_request_001
input: "Can I see a picture?"
expected_route: send_product_image
actual_route: ai_agent_text_only

Step 3

Add assertions

route_taken == send_product_image
final_output.type == media
image_url exists

Step 4

Run the replay

replayd run commerce_image_request_001

Step 5

Read the decision

FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only

release_decision: BLOCK
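
To make the link between assertions and the decision concrete, the following Python sketch evaluates the three assertions from Step 3 against a trace and derives a decision. It is a sketch under stated assumptions, not replayd's implementation: the trace layout, the helper names (`lookup`, `run_assertions`, `release_decision`), and the `PASS` value are all hypothetical. Only `BLOCK` appears in the output above.

```python
from typing import Any

# Hypothetical trace shape and helper names; replayd's real format may differ.
MISSING = object()  # sentinel for "path not present in the trace"


def lookup(trace: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'final_output.type' through nested dicts."""
    node: Any = trace
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return MISSING
        node = node[key]
    return node


def run_assertions(trace: dict[str, Any], equals: dict[str, Any], exists: list[str]):
    """Return one (path, expected, actual) tuple per failed assertion."""
    failures = []
    for path, expected in equals.items():
        actual = lookup(trace, path)
        if actual != expected:
            failures.append((path, expected, None if actual is MISSING else actual))
    for path in exists:
        if lookup(trace, path) is MISSING:
            failures.append((path, "<present>", None))
    return failures


def release_decision(failures: list) -> str:
    # "PASS" is an assumed name for the non-blocking outcome.
    return "BLOCK" if failures else "PASS"


trace = {"route_taken": "ai_agent_text_only", "final_output": {"type": "text"}}
failures = run_assertions(
    trace,
    equals={"route_taken": "send_product_image", "final_output.type": "media"},
    exists=["image_url"],
)
for path, expected, actual in failures:
    print(f"FAIL {path}\nexpected: {expected}\nactual: {actual}")
print(f"release_decision: {release_decision(failures)}")
```

In this sketch a single failed assertion is enough to block, which matches the example: the route check fails, so the release is blocked.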

03

When to use replayd

after a customer-facing agent fails
before changing a prompt
before switching model versions
before changing retrieval
before editing workflow routing
before shipping a new tool call path

04

What a replay case can check

routes
tool calls
output shape
retrieved context
policies
semantic behavior
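
Two of these check kinds, tool calls and output shape, can be expressed as plain predicates over a trace. The Python sketch below assumes a trace with `tool_calls` and `final_output` fields and uses a made-up tool name (`lookup_product`); none of these names are confirmed replayd fields. Checks on semantic behavior usually need more than exact comparison and are not covered here.

```python
from typing import Any

# Hypothetical trace fields ("tool_calls", "final_output"); shown only to
# illustrate two of the check kinds listed above.


def tool_calls_match(trace: dict[str, Any], expected_tools: list[str]) -> bool:
    """True if the run called exactly the expected tools, in order."""
    called = [call["name"] for call in trace.get("tool_calls", [])]
    return called == expected_tools


def output_has_shape(trace: dict[str, Any], required: dict[str, type]) -> bool:
    """True if final_output has every required key with the required type."""
    output = trace.get("final_output", {})
    return all(isinstance(output.get(key), kind) for key, kind in required.items())


trace = {
    "tool_calls": [{"name": "lookup_product"}, {"name": "send_product_image"}],
    "final_output": {"type": "media", "image_url": "https://example.com/item.jpg"},
}

print(tool_calls_match(trace, ["lookup_product", "send_product_image"]))
print(output_has_shape(trace, {"type": str, "image_url": str}))
```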

05

From replayd to TAQ

replayd is the open-source core for modeling and replaying known failures. TAQ is the release-gate layer we are building around that loop: shared replay suites, release decisions, approvals, history, and team workflows.

That is the direction. The starting point is still simple: one failed run becomes one replay case.

Have one failed run?

Send it to us. We will help turn it into a replay case.