DOCS

Start with one failed run.

Use replayd to turn a failed agent or workflow run into a replayable fixture. Rerun that fixture before any prompt, model, tool, or workflow change ships.


Open-source capture layer · replayd

replayd run: BLOCK

case_id: commerce_image_request_001
input: "Can I see a picture?"
expected_route: send_product_image
actual_route: ai_agent_text_only

route_taken == send_product_image

actual: ai_agent_text_only

FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only

release_decision: BLOCK

What replayd does

replayd helps you model agent failures as repeatable cases. A case stores the input, the expected behavior, and the checks that decide whether the failure came back.

It does not need to judge everything. Start with the failure you already know.
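
As a rough illustration of the case described above (input, expected behavior, checks), here is a minimal Python sketch. The `ReplayCase` class, its field names, and the trace dictionary shape are assumptions made for this example, not replayd's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical types for illustration only; replayd's real schema may differ.
Trace = dict[str, Any]
Check = Callable[[Trace], bool]


@dataclass
class ReplayCase:
    case_id: str
    input: str
    expected_route: str
    # Named checks that decide whether the known failure came back.
    checks: dict[str, Check] = field(default_factory=dict)

    def failed_checks(self, trace: Trace) -> list[str]:
        """Return the names of checks that do not hold for this trace."""
        return [name for name, check in self.checks.items() if not check(trace)]


case = ReplayCase(
    case_id="commerce_image_request_001",
    input="Can I see a picture?",
    expected_route="send_product_image",
    checks={
        "route_taken": lambda trace: trace.get("route_taken") == "send_product_image",
    },
)

# Assumed trace shapes: one reproducing the failure, one after a fix.
failing_trace = {"route_taken": "ai_agent_text_only"}
fixed_trace = {"route_taken": "send_product_image"}

print(case.failed_checks(failing_trace))
print(case.failed_checks(fixed_trace))
```

The point of the sketch is that one case carries only what is needed to recognize one known failure; nothing in it tries to grade the whole run.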

Quickstart

Step 1

Install

The most direct way to start is to clone the GitHub repo.

git clone https://github.com/TaimoorKhan10/replayd

Step 2

Create a replay case

case_id: commerce_image_request_001
input: "Can I see a picture?"
expected_route: send_product_image
actual_route: ai_agent_text_only

Step 3

Add assertions

route_taken == send_product_image
final_output.type == media
image_url exists
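
To make the three assertions above concrete, here is a small Python sketch of how expressions of the form `<path> == <value>` and `<path> exists` could be evaluated against a recorded trace. The `lookup` and `evaluate` helpers, the dotted-path convention, and the trace layout (including `image_url` as a top-level key) are assumptions for illustration; replayd's own assertion engine may work differently.

```python
from typing import Any

_MISSING = object()  # sentinel for "path not present in the trace"


def lookup(trace: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'final_output.type' through nested dicts."""
    current: Any = trace
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def evaluate(assertion: str, trace: dict[str, Any]) -> bool:
    """Evaluate '<path> == <value>' or '<path> exists' against a trace."""
    if " == " in assertion:
        path, expected = (part.strip() for part in assertion.split(" == ", 1))
        # Values are compared as plain strings in this sketch.
        return lookup(trace, path) == expected
    if assertion.endswith(" exists"):
        path = assertion[: -len(" exists")].strip()
        return lookup(trace, path) is not _MISSING
    raise ValueError(f"unsupported assertion: {assertion!r}")


assertions = [
    "route_taken == send_product_image",
    "final_output.type == media",
    "image_url exists",
]

# Assumed trace shapes for the failing run and for a corrected run.
failing_trace = {
    "route_taken": "ai_agent_text_only",
    "final_output": {"type": "text"},
}
fixed_trace = {
    "route_taken": "send_product_image",
    "final_output": {"type": "media"},
    "image_url": "https://example.com/product.png",
}

failing_results = {a: evaluate(a, failing_trace) for a in assertions}
fixed_results = {a: evaluate(a, fixed_trace) for a in assertions}
```

In this sketch a missing path simply fails the check rather than raising, so a trace that never produced an image is reported the same way as one that took the wrong route.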

Step 4

Run the replay

replayd run commerce_image_request_001

Step 5

Read the decision

FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only

release_decision: BLOCK
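
If you want to act on this output in a script, one option is to parse the text shown above. The Python sketch below assumes the report keeps exactly this line layout (`FAIL <check>`, `expected:`, `actual:`, `release_decision:`); `parse_report`, `Report`, and `Failure` are hypothetical names, and the only decision value taken from this page is `BLOCK`.

```python
from dataclasses import dataclass, field


@dataclass
class Failure:
    check: str
    expected: str = ""
    actual: str = ""


@dataclass
class Report:
    decision: str = ""
    failures: list[Failure] = field(default_factory=list)


def parse_report(text: str) -> Report:
    """Parse report text in the layout shown in Step 5 (assumed to be stable)."""
    report = Report()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("FAIL "):
            report.failures.append(Failure(check=line[len("FAIL "):].strip()))
        elif line.startswith("expected:") and report.failures:
            report.failures[-1].expected = line.split(":", 1)[1].strip()
        elif line.startswith("actual:") and report.failures:
            report.failures[-1].actual = line.split(":", 1)[1].strip()
        elif line.startswith("release_decision:"):
            report.decision = line.split(":", 1)[1].strip()
    if not report.decision:
        # Refuse to guess when the report carries no decision line.
        raise ValueError("no release_decision line found")
    return report


sample_output = """\
FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only

release_decision: BLOCK
"""

report = parse_report(sample_output)
should_block = report.decision == "BLOCK"
```

Raising when no decision line is present is a deliberate choice in this sketch: a truncated or malformed report should not be mistaken for a passing one.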

When to use replayd

after a customer-facing agent fails

before changing a prompt

before switching model versions

before changing retrieval

before editing workflow routing

before shipping a new tool call path

What a replay case can check

routes

tool calls

output shape

retrieved context

policies

semantic behavior
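
A few of the check types listed above can be sketched as plain predicates over a recorded trace. Everything in this Python example is an assumption for illustration: the trace keys (`tool_calls`, `final_output`, `retrieved_context`), the tool name `fetch_product_image`, and the document id are hypothetical and do not describe replayd's trace format.

```python
from typing import Any

Trace = dict[str, Any]  # hypothetical trace shape, not replayd's format


def called_tool(trace: Trace, name: str) -> bool:
    """Tool-call check: was a tool with this name invoked at least once?"""
    return any(call.get("name") == name for call in trace.get("tool_calls", []))


def has_output_keys(trace: Trace, required: set[str]) -> bool:
    """Output-shape check: does the final output contain every required key?"""
    output = trace.get("final_output", {})
    return isinstance(output, dict) and required.issubset(output.keys())


def retrieved_doc(trace: Trace, doc_id: str) -> bool:
    """Retrieved-context check: was this document among the retrieved items?"""
    return doc_id in {doc.get("id") for doc in trace.get("retrieved_context", [])}


trace = {
    "tool_calls": [{"name": "fetch_product_image", "args": {"sku": "demo-sku"}}],
    "final_output": {"type": "media", "image_url": "https://example.com/product.png"},
    "retrieved_context": [{"id": "catalog/demo-sku"}],
}

results = {
    "tool call": called_tool(trace, "fetch_product_image"),
    "output shape": has_output_keys(trace, {"type", "image_url"}),
    "retrieved context": retrieved_doc(trace, "catalog/demo-sku"),
}
```

Policy and semantic checks are left out of the sketch because they usually need more than a structural comparison.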

From replayd to TAQ

replayd is the open-source core for modeling and replaying known failures. TAQ is the release-gate layer we are building around that loop: shared replay suites, release decisions, approvals, history, and team workflows.

That is the direction. The starting point is still simple: one failed run becomes one replay case.

Have one failed run?

Send it to us. We will help turn it into a replay case.
