DOCS
Start with one failed run.
Use replayd to turn a failed agent or workflow run into a replayable fixture. Run it again before prompt, model, tool, or workflow changes ship.
case_id: commerce_image_request_001
input: "Can I see a picture?"
expected_route: send_product_image
actual_route: ai_agent_text_only
FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only
release_decision: BLOCK
01
What replayd does
replayd helps you model agent failures as repeatable cases. A case stores the input, the expected behavior, and the checks that decide whether the failure came back.
It does not need to judge everything. Start with the failure you already know.
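To make the idea of a case concrete, here is a minimal Python sketch of a case that stores an input and a list of checks, then evaluates them against a recorded trace. It is an illustration only, not replayd's actual API: the `Check` and `ReplayCase` classes, the `lookup` helper, the dictionary layout of the trace, and the `PASS` decision label are all assumptions made for this example. Only the case fields and the `BLOCK` decision come from the example on this page.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

_MISSING = object()  # sentinel: the path was not found in the trace


def lookup(trace: dict, path: str) -> Any:
    """Follow a dotted path such as 'final_output.type' through nested dicts."""
    current: Any = trace
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


@dataclass
class Check:
    """Hypothetical check: compare or test one value in the trace."""
    path: str             # where to look in the recorded trace
    op: str               # "==" or "exists"
    expected: Any = None  # only used by "=="

    def failure(self, trace: dict) -> Optional[str]:
        """Return a FAIL message, or None when the check passes."""
        actual = lookup(trace, self.path)
        if self.op == "exists":
            if actual is not _MISSING and actual is not None:
                return None
            return f"FAIL {self.path}\nexpected: present\nactual: missing"
        if self.op == "==":
            if actual == self.expected:
                return None
            shown = "missing" if actual is _MISSING else actual
            return f"FAIL {self.path}\nexpected: {self.expected}\nactual: {shown}"
        raise ValueError(f"unknown op: {self.op}")


@dataclass
class ReplayCase:
    """Hypothetical case: the input plus the checks that detect the failure."""
    case_id: str
    input: str
    checks: List[Check] = field(default_factory=list)

    def evaluate(self, trace: dict) -> dict:
        """Run every check; any failure blocks the release."""
        failures = []
        for check in self.checks:
            message = check.failure(trace)
            if message is not None:
                failures.append(message)
        # "BLOCK" appears on this page; "PASS" is an assumed label.
        decision = "BLOCK" if failures else "PASS"
        return {"case_id": self.case_id, "failures": failures,
                "release_decision": decision}


case = ReplayCase(
    case_id="commerce_image_request_001",
    input="Can I see a picture?",
    checks=[
        Check("router.route_taken", "==", "send_product_image"),
        Check("final_output.type", "==", "media"),
        Check("final_output.image_url", "exists"),
    ],
)

# Assumed trace shape for the failed run described on this page.
failed_trace = {
    "router": {"route_taken": "ai_agent_text_only"},
    "final_output": {"type": "text"},
}

result = case.evaluate(failed_trace)
for message in result["failures"]:
    print(message)
print("release_decision:", result["release_decision"])
```

Each check is deliberately narrow. The case does not try to score the whole response; it only asks whether the known failure came back, which keeps the result a clear block-or-ship signal.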
02
Quickstart
Install
The honest starting path is the GitHub repo.
git clone https://github.com/TaimoorKhan10/replayd
Create a replay case
case_id: commerce_image_request_001
input: "Can I see a picture?"
expected_route: send_product_image
actual_route: ai_agent_text_only
Add assertions
route_taken == send_product_image
final_output.type == media
image_url exists
Run the replay
replayd run commerce_image_request_001
Read the decision
FAIL trace.router.route_taken
expected: send_product_image
actual: ai_agent_text_only
release_decision: BLOCK
03
When to use replayd
04
What a replay case can check
05
From replayd to TAQ
replayd is the open-source core for modeling and replaying known failures. TAQ is the release-gate layer we are building around that loop: shared replay suites, release decisions, approvals, history, and team workflows.
That is the direction. The starting point is still simple: one failed run becomes one replay case.
Have one failed run?