auditable

Audit any agent decision across its past, present, and future, on one typed graph.

auditable is an open-source system of record for AI-agent decisions. Your logs show what an agent did; auditable shows what each decision relied on, replays that decision against the state that is live now, and rolls back the committed action when the decision no longer holds. Every agent run becomes one typed graph with two layers: execution (what ran, in what order) and dependency (what each step relied on). The graph is read at three points in the agent's life: before deploy, while running, and after a run.
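As a rough illustration of the two-layer idea, the sketch below stores execution and dependency edges on one typed graph. Every name in it (Layer, Edge, DecisionGraph, relied_on, execution_order) is hypothetical and chosen for this example; it is not auditable's API, only a minimal model of "what ran" versus "what each step relied on".

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    EXECUTION = "execution"    # what ran, in what order
    DEPENDENCY = "dependency"  # what each step relied on


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    layer: Layer


@dataclass
class DecisionGraph:
    """Hypothetical two-layer graph; not the library's own data model."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.nodes[node_id] = node_type

    def add_edge(self, src: str, dst: str, layer: Layer) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(src, dst, layer))

    def relied_on(self, node_id: str) -> list:
        """Sources of the dependency-layer edges that point at node_id."""
        return [e.src for e in self.edges
                if e.layer is Layer.DEPENDENCY and e.dst == node_id]

    def execution_order(self) -> list:
        """Execution-layer edges as (earlier step, later step) pairs."""
        return [(e.src, e.dst) for e in self.edges
                if e.layer is Layer.EXECUTION]


g = DecisionGraph()
g.add_node("read_budget", "tool_call")
g.add_node("approve_payment", "decision")
g.add_node("budget_snapshot", "state")
g.add_edge("read_budget", "approve_payment", Layer.EXECUTION)
g.add_edge("budget_snapshot", "approve_payment", Layer.DEPENDENCY)

print(g.relied_on("approve_payment"))
print(g.execution_order())
```

Keeping both layers on the same node set is what lets one query answer "what ran before this step" and another answer "what did this step trust", without joining two separate stores.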

This project is the open-source thesis of our Agent Layer work on agent risk audit and runtime control. The core is framework-agnostic and dependency-free, and is built on the two-layer graph model introduced in our GRADE paper.

If this project is useful in your workflow, please star the GitHub repository to help more practitioners discover it.

Why this project

Agents act on dependency state that quietly drifts. A budget read minutes ago can fall below an amount already committed; a price pinned at plan time can move before the action lands. Most tools log what happened, but they cannot re-decide under the state that is live now, so a stale decision stands until a human notices. auditable closes that recovery gap on one graph: it captures the decision, replays it against live state, and reverses the committed action when the decision no longer holds.

The three pillars

The same detection-and-report pass runs over one typed decision graph at three points in an agent's life. The graph kernel stays constant; only the pillar changes.

PRE (before deploy, analyze_plan): read-only structural lints on a declared plan, naming the control-flow chokepoint the plan rests on.
LIVE (while running, audit + replay + ActionGate): capture a decision, re-decide under live state, and route the outcome (allow, block, review, or roll back) through a rail.
POST (after a run, analyze_run): rank a finished run by structural blast share and name the keystone step to review first.
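To make the POST idea concrete, here is a minimal sketch that ranks steps by how much of a run sits downstream of them. It assumes "blast share" means the fraction of the other steps that transitively rely on a given step; that definition, the function name blast_shares, and the sample run are assumptions made for illustration, not the definitions used by analyze_run or GRADE.

```python
from collections import defaultdict


def blast_shares(depends_on: dict) -> dict:
    """Share of the other steps that transitively rely on each step.

    depends_on maps a step to the steps it directly relies on.
    This definition is an assumption made for this sketch.
    """
    # Invert the map: step -> steps that directly rely on it.
    dependents = defaultdict(set)
    for step, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(step)

    steps = set(depends_on) | set(dependents)
    shares = {}
    for step in steps:
        # Depth-first walk over everything downstream of this step.
        seen, stack = set(), [step]
        while stack:
            for nxt in dependents[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(step)  # a step does not count itself, even in a cycle
        shares[step] = len(seen) / max(len(steps) - 1, 1)
    return shares


# Hypothetical finished run.
run = {
    "read_budget": [],
    "fetch_invoice": [],
    "log_invoice": ["fetch_invoice"],
    "approve_payment": ["read_budget"],
    "send_payment": ["approve_payment"],
    "notify_vendor": ["send_payment"],
}

shares = blast_shares(run)
ranked = sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))
keystone = ranked[0][0]
print(keystone, shares[keystone])
```

In this sample run, read_budget comes out as the keystone because three of the five other steps sit downstream of it, so it is the step a reviewer would open first.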

The flagship moment

One payment, walked through the whole lifecycle. An agent approves a $2,083.20 vendor payment against a budget snapshot that covered it. Six days later the live budget has dropped below the amount. replay re-decides on the live state, and the gate reverses the committed payment. This is recovery, not a log line.

Note: the shipped ReferenceLedger is an in-process reference rail for demos and tests; production use needs a real compensation rail behind the same gate interface.
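The payment walk-through above can be sketched in a few lines of plain Python. Everything here is hypothetical: PaymentDecision, replay, gate, and InMemoryLedger are illustrative names rather than auditable's API, the ledger is a toy stand-in for a compensation rail, and the two budget figures are invented (only the $2,083.20 amount comes from the example above).

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ROLL_BACK = "roll_back"


@dataclass(frozen=True)
class PaymentDecision:
    payment_id: str
    amount: Decimal
    budget_at_decision: Decimal  # the snapshot the approval relied on


def decide(amount: Decimal, budget: Decimal) -> bool:
    """The approval rule: the budget must cover the amount."""
    return amount <= budget


def replay(decision: PaymentDecision, live_budget: Decimal) -> Verdict:
    """Re-run the same rule against live state instead of the snapshot."""
    if decide(decision.amount, live_budget):
        return Verdict.ALLOW
    return Verdict.ROLL_BACK


class InMemoryLedger:
    """Toy compensation rail (hypothetical); tracks commits and reversals."""

    def __init__(self) -> None:
        self.committed = {}
        self.reversals = {}

    def commit(self, decision: PaymentDecision) -> None:
        self.committed[decision.payment_id] = decision.amount

    def reverse(self, payment_id: str) -> None:
        self.reversals[payment_id] = self.committed.pop(payment_id)


def gate(decision: PaymentDecision, live_budget: Decimal,
         ledger: InMemoryLedger) -> Verdict:
    """Replay the decision and reverse the payment if it no longer holds."""
    verdict = replay(decision, live_budget)
    if verdict is Verdict.ROLL_BACK and decision.payment_id in ledger.committed:
        ledger.reverse(decision.payment_id)
    return verdict


ledger = InMemoryLedger()
decision = PaymentDecision(
    payment_id="pay-001",
    amount=Decimal("2083.20"),
    budget_at_decision=Decimal("5000.00"),  # invented snapshot value
)

# At approval time the snapshot covers the amount, so the payment commits.
approved = decide(decision.amount, decision.budget_at_decision)
if approved:
    ledger.commit(decision)

# Six days later the live budget (invented value) has dropped below the amount.
verdict = gate(decision, Decimal("1500.00"), ledger)
print(verdict)
```

The point of the split between replay and gate is that re-deciding is a pure function of the decision and live state, while reversing is a side effect that belongs behind the rail.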

Proven on public agent benchmarks

The graph carries predictive signal, not just structure. GRADE (arXiv:2606.22741) evaluates six public agent corpora for run-failure prediction. On SWE-Gym, adding the dependency layer lifts within-corpus ROC-AUC from 0.663 (a run-size baseline) to 0.805. Separately, the execution layer localizes the faulting step with a Top-3 score of 0.614 on the Who&When benchmark.

Quick start

Install from PyPI:

pip install auditable

The core has no third-party dependencies and does not require torch. Structural-graph analysis (analyze_plan for PRE, analyze_run for POST) needs the optional graph extra:

pip install "auditable[graph]"

Built on GRADE

auditable productizes the typed two-layer graph model introduced in GRADE: Graph Representation of LLM Agent Dependency and Execution. If you use auditable in your work, please cite GRADE:

@article{zhao2026grade,
  title   = {GRADE: Graph Representation of LLM Agent Dependency and Execution},
  author  = {Zhao, Yue},
  journal = {arXiv preprint arXiv:2606.22741},
  year    = {2026}
}