GovernanceBench measures whether a platform can halt, audit, supervise, and preserve behavioral integrity across autonomous agent workflows.
GovernanceBench exists because ungoverned AI agents acting autonomously can produce outcomes that their operators did not anticipate, authorize, or have the ability to halt. In March 2026, an AI system was used to accelerate a cryptographic discovery with global security implications. No governance layer existed between the AI system's capability and the outcome it produced. GovernanceBench measures whether a governance platform provides the halt capability, audit trail, and behavioral integrity controls that prevent this class of outcome from happening without operator awareness and consent.
The benchmark translates that risk into practical tests: can a platform block unauthorized actions, preserve a verifiable audit trail, contain sub-agent behavior, and keep a human supervisor in the control loop?
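As an illustration of two of those questions (blocking unauthorized actions and keeping a verifiable audit trail), here is a minimal sketch. It is not GovernanceBench's or any platform's actual implementation: `AuditLog`, `gate`, the allow-list, and the action names are all hypothetical, and the hash-chained log is one common way to make tampering detectable. Sub-agent containment and human supervision are not modeled.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Hypothetical append-only log; each entry commits to the previous hash."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def gate(action: str, allowed: set, log: AuditLog) -> bool:
    """Allow only allow-listed actions and record every decision."""
    decision = action in allowed
    log.append({"action": action, "allowed": decision})
    return decision


log = AuditLog()
allowed = {"read_file"}                      # hypothetical policy
ok = gate("read_file", allowed, log)         # permitted action
blocked = gate("delete_database", allowed, log)  # unauthorized action
intact = log.verify()                        # chain is valid before tampering

log.entries[1]["event"]["allowed"] = True    # simulate tampering with the record
tampered_ok = log.verify()                   # chain no longer verifies
```

A scenario in this style would pass only if the unauthorized action is refused and the log still verifies afterward.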
Current results are published as confirmed test results, not certifications. Agentomy scores 100/100 across the five dimensions: 211 of 211 scenarios passed, with 0 failed and 0 skipped.
Run the benchmark locally and compare governance controls across agent platforms as results are published.
GovernanceBench is an open standard licensed under Apache 2.0. Initial package support is published through the Agentomy Agent toolchain.
GovernanceBench is organized around suites, dimensions, scenarios, expected behaviors, and pass/fail scoring. The methodology is designed so teams can inspect what is being measured before trusting the score.
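To make the structure concrete, the sketch below models scenarios with expected behaviors and rolls pass/fail results up into per-dimension and overall scores. Everything here is an assumption for illustration: `ScenarioResult`, `score`, the dimension labels, and the scenario IDs are hypothetical. The equal weighting of dimensions is also assumed, since the source does not state the aggregation rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioResult:
    dimension: str    # hypothetical dimension label
    scenario_id: str  # hypothetical scenario identifier
    expected: str     # behavior the scenario requires, e.g. "blocked"
    observed: str     # behavior the platform actually showed

    @property
    def passed(self) -> bool:
        # Pass/fail only: the observed behavior must match the expected one.
        return self.expected == self.observed


def score(results):
    """Return (per-dimension pass rate in percent, overall score out of 100)."""
    if not results:
        raise ValueError("no scenario results to score")
    by_dim = defaultdict(list)
    for r in results:
        by_dim[r.dimension].append(r.passed)
    per_dim = {d: 100.0 * sum(v) / len(v) for d, v in by_dim.items()}
    # Assumption: every dimension carries equal weight in the overall score.
    overall = sum(per_dim.values()) / len(per_dim)
    return per_dim, overall


results = [
    ScenarioResult("halt", "halt-001", "blocked", "blocked"),
    ScenarioResult("halt", "halt-002", "blocked", "allowed"),
    ScenarioResult("audit", "audit-001", "logged", "logged"),
]
per_dim, overall = score(results)
print(per_dim)   # {'halt': 50.0, 'audit': 100.0}
print(overall)   # 75.0
```

Under this assumed rule, a platform reaches 100/100 only when every scenario in every dimension passes.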
Methodology documentation is available on request. The benchmark covers five dimensions and 211 scenarios in total.
GovernanceBench is built to support platform comparison as public scores become available. No competitor results are claimed until they are published through the same methodology.
| Platform | Suite 1 | Suite 2 | Suite 3 | Suite 4 | Suite 5 | Overall | Status |
|---|---|---|---|---|---|---|---|
| Agentomy | 100% | 100% | 100% | 100% | 100% | 100/100 | Published result |
| Other platforms | Pending publication | Pending publication | Pending publication | Pending publication | Pending publication | Pending | No public score yet |
Agentomy publishes the benchmark because governance should be measurable before autonomous agents become operational infrastructure.