Open Specification

Agentomy Agent Governance Protocol

The Agentomy Agent Governance Protocol (AGP) defines the minimum governance controls for enterprise AI agents, organized into three tiers. Each tier maps to an obligation in a specific EU AI Act article. Any execution layer -- an AI agent framework, an on-device model, a cloud agent -- can implement this specification.

Three Tiers

Tier 1: Record

Tamper-evident audit trail

Every AI agent action is logged with attribution, timestamp, and hash-chain integrity. The record survives the agent's own attempts to modify history. Retained for the system's operational lifetime plus the applicable limitation period for legal claims.

EU AI Act Article 12: Record-Keeping
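
A minimal sketch of the Tier 1 idea, assuming SHA-256 over a canonical JSON encoding. The `AuditLog` class, the field names (`agent_id`, `action`, `prev_hash`), and the all-zero starting value are hypothetical illustrations; AGP does not prescribe a record layout.

```python
import hashlib
import json
import time
from typing import Optional


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class AuditLog:
    """Append-only log in which each record commits to the one before it."""

    GENESIS = "0" * 64  # assumed predecessor value for the very first record

    def __init__(self) -> None:
        self.records: list = []

    def append(self, agent_id: str, action: str, timestamp: Optional[float] = None) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else self.GENESIS
        body = {
            "agent_id": agent_id,  # attribution
            "action": action,
            "timestamp": time.time() if timestamp is None else timestamp,
            "prev_hash": prev_hash,  # link to the previous record
        }
        record = {**body, "hash": _digest(body)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and link; False means the history was altered."""
        prev_hash = self.GENESIS
        for record in self.records:
            body = {k: v for k, v in record.items() if k != "hash"}
            if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
                return False
            prev_hash = record["hash"]
        return True
```

Changing any stored field after the fact makes `verify()` return False, because the recomputed hash no longer matches the stored one. An in-memory list only makes tampering detectable. It does not stop an agent with write access from rewriting the whole chain, so a real deployment would also need storage the agent cannot modify.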

Tier 2: Enforce

Permission enforcement before every action

Every AI agent action is evaluated against a defined permission boundary before execution begins. The boundary covers four categories: data access, tool access, external communication, and cross-agent delegation. An action outside the agent's authorized scope does not execute.

EU AI Act Article 9: Risk Management System
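
The pre-execution check can be sketched as follows. This is an illustration under assumptions, not the AGP wire format: the `PermissionBoundary` type, the deny-by-default rule, and the string identifiers for targets are all hypothetical.

```python
from dataclasses import dataclass, field

# The four boundary categories named in Tier 2.
CATEGORIES = (
    "data_access",
    "tool_access",
    "external_communication",
    "cross_agent_delegation",
)


@dataclass(frozen=True)
class PermissionBoundary:
    """Allowed targets per category; anything not listed is denied (assumed default)."""
    data_access: frozenset = field(default_factory=frozenset)
    tool_access: frozenset = field(default_factory=frozenset)
    external_communication: frozenset = field(default_factory=frozenset)
    cross_agent_delegation: frozenset = field(default_factory=frozenset)


class PermissionDenied(Exception):
    """Raised when an action falls outside the agent's boundary."""


def is_allowed(boundary: PermissionBoundary, category: str, target: str) -> bool:
    if category not in CATEGORIES:
        return False  # unknown categories are never authorized
    return target in getattr(boundary, category)


def execute(boundary: PermissionBoundary, category: str, target: str, action):
    """Run `action` only if the check passes; the check always happens first."""
    if not is_allowed(boundary, category, target):
        raise PermissionDenied(f"{category}:{target} is outside the authorized scope")
    return action()
```

For example, an agent whose boundary lists only the tool `search` can call `execute(boundary, "tool_access", "search", run_search)`, while a request for `"tool_access", "shell"` raises `PermissionDenied` before the callable is ever invoked.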

Tier 3: Override

One human decision stops everything

One authorized human decision halts all autonomous agent action immediately across every deployment surface -- cloud, mobile, and edge. The halt produces a verifiable signed proof record. No graceful shutdown negotiation. Hard stop.

EU AI Act Article 14: Human Oversight
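
A minimal single-process sketch of a hard stop that emits a proof record. The `HaltSwitch` class and its field names are hypothetical. HMAC-SHA256 stands in for the signature here because it is available in the Python standard library; a deployment that needs proof verifiable by third parties would more likely use an asymmetric signature scheme.

```python
import hashlib
import hmac
import json
import threading


class HaltSwitch:
    """One authorized decision sets a flag that every action must check first."""

    def __init__(self, signing_key: bytes, authorized_operators) -> None:
        self._key = signing_key
        self._authorized = set(authorized_operators)
        self._halted = threading.Event()

    def _sign(self, payload: dict) -> str:
        message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def halt(self, operator_id: str, issued_at: float) -> dict:
        """Halt immediately and return a proof record; unauthorized callers are refused."""
        if operator_id not in self._authorized:
            raise PermissionError(f"{operator_id} is not authorized to halt")
        self._halted.set()  # no negotiation with running agents
        payload = {"event": "halt", "operator_id": operator_id, "issued_at": issued_at}
        return {**payload, "signature": self._sign(payload)}

    def verify_proof(self, proof: dict) -> bool:
        payload = {k: v for k, v in proof.items() if k != "signature"}
        return hmac.compare_digest(self._sign(payload), str(proof.get("signature", "")))

    def guard(self) -> None:
        """Call before every agent action; raises once a halt is in effect."""
        if self._halted.is_set():
            raise RuntimeError("halted by operator")
```

Agents call `guard()` before each action, so the halt takes effect at the next action boundary without any cooperation from the agent. Propagating the flag across cloud, mobile, and edge surfaces is outside the scope of this sketch.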

The Operator Obligation

Enterprise AI vendors satisfy their own provider obligations through their product's governance features. The EU AI Act places separate obligations on the deployer -- the organization deploying AI agents, referred to here as the operator. AGP defines what the operator's governance infrastructure must include, independent of which vendor's AI agents are deployed.

Implementation

AGP is vendor-neutral: any execution layer can implement the three tiers, which define the minimum.

For integration guidance, use the Request Access form.

Reference Implementation: Agentomy

Agentomy demonstrates what a fuller implementation looks like. The three tiers remain the specification floor; everything described below goes beyond it.

10-Stage Governance Pipeline

Every action passes through 10 sequential stages

1. Identity -- resolves agent identity via dual-module resolution.
2. Trust -- validates trustworthiness across four modules (trust authority, chain validator, trust scorer, security policy).
3. Authorization -- tier-based permission check with inter-agent delegation support.
4. Behavioral Check -- baseline deviation detection and execution risk assessment.
5. Content Input Scan -- inbound payload threat scanning (injection, integrity, content filtering).
6. Action Execution -- controlled execution with evidence recording.
7. Output Scan -- IP leak scrubbing, compliance claim verification, content scanning.
8. Evidence Recording -- persistence to audit trail, trust ledger, and chain-of-custody ledger.
9. Drift Update -- behavioral baseline recalculation with threshold-based flagging.
10. Halt Evaluation -- final aggregation of all stage findings; triggers halt if any stage returned critical.

Each stage produces a finding that is linked into a SHA-256 hash chain. The first stage uses 64 zeros as its previous-hash value (the length of a hex-encoded SHA-256 digest). Each subsequent stage chains to the previous stage's hash, forming a tamper-evident forensic evidence chain. The final evidence chain hash is returned in every authorization response and is independently verifiable.
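
The chaining and the final halt decision can be sketched as below. The source does not specify how a finding is serialized before hashing, so the JSON encoding, the `finding`/`prev_hash`/`hash` field names, and the `severity` values are assumptions made for illustration.

```python
import hashlib
import json

# Stage identifiers follow the list above; the spellings are illustrative.
STAGES = [
    "identity", "trust", "authorization", "behavioral_check",
    "content_input_scan", "action_execution", "output_scan",
    "evidence_recording", "drift_update", "halt_evaluation",
]
GENESIS_HASH = "0" * 64  # 64 zeros: previous-hash value for the first stage


def _link_hash(prev_hash: str, finding: dict) -> str:
    """Hash a finding together with the hash of the stage before it."""
    payload = json.dumps({"prev_hash": prev_hash, "finding": finding}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chain_findings(findings: list) -> list:
    """Link findings in order; the last link's hash is the evidence chain hash."""
    chained, prev_hash = [], GENESIS_HASH
    for finding in findings:
        link = {
            "finding": finding,
            "prev_hash": prev_hash,
            "hash": _link_hash(prev_hash, finding),
        }
        chained.append(link)
        prev_hash = link["hash"]
    return chained


def verify_chain(chained: list) -> bool:
    """Independently recompute the chain from the genesis value."""
    prev_hash = GENESIS_HASH
    for link in chained:
        if link["prev_hash"] != prev_hash:
            return False
        if link["hash"] != _link_hash(prev_hash, link["finding"]):
            return False
        prev_hash = link["hash"]
    return True


def should_halt(findings: list) -> bool:
    """Halt evaluation: any critical finding from any stage triggers a halt."""
    return any(f["severity"] == "critical" for f in findings)
```

Because each hash covers the previous one, editing a finding from an early stage invalidates every later link, which is what lets a verifier check the whole pipeline run from the single final hash.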

GovernanceBench: 5 Dimensions

Open benchmark measuring governance completeness

Authorization -- tier-based permissions enforced server-side; escalation via request body impossible.
Auditability -- every event recorded, hash-linked, exportable, tamper-evident.
Override Capability -- authorized operator halts all agents immediately; unauthorized halt blocked.
Behavioral Integrity -- anomalous behavior detected, flagged, and quarantined automatically.
OWASP Agentic Coverage -- adversarial threat detection across injection, supply chain, model integrity, and protocol-level attacks.

Agentomy result: 100/100 across 224 scenarios in 5 dimensions, with zero failures. Any governance platform can run GovernanceBench against itself:

npx governancebench run --target http://your-platform:3000

31 Behavioral Detection Methods

Anomaly detection with a core set plus 5 domain-specific verticals

6 core methods (standalone) -- baseline deviation, execution risk, frequency anomaly, scope violation, identity drift, evidence gap. Available in every deployment.
5 RPA-specific -- tier escalation, external destination, config drift, lateral movement, unattended spawning.
5 algorithmic trading -- order velocity, position concentration, strategy drift, spoofing pattern, cross-venue exposure.
5 medical device -- dosage anomaly, calibration drift, protocol deviation, alert fatigue, device isolation breach.
5 AV fleet -- route deviation, sensor disagreement, communication dropout, geofence violation, fleet coordination failure.
5 industrial IoT -- setpoint drift, actuator conflict, network segmentation breach, safety interlock bypass, process variable excursion.

The 6 core methods operate standalone. The 25 vertical-specific methods require fleet infrastructure for cross-agent behavioral baselines. Auto-quarantine triggers on critical severity. Quarantined agents cannot execute further actions without explicit operator release.
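
As an illustration of baseline deviation detection feeding auto-quarantine, the sketch below scores an observation by its z-score against a recorded baseline. The z-score approach, the threshold values, and the `QuarantineRegistry` class are assumptions chosen for the example; the source does not state how Agentomy computes deviation or severity.

```python
import math
from statistics import mean, stdev

# Assumed z-score cut-offs, checked from most to least severe.
SEVERITY_THRESHOLDS = [(6.0, "critical"), (3.0, "warning")]


def deviation_severity(baseline: list, observed: float) -> str:
    """Classify how far `observed` sits from the baseline, in standard deviations."""
    if len(baseline) < 2:
        return "unknown"  # not enough history to estimate spread
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        z = 0.0 if observed == mu else math.inf
    else:
        z = abs(observed - mu) / sigma
    for threshold, label in SEVERITY_THRESHOLDS:
        if z >= threshold:
            return label
    return "normal"


class QuarantineRegistry:
    """Critical findings quarantine an agent until an operator releases it."""

    def __init__(self) -> None:
        self._quarantined: set = set()

    def report(self, agent_id: str, severity: str) -> None:
        if severity == "critical":
            self._quarantined.add(agent_id)

    def can_execute(self, agent_id: str) -> bool:
        return agent_id not in self._quarantined

    def release(self, agent_id: str, operator_id: str) -> None:
        if not operator_id:
            raise ValueError("release requires an explicit operator")
        self._quarantined.discard(agent_id)
```

A caller would run `registry.report(agent_id, deviation_severity(history, value))` after each action and check `registry.can_execute(agent_id)` before the next one.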

Fleet Governance

Cross-agent coordination at scale

Cross-agent correlation matrix detects coordinated anomalies that appear benign at the individual agent level. Joint behavioral baselines identify fleet-wide drift before individual agents cross thresholds.
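
One way to see how fleet-wide drift can surface before any single agent crosses its threshold: if each agent reports a z-score against its own baseline, the sum of n independent z-scores divided by sqrt(n) is itself approximately standard normal, so a small shift shared by many agents produces a large fleet statistic. The sketch below assumes independent, standardized per-agent scores and hypothetical thresholds; it is not a description of Agentomy's correlation matrix.

```python
import math


def fleet_drift(z_scores: dict, agent_threshold: float = 3.0,
                fleet_threshold: float = 3.0) -> dict:
    """Compare per-agent flags with a pooled statistic over the whole fleet."""
    if not z_scores:
        return {"individual": [], "fleet_statistic": 0.0, "fleet_drift": False}
    # Agents that would be flagged on their own.
    individual = sorted(a for a, z in z_scores.items() if abs(z) >= agent_threshold)
    # Pooled statistic: sum / sqrt(n), standard normal if agents are independent
    # and undrifted.
    fleet_statistic = sum(z_scores.values()) / math.sqrt(len(z_scores))
    return {
        "individual": individual,
        "fleet_statistic": fleet_statistic,
        "fleet_drift": abs(fleet_statistic) >= fleet_threshold,
    }
```

With 25 agents each at a z-score of 1.0, no agent is flagged individually, yet the fleet statistic is 25 / 5 = 5.0 and the fleet is flagged. Shifts in opposite directions cancel, which is the intended behavior when looking for coordinated movement.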

Standalone deployments support halt for up to 10 locally governed agents. With fleet infrastructure, a coordinated halt takes under 50 ms across 500+ concurrent agents. One operator decision stops every governed agent across every deployment surface. Agent registration is rejected after a halt -- no new agents can join while the halt is active. Fleet-level coordination requires Agentomy fleet infrastructure.
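
The registration lock described above can be sketched with an in-memory registry. `FleetRegistry` and `RegistrationRejected` are hypothetical names, and the sketch says nothing about how a halt is distributed or how the stated latency is achieved.

```python
class RegistrationRejected(Exception):
    """Raised when an agent tries to register during an active halt."""


class FleetRegistry:
    """Tracks governed agents and refuses newcomers while a halt is active."""

    def __init__(self) -> None:
        self._agents: set = set()
        self._halt_active = False

    def register(self, agent_id: str) -> None:
        if self._halt_active:
            raise RegistrationRejected(f"halt active; {agent_id} cannot join")
        self._agents.add(agent_id)

    def halt_all(self) -> list:
        """Activate the halt and return the agents it applies to."""
        self._halt_active = True
        return sorted(self._agents)

    def may_act(self, agent_id: str) -> bool:
        """Only registered agents may act, and only while no halt is active."""
        return not self._halt_active and agent_id in self._agents
```

Rejecting registration during a halt closes the gap where a halted workload could otherwise resume under a fresh agent identity.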

5-Tier Permission Model

Identity-based authorization with graduated action scope

Evaluator -- read-only observation. Zero operational footprint.
Analyst -- standard task execution within defined scope. No irreversible actions without confirmation.
Builder -- elevated operations requiring explicit authorization before execution.
Operator -- administrative operations affecting system configuration. Requires senior approval.
Strategist -- system-level actions including kill switch execution and governance policy override. Cannot be delegated.

Every tier assignment is logged. Tier escalation through the request body is structurally impossible: the tier is enforced at the authorization stage of the 10-stage pipeline, not at the application layer.
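
A sketch of why a tier claimed in the request body has no effect: the check reads the tier from a server-side assignment table keyed by agent identity and never consults the request. The `Tier` enum, the assumption that higher tiers include the scope of lower ones, and the `authorize` signature are illustrative choices, not part of the specification.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The five tiers, ordered from least to most privileged (assumed cumulative)."""
    EVALUATOR = 1
    ANALYST = 2
    BUILDER = 3
    OPERATOR = 4
    STRATEGIST = 5


def authorize(agent_id: str, required: Tier, request_body: dict, assignments: dict) -> bool:
    """Decide from the server-side assignment only.

    `request_body` is accepted so the call site looks realistic, but it is
    deliberately never read: a "tier" field supplied by the caller is ignored.
    """
    assigned = assignments.get(agent_id)
    return assigned is not None and assigned >= required


def can_delegate(tier: Tier) -> bool:
    """Strategist authority cannot be delegated; other tiers are assumed delegable."""
    return tier != Tier.STRATEGIST
```

For an agent assigned `Tier.ANALYST`, a request carrying `{"tier": "STRATEGIST"}` still fails a Strategist-level check, because the assignment table is the only source of truth.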

EU AI Act Alignment

Pipeline stages mapped to specific regulatory obligations

Article 9 (Risk Management) -- Stages 2 (Trust), 3 (Authorization), 4 (Behavioral Check): continuous risk identification and mitigation throughout the action lifecycle.
Article 12 (Record-Keeping) -- Stage 8 (Evidence Recording): tamper-evident logging with hash-chain integrity, retained for operational lifetime plus applicable limitation period.
Article 14 (Human Oversight) -- Stage 10 (Halt Evaluation) + fleet halt: one authorized human halts all autonomous action immediately with signed proof record.
Article 15 (Accuracy, Robustness and Cybersecurity) -- Stages 5 (Content Input Scan), 7 (Output Scan), 9 (Drift Update): continuous validation of input integrity, output correctness, and behavioral stability.

The three-tier AGP specification maps to Articles 9, 12, and 14 as the minimum. The reference implementation additionally addresses Article 15 obligations through input/output scanning and drift monitoring.

Agentomy is in active development. This specification reflects the current implementation, not an independently assessed standard.