Introducing RegulatoryAgentBench — a benchmark for AI agents operating under the law

We are open-sourcing RegulatoryAgentBench (v1), the first benchmark purpose-built to measure whether AI agents can correctly detect and respond to regulatory change. This is the foundation for a new class of infrastructure we believe is inevitable.

The problem

As AI agents take on real workflows in financial services, healthcare, and legal operations, a critical gap is emerging. These agents operate inside regulatory environments that shift constantly — new rules, amended obligations, revised deadlines — and they have no mechanism to detect when the ground moves beneath them.

The intelligence layer that keeps AI agents within the law does not yet exist. We are building it.

What we've built

At Carver, we monitor 1000+ regulators across 50 countries, structuring every regulatory update with dozens of classification attributes: what changed, who is affected, what action is required, and by when. The data layer exists. The agent layer is what comes next.
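To make the idea of a structured update concrete, here is a minimal sketch of what one record might look like, built around the four attributes named above. The class name, field names, and sample values are assumptions for illustration only; they are not Carver's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RegulatoryUpdate:
    """Hypothetical record; field names are illustrative, not Carver's schema."""

    regulator: str
    jurisdiction: str
    what_changed: str                  # what changed
    affected_parties: tuple[str, ...]  # who is affected
    required_action: str               # what action is required
    deadline: Optional[date] = None    # by when (None if no deadline applies)

    def days_remaining(self, today: date) -> Optional[int]:
        """Days until the deadline, or None when no deadline is set."""
        if self.deadline is None:
            return None
        return (self.deadline - today).days


# Invented placeholder values, not a real regulatory update.
update = RegulatoryUpdate(
    regulator="Example Financial Authority",
    jurisdiction="XX",
    what_changed="Reporting threshold lowered",
    affected_parties=("payment institutions",),
    required_action="Update transaction reporting rules",
    deadline=date(2030, 6, 30),
)
print(update.days_remaining(date(2030, 6, 1)))  # prints 29
```

An agent consuming records of this shape could check `affected_parties` and `days_remaining` before deciding whether and how urgently to act.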

RegulatoryAgentBench is the measurement tool that makes building that agent layer possible. It does not test whether a model can read a regulation; it tests whether the model can respond to one correctly.

We ran it across:

  • 3 frontier models
  • 50 real regulatory scenarios
  • 30+ regulators across APAC, EMEA, LATAM, and North America

The results reveal exactly where current agents succeed — and where they fall short. Those gaps are our roadmap.

What comes next

We see this as the first step toward a specialist class of regulatory agents — a dedicated intelligence layer whose role is to ensure that other AI agents operating in regulated industries remain within the law as rules evolve. RegulatoryAgentBench v1 is how we start measuring our way there.

If you are building agents in regulated industries — or thinking about where the compliance layer for AI needs to go — we would welcome the conversation. The benchmark is open and available now.