Agent compliance
The practice of proving that the AI agents operating inside an organization stay within the controls a framework requires. The proof is a runtime record: what each agent saw, what it produced, and what it did, on whose authority and against which system.
An AI agent does two kinds of compliance-relevant work, and each carries its own obligations. The first is the conversation: the data the agent is given and the output it produces. Data-handling, confidentiality, and retention obligations attach here, the moment information reaches the model, whether or not the agent goes on to act. The second is the action: the tool calls the agent makes to read a record, write a row, or invoke an API, under whatever authority it held at the start of the session. The obligations that govern changing a system of record attach here. Both happen at machine speed, across systems that were never built to report what an agent saw or did to a compliance team.
So when an auditor asks how the organization governs its AI, they are asking two things: where is the record, and what does it contain? The answer changes with the vendor and the license tier. A managed Claude or ChatGPT Enterprise tenant exposes a compliance API that can export the chat record. A developer running Cursor or Copilot produces almost no exportable record of what the model generated. An employee on a personal ChatGPT Plus or Claude Pro subscription sits outside all of it. The sections below cover what compliance requires, the three gaps that block a complete record today, and how to close them.
What does agent compliance require?
ISO/IEC 42001, the first certifiable standard for managing AI, is mostly a management system: policies, risk processes, and accountability an organization documents before anything ships. A handful of its Annex A controls are different. They describe what has to be true while the AI runs, and they resolve into evidence only at runtime.
The runtime controls, and the evidence they need
These controls share one hard property. The evidence is a byproduct of the system running, and it exists only if something recorded it as it happened. For an agent that means capturing both surfaces, the conversation and the action, and whether either one gets recorded is decided by the tool the agent runs inside. Most tools capture at most one.
The three gaps that block agent compliance
The evidence these controls need is capturable in principle. In practice, three gaps stand between an organization and a complete record:
- The compliance APIs that agent vendors ship are new and uneven
- A lot of AI usage runs on individual licenses the organization never sees
- The GRC platforms that own the audit cannot collect any of it
Vendor compliance APIs are new and uneven
The major vendors fall into two camps, and neither one is complete.
The enterprise chat products expose a compliance API that hands the conversation record to downstream tooling. Anthropic’s Compliance API is the most complete of the set. Running under https://api.anthropic.com/v1/compliance/*, it exposes an activity feed retained for six years, the underlying chat, file, and project content for claude.ai organizations, the directory of users and roles, and the effective settings for each linked organization. Content endpoints support both retrieve and delete, so a legal team can pull a conversation or honor a deletion request programmatically. Access is gated to Claude Enterprise; Team plans get a narrower CSV audit-log export, and Pro and Free get neither.
OpenAI’s Compliance Platform, launched in July 2024, exposes time-stamped interactions as immutable log files: conversations including prompt and response text, uploaded files, custom GPT configuration and metadata, memories, and workspace users. Eight eDiscovery and DLP vendors built integrations at launch, among them Microsoft Purview, Relativity, Smarsh, Netskope, and Zscaler. It is available on ChatGPT Enterprise and Edu only.
The coding tools log the administrative shell and leave the content out. Cursor offers audit logs and an Admin API on its Enterprise plan, but both carry administrative and usage metadata only, and the documentation is explicit that “we do not log agent responses or generated code content.” GitHub Copilot is the same shape. Its audit log captures seat assignment, policy changes, and configuration, and the docs state plainly that it “does not include client session data, such as the prompts a user sends to Copilot locally.” For the tools where an agent reads your codebase and writes changes, there is no native export of what it generated.
What each tool can hand a downstream system
Two limits hold even at the Enterprise tier
Even at the top tier, two limits remain. Each API covers a single vendor’s conversation surface in a single vendor’s format, so a company running Claude, ChatGPT, Cursor, and Copilot stitches together separate exports and still gets no content from the coding tools that log none. And none of them reaches the action surface: the tool call an agent makes into Salesforce, a database, or an internal API resolves against your infrastructure rather than the vendor’s tenant, which is where the compliance-relevant action actually happens.
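The coverage described above can be written down as a small lookup, which makes the gaps easy to query per tool. This is only an illustrative sketch: the `ExportCoverage` type, the `COVERAGE` table, and the `gaps` helper are hypothetical names, and the values paraphrase this section rather than being read from any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportCoverage:
    """What a tool's native export hands a downstream system (hypothetical model)."""
    admin_metadata: bool  # users, seats, settings, or usage metadata
    conversation: bool    # prompt and response content
    action: bool          # tool calls made into the organization's own systems


# Values paraphrase the section above; they are assumptions, not vendor output.
COVERAGE = {
    "Claude Enterprise": ExportCoverage(True, True, False),
    "ChatGPT Enterprise": ExportCoverage(True, True, False),
    "Cursor Enterprise": ExportCoverage(True, False, False),
    "GitHub Copilot": ExportCoverage(True, False, False),
}


def gaps(tools):
    """Return, for each named tool, the surfaces that have no native export."""
    missing = {}
    for name in tools:
        coverage = COVERAGE[name]
        missing[name] = [
            surface
            for surface in ("conversation", "action")
            if not getattr(coverage, surface)
        ]
    return missing
```

Calling `gaps(COVERAGE)` lists the action surface as missing for every tool, which is the second limit in concrete form.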
Individual licenses sit outside every agent platform’s control
Everything above assumes the AI came through a door the organization controls. Much of it does not. When an employee uses a personal ChatGPT Plus subscription, a Claude Pro account, or Cursor on the free tier, none of the enterprise machinery applies: no admin console, no compliance API, no audit log, no retention control, no SSO. The compliance API bought at the Enterprise tier sees nothing of it, because the activity belongs to a tenant the company does not own. This is the core of shadow AI, and it is the hardest part for compliance: the evidence an organization can produce is bounded by the licenses it centrally manages, and everything bought on a personal card is dark.
Same vendor, two doors
GRC platforms never see AI usage
Teams often assume their compliance platform already has this covered. It does not. The vendor compliance APIs above export to eDiscovery and data-loss-prevention archives. They do not directly export to Vanta or Drata.
The AI-vendor connectors that GRC platforms do ship are identity connectors. Vanta’s OpenAI integration and Drata’s OpenAI and Anthropic integrations sync the user roster and roles so that AI accounts can be folded into access reviews and deprovisioning. They pull who can log in. They pull nothing about how the AI is used: no conversations, no tool calls, no model configuration, no AI-specific audit trail.
The diagram below traces where each surface of agent activity actually lands. Read it as three rows: what gets generated on the left, the path it travels in the middle, and where it comes to rest on the right.
Closing the gaps with an AI control plane
Closing these gaps means recording agent activity at the boundaries the organization owns, rather than inside each vendor’s tenant. An AI control plane is the governing layer that does this: it sits between every AI agent in an organization and every system it can reach, and keeps the record the vendor APIs and GRC connectors leave behind, in a form built to export into the GRC platform itself. Three capabilities map onto the three gaps above.
Each gap, and the capability that closes it
Cross-agent capture in one AI audit log
The vendor APIs each cover a single vendor’s conversation surface in a single vendor’s format, and the coding tools export no generated content at all. A control plane records every agent the same way, so Claude, ChatGPT, Cursor, Copilot, and whatever ships next produce one uniform, queryable log instead of four partial exports stitched together by hand. Because the record is generated on the path rather than inside a vendor’s tenant, it captures both surfaces: the conversation, what the model saw and produced, and the action, what each agent did, with what arguments, against which system, under whose identity, and what came back. That is the same record ISO 42001 A.6.2.8 asks for, in one shape regardless of which model or tool generated it.
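One way to picture "one shape regardless of which model or tool generated it" is a single record type that both surfaces map into. The sketch below is an assumption made for illustration, not a published schema: `AgentEvent`, its field names, and the helper functions are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AgentEvent:
    """A single uniform log record (hypothetical field names)."""
    timestamp: str         # ISO 8601, UTC
    agent: str             # which agent produced the event, e.g. "claude"
    identity: str          # the user or service account the agent acted as
    surface: str           # "conversation" or "action"
    target: Optional[str]  # system acted on; None for a pure conversation
    request: Any           # what the model saw, or the tool-call arguments
    result: Any            # what the model produced, or what came back


def conversation_event(timestamp, agent, identity, prompt, response):
    """Record the conversation surface: what the model saw and produced."""
    return AgentEvent(timestamp, agent, identity, "conversation", None, prompt, response)


def action_event(timestamp, agent, identity, target, tool, arguments, result):
    """Record the action surface: which tool, what arguments, what came back."""
    request = {"tool": tool, "arguments": arguments}
    return AgentEvent(timestamp, agent, identity, "action", target, request, result)


def to_jsonl(events):
    """Serialize events as JSON Lines, one record per line, with stable key order."""
    return "\n".join(json.dumps(asdict(event), sort_keys=True) for event in events)
```

Because every event carries the same keys, a downstream query such as "all actions against one target under one identity" does not need to know which agent wrote the record.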
A device agent that captures shadow AI on any license
The hole individual licenses open is that the activity belongs to a tenant the company does not own, so no admin console or compliance API can reach it. A device agent installed on the endpoint closes it. It captures AI usage at the machine, which means a personal ChatGPT Plus or Claude Pro account is recorded the same as a managed enterprise seat, and the evidence an organization can produce stops being bounded by the licenses it centrally manages. Delivered through the MDM the fleet already runs, the agent is in place before the employee opens the laptop.
How Speakeasy’s AI control plane fits
Speakeasy is building the AI control plane. The MCP gateway routes and governs every agent-to-tool connection, enforcing authentication and access policy server-side rather than trusting each laptop. Agent hooks instrument every tool invocation in a signed, append-only log: who called what, with what data, and what happened, across managed and unmanaged tools alike, in a form built to survive forensic review and to export into the GRC platform that owns the rest of the program.
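To make "signed, append-only" concrete, here is a generic sketch of a tamper-evident log in which each entry's authentication code covers the previous entry. It is not Speakeasy's implementation: the `SignedLog` class is hypothetical, and it uses an HMAC with a shared key as a stand-in for a real signature scheme.

```python
import hashlib
import hmac
import json


class SignedLog:
    """Hash-chained log: each entry's MAC covers the previous entry's MAC."""

    def __init__(self, key: bytes):
        self._key = key
        self._entries = []

    def _mac(self, prev: str, body: str) -> str:
        # HMAC-SHA256 over the previous MAC concatenated with this entry's body.
        return hmac.new(self._key, (prev + body).encode(), hashlib.sha256).hexdigest()

    def append(self, record: dict) -> dict:
        """Add a record; entries are only ever appended, never rewritten."""
        prev = self._entries[-1]["mac"] if self._entries else ""
        body = json.dumps(record, sort_keys=True)
        entry = {"prev": prev, "body": body, "mac": self._mac(prev, body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry fails."""
        prev = ""
        for entry in self._entries:
            expected = self._mac(prev, entry["body"])
            if entry["prev"] != prev or not hmac.compare_digest(expected, entry["mac"]):
                return False
            prev = entry["mac"]
        return True
```

Chaining is what makes the log tamper-evident: changing one entry invalidates its own MAC, and dropping or reordering entries breaks the link to the next one. A production system would more likely use asymmetric signatures so that verifiers do not hold the signing key.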
Speakeasy does not get a company certified, write its policies, or run its audit. It produces the runtime enforcement and the operational evidence that ISO 42001, SOC 2, and the EU AI Act require for agents, the part the compliance platform cannot auto-collect and the vendor APIs only partly reach. For a platform or security team mapping how AI flows through the organization before an auditor or a regulator asks, the Speakeasy AI control plane is where that record begins.
Further reading
- What is AI governance?: the four functions an enterprise has to control, and how they fit together.
- What is shadow AI?: the unmanaged-license problem and how to detect it.
- The EU AI Act and the AI control plane: how the Act turns record-keeping and oversight into legal obligations.
- AI agent hooks: the primitive that captures the per-tool-call record at the agent itself.