AI agent hook
A user-defined handler, whether a script, an HTTP endpoint, an MCP tool, or a small LLM prompt, that fires at a specific point in an AI agent’s lifecycle. Hooks see structured data about what is about to happen and can observe, log, modify, or block before execution proceeds. Claude Code, Cursor, Codex, and VS Code Copilot all expose them.
A year ago, the only place to inspect AI activity inside the enterprise was the network. Teams stood up LLM gateways like OpenRouter, Portkey, and LiteLLM, routed traffic through a proxy, and tried to reason about behavior from the outside. That model was already losing the battle when MCP arrived. Tool calls, MCP traffic, local shell commands, and file reads and edits all run in the agent’s execution environment, where the gateway cannot see them.
Fortunately, the major AI agents have converged on a new approach. Hooks are user-defined handlers built into the AI agent itself. They fire at specific points in the agent loop, and have access to the full context of a chat session. Hooks can observe, log, modify, or block model interactions before execution proceeds. Claude Code shipped them first. Cursor followed with a near-identical surface. Codex added them. VS Code Copilot now exposes them too. The shape of the primitive is settled enough that organizations can start treating it as the foundation of an AI control plane for governance rather than as a per-vendor curiosity.
This guide walks through what hooks are, what they are useful for, how to wire them up, which events the major AI agents support, and how Speakeasy fits into the picture.
What are AI agent hooks
An AI agent hook is a user-defined handler that the agent invokes at a specific point in its lifecycle. The agent passes the hook structured JSON about what is about to happen: the prompt text, the tool name and arguments, the file path, the shell command. The hook does whatever it wants with that input and returns a JSON response that the agent acts on. Allow. Deny. Modify the input. Inject context. Continue silently and just log the event somewhere central.
The diagram below traces the resolution of a single PreToolUse hook when Claude Code attempts a Bash command. The lifecycle event fires unconditionally, then the user-defined matcher and if filters narrow it down. If both match, the hook command runs and decides the outcome. If either does not, the hook is skipped and the tool proceeds.
A few properties make this primitive interesting.
Hooks run inside the agent loop. They are not a network-layer interceptor. They sit between the agent’s decision to do something and the actual execution. A PreToolUse hook on Bash sees the command string before it runs, the same way it sees an MCP tool call’s arguments before the call goes out. There is no traffic to mirror, no proxy to install, no certificate to trust.
Hooks see the full context. The structured payload includes the session ID, the working directory, the model, the tool name, the tool input, and often the full transcript path. The data the security team has been trying to reconstruct from logs is sitting right there at the source.
Hooks can route anywhere. Most providers support multiple handler types. A hook can shell out to a script, POST to an HTTP endpoint, call a tool on a connected MCP server, or evaluate a small LLM prompt inline. The same primitive that runs a one-line grep for secrets can ship a structured event into a SIEM or a control plane.
Hooks compose. Multiple hooks can register against the same event. They run in parallel (or in sequence, depending on the provider), and the results combine. An organization can layer a fast regex secret-detector, a slower LLM-based PII classifier, and a telemetry handler against the same PreToolUse event without any one of them knowing about the others.
Hooks fail open by design. Every provider treats hook failures as non-fatal. If the script errors, the network blips, the LLM evaluator times out, or the credentials are missing, the agent proceeds. This is the non-negotiable property of any control that lives in the hot path. Visibility that breaks the agent gets removed.
The combination is what makes hooks the right place for AI governance. They are close enough to the action to see everything, expressive enough to enforce policy, and decoupled enough to ship as a configuration rather than a fork of the agent.
What problems do hooks solve
Hooks are a general-purpose primitive, but the use cases enterprises actually care about cluster into four buckets.
Real-time policy enforcement. Exactly what form this takes is specific to an organization. Common use cases include: block destructive shell commands before they run, deny tool calls that would write to a production database, strip secrets from prompts before they reach the model, and stop a coding agent from committing code with hardcoded credentials. The pattern is the same across all of these: a PreToolUse or UserPromptSubmit hook evaluates the input against a rule set and returns a deny decision when it matches. The action never happens.
Observability and audit. Capture every prompt, every tool call, every response into a central store. The data that was previously trapped on individual laptops becomes a unified feed that security, platform, and finance teams can all query. Token use can be broken down by team, user, or workflow to give a clear picture of how AI is being used internally. The same hook events that drive real-time blocking can also drive retroactive analysis. “Find every session last quarter where a customer record left the perimeter” becomes a query rather than a forensic investigation.
Code safety. Hooks fire on file edits and shell executions, which makes them the natural place to wire in code-quality and security tooling. Run Semgrep or Corridor against generated code on every afterFileEdit. Run Endor Labs or Snyk against package installs on every beforeShellExecution matching npm install or pip install. The agent gets immediate, structured feedback and can regenerate the offending code in the same turn.
Workflow automation. Hooks can do useful, non-security work too. Run a formatter on every file edit. Inject git status and the open issue list into the agent’s context on SessionStart. Auto-submit a follow-up prompt on Stop if a test failed. Send a Slack notification when a long-running task completes. The same primitive that enforces policy is also a clean extension point for the small workflow scripts every team eventually wants.
The unifying property is that all four buckets benefit from the same thing: a single, well-defined point in the agent loop where structured event data is available and a return value can change what happens next. Without hooks, each of these problems requires a different integration pattern. With hooks, they are all the same shape.
How to implement a hook
The configuration model is broadly shared across providers. A JSON file lives in a known location (~/.claude/settings.json, ~/.cursor/hooks.json, ~/.codex/hooks.json). The file lists events, optional matchers (so a hook only fires for Bash tool calls, or for npm install commands specifically), and the handler to run.
Below is the same minimal hook (block rm -rf before it executes) implemented for each of the major providers. The shape of the decision response and the field names differ, but the pattern is the same: a config registers a script against a tool event, and the script reads the structured event from stdin and returns a JSON decision on stdout (using jq for parsing).
The config that registers the hook with the agent:
~/.claude/settings.json
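A minimal registration sketch, following Claude Code’s documented settings shape. The matcher scopes the hook to the Bash tool, and the command path is project-relative and illustrative:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/block-rm.sh"
          }
        ]
      }
    ]
  }
}
```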
The script the agent invokes when the hook fires:
.claude/hooks/block-rm.sh
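A sketch of that script, assuming Claude Code’s documented PreToolUse payload (tool_input.command on stdin) and its hookSpecificOutput decision envelope. Every error path exits 0 so the hook fails open:

```shell
#!/usr/bin/env bash
# .claude/hooks/block-rm.sh -- deny Bash commands containing "rm -rf".
# Contract: structured JSON event on stdin, JSON decision on stdout.

deny() {
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"%s"}}\n' "$1"
}

evaluate() {
  case "$1" in
    *"rm -rf"*) deny "rm -rf is blocked by policy" ;;
    *) : ;;  # no output: allow, the tool call proceeds
  esac
}

[ -t 0 ] && exit 0                        # no piped event: nothing to do, fail open
command -v jq >/dev/null 2>&1 || exit 0   # jq missing: fail open
cmd=$(jq -r '.tool_input.command // empty' 2>/dev/null) || exit 0
evaluate "$cmd"
```

An empty response means allow; only the explicit deny envelope blocks the call.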
The shape of the decision differs across providers (Claude Code wraps it in hookSpecificOutput, Cursor uses a top-level permission string, Codex uses a top-level permissionDecision), but the input contract is the same: structured JSON in, JSON decision out, fail open on any error. Swap PreToolUse for UserPromptSubmit and the hook sees prompt text instead of tool input. Swap it for PostToolUse and the hook sees the tool’s output. The matcher narrows execution to the events that matter, so the org is not paying for a hook to run on every event when, for example, it only cares about Bash.
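For illustration, the three deny shapes just described, paraphrased rather than copied from each provider’s docs, so verify the exact field names before relying on them:

```jsonc
// Claude Code: wrapped in a hookSpecificOutput envelope
{ "hookSpecificOutput": { "hookEventName": "PreToolUse",
    "permissionDecision": "deny", "permissionDecisionReason": "blocked" } }

// Cursor: a top-level permission string
{ "permission": "deny" }

// Codex: a top-level permissionDecision
{ "permissionDecision": "deny" }
```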
The other three handler types, HTTP, MCP tool, and prompt, follow the same input/output contract. An HTTP hook posts the event JSON to a URL and uses the response. An MCP-tool hook calls a tool on a connected server with the event as the argument. A prompt hook runs a single-turn LLM evaluation with the event as context. The tradeoffs are familiar. Shell scripts are fastest and run offline. HTTP endpoints centralize logic across the fleet. MCP tools reuse existing servers. Prompt hooks are the easiest way to express something fuzzy (“does this command look destructive?”) without writing code, and pair well with careful prompting.
The right pattern for an enterprise is usually a layered one. Put the cheap, deterministic checks (gitleaks-style secret-detection regex, an explicit deny list) in a local script so they run sub-millisecond and work offline. Forward the structured event to a central HTTP endpoint for everything else: telemetry, slower LLM-based classification, cross-session pattern matching. The local script is the policy gate. The central endpoint is the observability and analytics layer.
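A sketch of the local half of that pattern. The deny rule is a single illustrative regex (an AWS access key ID shape), and CONTROL_PLANE_URL is a placeholder for the organization’s endpoint; the forward runs in the background so the hot path never waits on the network:

```shell
#!/usr/bin/env bash
# Layered PreToolUse hook: cheap deterministic gate locally,
# everything else forwarded asynchronously to a central endpoint.

looks_like_secret() {
  # Deterministic, sub-millisecond check; extend with more gitleaks-style rules.
  printf '%s' "$1" | grep -Eq 'AKIA[0-9A-Z]{16}'
}

[ -t 0 ] && exit 0   # no piped event: nothing to do, fail open
event=$(cat)

if looks_like_secret "$event"; then
  printf '{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"possible credential in tool input"}}\n'
fi

# Fire-and-forget telemetry: a short timeout and a background job keep the
# network entirely out of the agent's critical path.
if [ -n "${CONTROL_PLANE_URL:-}" ]; then
  curl -fsS -m 2 -X POST -H 'Content-Type: application/json' \
    --data-binary "$event" "$CONTROL_PLANE_URL" >/dev/null 2>&1 &
fi
```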
The most useful hooks
Of the dozens of events the major providers expose, four cover most of what an AI governance program actually needs.
UserPromptSubmit. Fires when a user submits a prompt to the agent. This is the chokepoint for inbound data. A hook here can scan the prompt for secrets pasted out of a .env file, redact PII before it reaches the model, or block prompts that match a deny pattern. It is also the right place to add organizational context. A team can inject their internal style guide, the current sprint’s open tickets, or the user’s role into every prompt without asking employees to remember to do it.
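A sketch of a context-injection handler for this event. In Claude Code, plain stdout from a UserPromptSubmit hook is added to the model’s context (verify the equivalent mechanism on other providers); the style-guide line is a placeholder for real organizational content:

```shell
#!/usr/bin/env bash
# UserPromptSubmit hook: prepend organizational context to every prompt.

context() {
  echo "Internal note: prefer TypeScript strict mode; never commit .env files."
  # Current repository state, when run inside a git checkout; silent otherwise.
  git status --short 2>/dev/null | head -20
}

context
```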
PreToolUse. Fires before any tool call the agent makes. This is the chokepoint for outbound actions. A hook here sees the tool name (Bash, Write, Edit, mcp__github__create_pr), the arguments, and the working directory. Almost every real-time policy enforcement use case lives here: deny dangerous shell commands, scope file writes to allowed directories, gate MCP tool calls behind an approval flow, strip credentials out of HTTP requests before the tool issues them.
PostToolUse. Fires after a tool returns. This is where tool outputs get inspected. The use case people miss most often is exfiltration detection. A cat .env runs cleanly through PreToolUse because the command itself is fine. The risk is in the result. A PostToolUse hook on Bash and Read sees the contents and can decide whether the agent should be allowed to use them in the rest of the conversation.
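A sketch of such a hook, assuming Claude Code’s documented PostToolUse payload (a tool_response field) and its decision/reason response shape; the credential patterns are illustrative, not a complete rule set:

```shell
#!/usr/bin/env bash
# PostToolUse hook: inspect what the tool returned, not what it was asked.
# The command "cat .env" passes PreToolUse; the risk is in its output.

contains_credential() {
  printf '%s' "$1" | grep -Eq 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----'
}

[ -t 0 ] && exit 0                        # no piped event: fail open
command -v jq >/dev/null 2>&1 || exit 0   # jq missing: fail open
output=$(jq -r '.tool_response // empty' 2>/dev/null) || exit 0

if contains_credential "$output"; then
  # The tool already ran; blocking here keeps the result out of the
  # rest of the conversation.
  printf '{"decision":"block","reason":"tool output contains a credential"}\n'
fi
```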
SessionEnd (Claude Code) / stop (Cursor). Fires when the agent finishes a turn or a session. This is the hook that turns hooks into a complete observability story. A handler here captures the full transcript and ships it to a central store. Once the transcript is in a queryable system, the questions shift from “block this in real time” to “show me every session that touched the customer database” or “find the prompts that made this agent fail.” Real-time blocking and retroactive analysis are the same data, used differently.
The other events (SessionStart, Notification, SubagentStart, PreCompact, and the rest) are useful for specific automations. But the four above are what an enterprise governance baseline looks like. If hooks are configured for those events and the data is going to a central store, most of what a security team needs to do is already possible.
Which AI agents support hooks
While hooks have become a standard feature of the major AI agents over the past year, the interfaces are not yet standardized. Some providers expose granular, agent-loop-specific events (Cursor’s afterAgentResponse, Claude Code’s WorktreeCreate). Others fold many concerns under a single generic event (Codex routes all tool calls, including shell commands and MCP, through PreToolUse). The table below maps every concept across the four providers; an organization writing hook logic needs to check what does and does not exist on each.
Native provider event support
This table compares provider hook surfaces directly. Cells show the provider-native hook event that supports the concept, or No when the provider does not currently expose one.
| Hook concept | Claude Code | Cursor | Codex | VS Code |
|---|---|---|---|---|
| Session starts | SessionStart | sessionStart | SessionStart | SessionStart |
| Session ends | SessionEnd | sessionEnd | No | No |
| Session setup | Setup | No | No | No |
| User submits prompt | UserPromptSubmit | beforeSubmitPrompt | UserPromptSubmit | UserPromptSubmit |
| Prompt expansion | UserPromptExpansion | No | No | No |
| Before tool use | PreToolUse | preToolUse | PreToolUse | PreToolUse |
| After tool use | PostToolUse | postToolUse | PostToolUse | PostToolUse |
| Tool failure | PostToolUseFailure | postToolUseFailure | No | No |
| Tool batch completes | PostToolBatch | No | No | No |
| Permission or approval request | PermissionRequest | preToolUse | PermissionRequest | PreToolUse |
| Permission denied | PermissionDenied | No | No | No |
| Agent stops | Stop | stop | Stop | Stop |
| Agent stop failure | StopFailure | No | No | No |
| Agent response emitted | No | afterAgentResponse | No | No |
| Agent thought emitted | No | afterAgentThought | No | No |
| Subagent starts | SubagentStart | subagentStart | No | SubagentStart |
| Subagent stops | SubagentStop | subagentStop | No | SubagentStop |
| Before context compaction | PreCompact | preCompact | No | PreCompact |
| After context compaction | PostCompact | No | No | No |
| Before shell command | PreToolUse | beforeShellExecution | PreToolUse | PreToolUse |
| After shell command | PostToolUse | afterShellExecution | PostToolUse | PostToolUse |
| Before MCP tool call | PreToolUse | beforeMCPExecution | PreToolUse | PreToolUse |
| After MCP tool call | PostToolUse | afterMCPExecution | PostToolUse | PostToolUse |
| Before file read | PreToolUse | beforeReadFile | PreToolUse | PreToolUse |
| After file edit | PostToolUse | afterFileEdit | PostToolUse | PostToolUse |
| Before tab file read | No | beforeTabFileRead | No | No |
| After tab file edit | No | afterTabFileEdit | No | No |
| Notification | Notification | No | No | No |
| Config changes | ConfigChange | No | No | No |
| Working directory changes | CwdChanged | No | No | No |
| Watched file changes | FileChanged | No | No | No |
| Worktree created | WorktreeCreate | No | No | No |
| Worktree removed | WorktreeRemove | No | No | No |
| Task created | TaskCreated | No | No | No |
| Task completed | TaskCompleted | No | No | No |
| Teammate idle | TeammateIdle | No | No | No |
| User elicitation requested | Elicitation | No | No | No |
| User elicitation answered | ElicitationResult | No | No | No |
| Instructions loaded | InstructionsLoaded | No | No | No |
A few patterns are worth pulling out.
Claude Code has the broadest surface. It exposes lifecycle events that no other provider does, including Setup, WorktreeCreate/WorktreeRemove, TaskCreated/TaskCompleted, TeammateIdle, and Elicitation. For organizations standardizing on Claude Code, the policy and observability ceiling is correspondingly higher.
Cursor leans into agent-loop introspection. It is the only provider that exposes afterAgentResponse and afterAgentThought, which surface the model’s intermediate output rather than the tool calls. Cursor also exposes the most granular file-operation hooks (beforeReadFile, afterFileEdit, beforeTabFileRead, afterTabFileEdit), which makes it the easiest place to wire in code-quality tooling like Semgrep or Snyk.
Codex collapses many concerns into PreToolUse/PostToolUse. Shell commands, MCP calls, and file operations all flow through the generic tool events. This is simpler to reason about, but it pushes the disambiguation work into the matcher ("matcher": "Bash", "matcher": "mcp__filesystem__.*").
VS Code’s surface tracks Claude Code’s vocabulary. Most events use the same PascalCase names. A hook script written against Claude Code’s input format will usually run against VS Code Copilot with minimal changes.
The portable subset, the events every provider supports, is SessionStart, the prompt-submit event, PreToolUse, PostToolUse, and Stop. An organization writing hook logic against just those events can ship the same handler across all four agents and pick up provider-specific events later.
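One way to target that portable subset is a thin normalization layer, so a single handler body serves every provider. The canonical names below are arbitrary; the provider event names are taken from the table above:

```shell
#!/usr/bin/env bash
# Map provider-specific event names onto one canonical vocabulary.
# Events outside the portable subset pass through unchanged.

normalize_event() {
  case "$1" in
    SessionStart|sessionStart)           echo session_start ;;
    UserPromptSubmit|beforeSubmitPrompt) echo prompt_submit ;;
    PreToolUse|preToolUse)               echo pre_tool_use ;;
    PostToolUse|postToolUse)             echo post_tool_use ;;
    Stop|stop)                           echo agent_stop ;;
    *)                                   echo "$1" ;;
  esac
}

normalize_event "preToolUse"   # prints pre_tool_use
```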
Where hook implementations break down at scale
Hooks are a powerful primitive, but the gap between “the providers ship hooks” and “our enterprise has hook-based governance in production” is wider than it looks. Three implementation problems show up in every rollout.
Inconsistency across agents. As the table above shows, every provider ships its own configuration format, event vocabulary, input schema, and decision-response shape. Claude Code uses PascalCase event names and a hookSpecificOutput envelope for permission decisions. Cursor uses camelCase, exits with code 2 to deny, and exposes granular events that have no analogue elsewhere. Codex collapses shell, MCP, and file operations into a single generic PreToolUse and pushes disambiguation into matchers. A secret-detection rule written for Claude Code will not run unchanged on Cursor, and the Cursor version will not run on Codex. An organization that standardizes on one set of policies ends up maintaining three or four implementations and reconciling them by hand. The portable subset is small enough to be useful but not expressive enough to be a real governance program.
Internal distribution. Configuring hooks on a single developer laptop is a 10-minute exercise. Configuring them on every laptop in a thousand-person engineering org, keeping them updated, rotating the credentials they carry, and making sure no developer ever runs an AI agent without them, is a different problem entirely. Native plugin systems (Claude Code plugins, Cursor plugins) help, but the typical setup flow still requires a team admin to paste API keys, configure a SessionStart hook to inject env vars, and trust that every developer installs the right marketplace. One missed step leaves a silent observability gap, exactly the failure mode behind shadow AI. MDM is the right delivery vehicle, but AI-agent-specific MDM profiles are nascent and most platform teams do not yet have a default playbook for them. The result is a fleet where half the laptops are governed and the other half are not, with no easy way to tell which is which.
Data analysis. Once hook events are flowing, the next problem is what to do with the stream. A SIEM is built for security log records, not for AI session transcripts with embedded prompts and tool outputs. An OpenTelemetry collector can route the events but does not know how to interpret them. The volume at fleet scale is large enough that naive storage is expensive and naive querying is slow. Turning hook events into the things a security or platform team actually wants (“alert me when a customer record leaves the perimeter,” “show me the top ten most-triggered deny rules this week,” “find every session where a developer pasted a credential into a prompt”) requires a data model, a query layer, and a UI. None of that is in the box with the hook primitive. Most teams that ship hooks end up writing the analytics layer themselves, and most of those analytics layers stop at “we have a dashboard with a count.”
The combination is what makes hook-based governance hard to operate even when the primitive itself is well-defined. Inconsistency means policy authoring is N times the work. Distribution means coverage is patchy. Data analysis means the events accumulate without anyone being able to act on them. A real implementation has to solve all three.
How Speakeasy helps
Speakeasy is built to close those three gaps. The product treats hooks as a first-class primitive and supplies the layer above them: portable configurations across agents, fleet-wide distribution, and a data plane built for AI session telemetry. Speakeasy is the platform that turns hooks into governance.
The reference architecture below shows how the pieces fit together. Hooks sit inside each governed AI agent (the orange and red panels on the left), pre-installed via MDM. They emit a structured event feed into the AI control plane, where policy definitions, observability sinks, and the LLM and MCP gateways live. The agent’s traffic never reaches an external LLM provider or an internal system without flowing through that plane.
Provisioning at enrollment
Jamf, Kandji, Intune, Fleet, and JumpCloud push hook configs, the approved MCP registry, identity (SSO), and reviewed skills onto every laptop before the employee opens it.
Governed AI agents
Claude Code, Cursor, and the other agentic clients run with hooks installed. PreToolUse and PostToolUse fire on every prompt and tool call.
Policy definitions
Two policy lanes. Sensitive data covers structured signals like API keys, tokens, PII, credentials, and source code. Compliance policies cover regulatory frames like HIPAA, GDPR, SOC 2, and PCI.
Unified event feed
Every hook firing emits structured JSON to a unified feed. Sinks land in the SIEM, plus full session transcripts that are replayable and queryable.
Egress and tools
The LLM Gateway brokers calls to external providers without leaking keys to the client. The MCP Gateway only routes to approved servers. Identity binds every action to a real human via SSO.
External LLM providers
Anthropic, OpenAI, Google, and self-hosted models. Reached only through the gateway, never directly from the client.
Internal systems
Databases, data warehouses, internal APIs, and SaaS apps. The blast radius of an ungoverned agent, and exactly what the architecture is designed to contain.
Pre-built hook configurations as native plugins. Speakeasy ships hook configurations for Claude Code and Cursor through their native plugin systems rather than as a custom installer. A team admin marks the Speakeasy plugin as required, and every developer’s agent picks up the hooks the next time it starts. No env-var pasting, no per-laptop credential setup, no missed step that leaves a developer outside the observability path. Day-0 secret-detection rules are gitleaks-compatible and cover AWS, GitHub, GitLab, Slack, Stripe, GCP, Heroku, Twilio, SendGrid, npm, PyPI, OpenAI, and Anthropic credentials, plus generic API-key and database-connection-string patterns. Custom rules per organization are configured from the same control plane.
Two-tier evaluation that does not break the agent. The hard part of hook-based governance is staying out of the hot path. Putting every tool call behind a synchronous API to a remote policy service would add 50 to 200ms to every action and break the experience the moment the network blips. Speakeasy’s hooks evaluate cheap, deterministic rules (regex secret detection, deny lists) client-side at sub-millisecond cost, with no network round trip. Anything those local rules do not catch flows on as a structured event to the AI control plane, where slower LLM-based PII detection, cross-session pattern matching, and rules that would have been too expensive to evaluate inline run over the unified feed. The first pass is proactive and blocks. The second pass is reactive and surfaces.
MDM-native provisioning. The same hook configurations can be deployed through Jamf, Kandji, Intune, Fleet, JumpCloud, and the rest, so AI agents are governed from the moment a laptop is enrolled. There is no point at which a developer is using Cursor without observability, because there is no version of Cursor on the fleet that was not provisioned with the hooks attached.
Unified observability across every agent. Hooks send structured events into the Speakeasy control plane, where they become a single feed of every prompt, every MCP tool call, every shell command, and every response, across every employee, every client, and every model. The data that was previously trapped on individual laptops is now a queryable, dashboarded record. Token use by team. Tool use by user. Most-triggered policy rules. Sessions where a customer record left the perimeter. The same data that drives real-time blocking also drives the audit trail.
Discovery alongside enforcement. Hooks tell you about the activity going through the clients you have governed. The control plane also discovers MCP servers and skills installed outside that path, so the long tail of shadow AI surfaces in the same dashboard as the managed traffic. Governed traffic and ungoverned traffic are visible in one place, with one set of policies.
The point of all of this is not to slow employees down. It is the opposite. Hooks are the primitive that lets a security team say yes to the next AI tool instead of saying no out of caution, because the rollout and the governance ship together. Configure the hooks, point them at the AI control plane, and the AI investment the board asked for becomes something the platform team can actually measure, secure, and expand.
If you are deciding where to start with AI governance this quarter, hooks are the right primitive. Speakeasy is the fastest way to get them in place across the fleet.