
What is shadow AI?

The AI your organization is using that no one with a security mandate has ever seen.

Definition

Shadow AI

Shadow AI refers to the use of AI tools (models, agents, MCP servers, and skills) outside an organization’s security, control, and observability perimeter. It is a new manifestation of the shadow IT problem with higher stakes, because AI agents don’t just read the data they’re given; they take actions, call tools, and route information to external models on the user’s behalf.



AI is outpacing organizational controls. Walk into a hundred-person engineering org today and you’ll find AI tools scattered across teams with no centralized approval: Cursor configured against a personal Anthropic account on a third of the laptops, a half-dozen MCP servers installed from GitHub READMEs, Claude Code with custom skills nobody’s read the source on, and a long tail of ChatGPT tabs handling things that used to require a ticket. They’re accessing internal systems without audit logs, policy enforcement, or security oversight. Security can’t review what it can’t see. IT can’t enforce policy across disconnected systems.

Leadership teams have laid down “AI mandates” before governance teams have had a chance to put a foundation in place. Now they’re racing to catch up. Shadow AI is what the gap between mandate and foundation looks like in practice, and the architectural answer companies are converging on is the AI control plane: the governing layer between every AI agent in the organization and every system they’re allowed to reach. The rest of this article works backward from that frame: what form shadow AI takes today, why it’s a sharper problem than shadow IT, and how to detect and govern it without slowing the rollout the board asked for.

Common types of shadow AI

Shadow IT used to mean someone running a Dropbox account or spinning up a personal AWS instance. Shadow AI is messier. The surface area is larger because there are more places it can hide, and each hiding place is smaller: usually just a few lines of JSON in a dotfile, with no installer, no process, and no billing record for traditional asset-discovery tools to latch onto.

Models accessed through personal accounts. An employee pastes a customer support transcript into ChatGPT to summarize it. Another runs a code review through their personal Claude account. The data leaves the perimeter through a browser tab, and the only record is in the LLM provider’s logs against the wrong identity.

MCP servers installed from anywhere. The Model Context Protocol won the standards battle for how AI agents connect to tools, and it won fast enough that most organizations don’t yet have a registry, a review process, or a kill switch. Employees install MCP servers from npm, from GitHub, from Discord links. Each server can read files, call APIs, query databases, and exfiltrate anything it touches.

Skills loaded out of band. Skills (SKILL.md files and equivalents) are instruction bundles that extend an AI assistant with workflows, conventions, and tool usage patterns. They’re as powerful as a system prompt and as easy to share as a gist. A skill installed without review can override safety guidelines, redirect tool calls, or quietly inject instructions that exfiltrate code on every run.

Internal agents nobody registered. A team builds a deployment-summary agent. It uses a service account someone created for it, hits production APIs, and posts to a channel. It’s helpful. It’s also outside identity, outside policy, and outside any incident response runbook.

The pattern across all four is the same. The AI surface lives at the edge, in user space, and the central tools that secure the rest of the company can’t see it.

Shadow AI vs shadow IT

Three things make this category structurally different from the shadow IT of a decade ago.

AI amplifies access. A single MCP server can give a coding agent broad reach into databases, internal APIs, and file systems. Hand the agent a task and it will reach for every tool in scope to finish it, retrying around failures, chaining calls, and improvising routes a human user wouldn’t try. The agent doesn’t ask twice. A misconfigured server is a misconfigured server with hands.

Actions are automated. Old shadow IT mostly leaked data through reads. Shadow AI takes actions: opening pull requests, sending messages, modifying tickets, executing shell commands. The blast radius of a bad tool call is bigger.

Detection is hard by construction. MCP configs and skill files live in dotfiles under home directories. They aren’t installed software. They don’t show up in MDM inventories the way an unauthorized application would. The threat model for the average EDR product was written before any of this existed.

The combination is what makes shadow AI a distinct category rather than a footnote on shadow IT. It is the same problem turned up a few orders of magnitude.
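The detection gap is easy to demonstrate. Even a crude sweep for the dotfiles involved, runnable before any governance is in place, turns up configs that no MDM inventory lists. The paths below are common defaults and vary by client and version; treat them as a starting point, not an authoritative list:

```python
#!/usr/bin/env python3
"""Crude sweep for AI agent configs that traditional asset discovery misses."""
from pathlib import Path

# Dotfile locations where agent, MCP, and skill configs typically live.
CANDIDATE_PATHS = [
    ".claude/settings.json",  # Claude Code settings (incl. hooks)
    ".claude.json",           # Claude Code MCP server registrations
    ".cursor/mcp.json",       # Cursor MCP server registrations
    ".claude/skills",         # locally installed skills
]

def sweep(home: Path = Path.home()) -> list[Path]:
    """Return every candidate config that exists under the home directory."""
    return [home / p for p in CANDIDATE_PATHS if (home / p).exists()]

if __name__ == "__main__":
    for hit in sweep():
        print(f"found unmanaged AI config: {hit}")
```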

Detecting shadow AI: a reference architecture

[Reference architecture diagram, v1.0 · Speakeasy · Shadow AI prevention: an MDM provisioning layer (Jamf, Kandji, Intune, Fleet, JumpCloud) provisions hook configs, an approved MCP registry, SSO identity, and reviewed skills onto governed AI agents (Claude Code, Cursor) before the employee opens the laptop. Every prompt, tool call, and response runs through the AI control plane: policy definitions (block sensitive data such as API keys, tokens, PII, credentials, and source code; audit for HIPAA, GDPR, SOC 2, and PCI), observability (unified real-time JSON event feed, SIEM sinks such as Splunk, Datadog, and Panther, replayable session transcripts), and gateways (LLM gateway for egress, MCP gateway for tool traffic, identity via OIDC/SAML). Beyond the gateways sit external LLM providers (Anthropic, OpenAI, Google, self-hosted vLLM/Ollama) and the internal systems that form the blast radius of an ungoverned agent: databases, data warehouses, internal APIs, and SaaS.]
01 · MDM

Provisioning at enrollment

Jamf, Kandji, Intune, Fleet, and JumpCloud push hook configs, the approved MCP registry, identity (SSO), and reviewed skills onto every laptop before the employee opens it.

02 · Clients

Governed AI agents

Claude Code, Cursor, and the other agentic clients run with hooks installed. UserPromptSubmit fires on every prompt; PreToolUse and PostToolUse fire around every tool call.

03 · Policy

Policy definitions

Two policy lanes. Sensitive data covers structured signals like API keys, tokens, PII, credentials, and source code. Compliance policies cover regulatory frames like HIPAA, GDPR, SOC 2, and PCI.
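A policy file for those two lanes might take a shape like the following. This is an illustrative sketch of the structure, not a fixed product schema:

```json
{
  "policies": [
    {
      "name": "sensitive-data",
      "action": "block",
      "rules": ["aws-access-key", "github-pat", "private-key", "pii"]
    },
    {
      "name": "compliance",
      "action": "audit",
      "frameworks": ["HIPAA", "GDPR", "SOC 2", "PCI"]
    }
  ]
}
```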

04 · Observe

Unified event feed

Every hook firing emits structured JSON to a unified feed. Events land in SIEM sinks, alongside full session transcripts that are replayable and queryable.
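Each event in the feed is a small structured record. The field names below are illustrative of the shape, not a fixed schema:

```json
{
  "event": "PreToolUse",
  "timestamp": "2025-06-12T14:03:22Z",
  "user": "jane@example.com",
  "client": "claude-code",
  "tool": "mcp__github__create_pull_request",
  "decision": "allow",
  "policy_hits": []
}
```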

05 · Gateways

Egress and tools

The LLM Gateway brokers calls to external providers without leaking keys to the client. The MCP Gateway only routes to approved servers. Identity binds every action to a real human via SSO.

06 · Egress

External LLM providers

Anthropic, OpenAI, Google, and self-hosted models. Reached only through the gateway, never directly from the client.

07 · Reach

Internal systems

Databases, data warehouses, internal APIs, and SaaS apps. The blast radius of an ungoverned agent, and exactly what the architecture is designed to contain.

The diagram lays out where each piece sits, but the two mechanisms that make it work, hooks at the AI agent and MDM as the delivery vehicle, are worth a closer look on their own. The next two sections walk through them in order: first hooks, the primitive that makes detection and policy possible at the AI agent itself, and then MDM, the mechanism for getting those hooks onto every laptop in the fleet before an employee opens it.

How to detect shadow AI with hooks

Proxying the model call is a solved problem. LLM gateways like OpenRouter, Portkey, and LiteLLM do exactly that, and they’re useful for routing, cost tracking, and model-level inspection. But the model call is one slice of an AI interaction. The tool calls, the MCP traffic, and the local commands all run on the laptop, and intercepting each of them at the network layer would require an EDR-class agent on every employee’s machine. The interesting work in the last twelve months has been figuring out how to get observability at the client itself, where the full context is sitting.

The answer the major AI agents have converged on is agent hooks.

An agent hook is a user-defined handler, whether a script, an HTTP endpoint, an MCP tool, or a small LLM prompt, that fires at a specific point in the AI agent’s lifecycle. Claude Code fires hooks at every layer where data crosses a boundary: UserPromptSubmit when a prompt is typed, PreToolUse before a tool runs, PostToolUse on the result, SessionStart and SessionEnd for transcripts, and Stop when the agent finishes. Cursor exposes an analogous set, with a near-identical event surface for tool calls and the agent loop. The client passes the hook structured JSON about what’s about to happen, including the prompt text, the tool name, and the arguments, and the hook can observe, log, modify, or block before execution proceeds.
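As a concrete sketch, registering such hooks in Claude Code happens in ~/.claude/settings.json. The handler paths here are placeholders, and the exact matcher semantics are worth verifying against the current Claude Code docs:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|mcp__.*",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/policy-check.py" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/audit-emit.py" }
        ]
      }
    ]
  }
}
```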

The implications for shadow AI detection are significant.

Every prompt and tool call becomes inspectable. A PreToolUse hook on a Bash invocation sees the command string before it runs. The equivalent hook on Cursor sees the MCP tool, the server, and the arguments. The data the security team has been trying to get from network logs and SaaS audit feeds is sitting right there at the source.

The hook itself can route to the right backend. Hooks can fire HTTP requests, so the structured event flows directly into a SIEM, an OpenTelemetry collector, or a control plane. No new agent on the laptop. No new daemon. The AI agent is already running.
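A minimal forwarder, as a sketch: the hook reads the event JSON from stdin and POSTs it to a collector. The collector URL is a placeholder, and the short timeout plus the broad exception handler are deliberate; telemetry must never stall the agent:

```python
#!/usr/bin/env python3
"""PostToolUse hook: forward the event to a central collector, failing open."""
import json
import sys
import urllib.request

COLLECTOR_URL = "https://collector.internal.example.com/v1/ai-events"  # placeholder

def main() -> None:
    raw = sys.stdin.read()  # the client passes the event as JSON on stdin
    try:
        event = json.loads(raw)
        req = urllib.request.Request(
            COLLECTOR_URL,
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=2)  # short timeout: stay out of the hot path
    except Exception:
        pass  # fail open: never block the agent on telemetry

if __name__ == "__main__":
    main()
    sys.exit(0)  # exit 0 regardless; observe, don't interfere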

The surface isn’t only MCP. Hooks fire on every tool call the agent makes, including local commands. A PreToolUse hook on Bash sees rm -rf before it runs, the same way it sees an MCP tool call.

Policy gets enforced at the agent first, with the observability layer as a backstop. The primary enforcement point is the hook itself, embedded in the provisioned AI agent and running inside its lifecycle. Putting every tool call behind a synchronous API to a remote policy service would add 50 to 200ms to every action and break the experience the moment the network blips, so the rules that block in real time have to ship with the hook config and run inside the agent. A PreToolUse hook returns a deny decision and the tool call never happens. A UserPromptSubmit hook strips secrets from a prompt before it reaches the model. Structured cases (API keys, tokens, private keys) match cleanly against regex with a keyword pre-filter, at sub-millisecond cost. Anything those in-agent rules don’t catch flows on as a structured event to the observability layer, where a slower set of checks runs over the unified feed: semantic classifiers, cross-session patterns, and rules that would have been too expensive to evaluate inline. The first pass is proactive and blocks. The second pass is reactive and surfaces.

The non-negotiable design property is that hooks fail open. If the central backend is unreachable, if credentials aren’t configured, if the evaluation script errors, the action proceeds. Visibility that costs uptime is visibility that gets ripped out within a week. The hook is in the hot path; the security guarantee is that it never becomes the hot path.
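Putting the last two points together, a PreToolUse policy hook might look like the sketch below. It assumes Claude Code’s documented convention that exit code 2 from a PreToolUse hook denies the tool call (verify against current docs), and it fails open on any internal error:

```python
#!/usr/bin/env python3
"""PreToolUse hook: block tool calls whose arguments contain obvious secrets."""
import json
import re
import sys

# Keyword pre-filter: skip the regex pass entirely when no trigger string appears.
TRIGGERS = ("AKIA", "ghp_", "BEGIN")

# Structured secret patterns (illustrative subset of a gitleaks-style ruleset).
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                 # GitHub PAT
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # private key block
]

def main() -> int:
    try:
        event = json.load(sys.stdin)
        payload = json.dumps(event.get("tool_input", {}))
        if any(t in payload for t in TRIGGERS):  # sub-millisecond gate
            for pat in PATTERNS:
                if pat.search(payload):
                    print(f"blocked: matched {pat.pattern}", file=sys.stderr)
                    return 2  # deny: the tool call never happens
        return 0  # allow
    except Exception:
        return 0  # fail open by design

if __name__ == "__main__":
    sys.exit(main())
```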

Hooks also let you capture every session in full. Once the transcript lands in a central store, the question shifts from “block this in real time” to “find every session last quarter where a customer record left the perimeter.” Real-time blocking and retroactive analysis are the same data, used differently.

The catch, and it’s the catch every shadow IT problem has always had, is that hooks are useful only if they’re configured. A hook that nobody installed doesn’t fire. A control that lives in ~/.claude/settings.json on one developer’s laptop and not the other forty-nine isn’t a control.

Which is why the second half of the answer is provisioning.

Provisioning hooks the way you provision laptops

The right state is one where every AI agent on every employee laptop has the right hooks configured before the employee opens it for the first time. The way you get there is the same way you get every other security control onto a fleet: your MDM.

Modern MDM providers (Jamf, Kandji, Intune, Fleet, JumpCloud, among others) already push managed configuration to laptops at enrollment. There is no reason AI agent configuration should be the exception. With the right profile, a laptop comes pre-provisioned with:

  • A ~/.claude/settings.json and equivalent files for each AI agent the company supports, configured with hooks that route every prompt and tool call to the central observability plane.
  • An approved registry of MCP servers, scoped to the team and the role.
  • Identity wired through SSO so the AI agent is talking to the model under the employee’s company identity rather than a personal account.
  • A baseline of skills the company has reviewed, installed in the right place.
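
As a minimal sketch of the delivery step, assuming a macOS fleet and placeholder hook paths, a post-enrollment script pushed by the MDM could seed the managed config for every local user. A real deployment would use the MDM’s managed-file or configuration-profile mechanism and merge with existing user settings rather than overwrite them:

```python
#!/usr/bin/env python3
"""Post-enrollment sketch: seed a managed Claude Code config per local user."""
import json
from pathlib import Path

MANAGED_SETTINGS = {
    "hooks": {
        "PreToolUse": [{"matcher": "*", "hooks": [
            {"type": "command", "command": "/opt/acme/hooks/policy-check.py"}]}],
        "PostToolUse": [{"matcher": "*", "hooks": [
            {"type": "command", "command": "/opt/acme/hooks/audit-emit.py"}]}],
    }
}

def provision(home: Path) -> None:
    """Write the managed settings file into the user's Claude Code config dir."""
    target = home / ".claude" / "settings.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(MANAGED_SETTINGS, indent=2))

if __name__ == "__main__":
    for home in Path("/Users").iterdir():  # macOS user homes; /home on Linux
        if home.is_dir() and home.name not in ("Shared", "Guest"):
            provision(home)
```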

When this is in place, the org’s AI surface stops being an audit problem and starts being a normal piece of fleet management. New AI tools come online with controls already attached. Departing employees have their AI access revoked the same way their email is. Security has a single place to look when something goes wrong.

A note on Speakeasy

Speakeasy is building the AI control plane, the governing layer between every AI agent in your organization and every system they’re allowed to reach. Shadow AI detection and enforcement are a core part of the platform.

What this looks like in practice:

Pre-configured hooks for the popular clients. Speakeasy ships hook configurations as Claude Code and Cursor plugins, distributed through native plugin systems rather than a custom installer. Day-0 rules are gitleaks-compatible and cover AWS, GitHub, GitLab, Slack, Stripe, GCP, Heroku, Twilio, SendGrid, npm, PyPI, OpenAI, and Anthropic credentials, plus generic API-key and database-connection-string patterns. Custom rules per organization are configured from the same control plane. The hooks ship structured events to the Speakeasy control plane, where they become a unified feed of every prompt, every MCP tool call, and every response, across every employee, every client, and every model.

MDM-native provisioning. The same hooks can be deployed through your MDM provider, so AI agents are governed from the moment a laptop is enrolled. There is no point at which a developer is using Cursor without observability, because there is no version of Cursor on the fleet that wasn’t provisioned with the hooks attached.

Plugin-based fleet distribution. The same plugin mechanism that delivers hooks also distributes managed sets of MCP servers and skills to specific teams. A GTM team’s Claude is provisioned with the Salesforce, Notion, and internal admin servers it should have. An engineering team gets a different set. Identity, scope, and revocation flow through the plugin manifest rather than through individual developer setup.

Discovery alongside enforcement. Hooks tell you about the activity going through the clients you’ve governed. The control plane also discovers MCP servers and skills installed outside that path, so the long tail of shadow AI surfaces in the same dashboard as the managed traffic. You see what’s governed. You see what isn’t. You see what to do about the gap.

The point of all of this isn’t to slow employees down. It’s the opposite. The fastest way to expand AI usage is to make it observable, because observability is what lets a security team say yes to the next tool instead of saying no out of caution. Shadow AI is what happens when the observability isn’t there. The hooks, the MDM, and the control plane are how it gets there.

If you’re trying to figure out what’s actually happening across the AI tools your company is using, the most useful thing you can do this quarter isn’t another policy doc. It’s to get hooks into your AI agents and a place for the data to land. The term to have in your head while you do that work is shadow AI. It names the thing you’re actually trying to see.
