Product

Guardrails for your AI usage

Speakeasy Team

June 16, 2026 - 4 min read

Product

AI is one of the most exposed surfaces a company has, prompt injection and tool poisoning attacks have moved from white hat demos to incident reviews. Stopping attacks requires a security layer built for the way agents are particularly vulnerable.

The rules themselves are mostly common sense: don’t act on instructions smuggled in through a tool, don’t run a command that wipes a database, don’t let the wrong person reach the wrong system. However, setting up those common sense controls is easier said then done.

That’s why we built AI guardrails, straightforward risk policies enforced on every prompt and tool call being made across your company. Use out of the box tools, or write custom rules, and enforce governance on every agent, every user, and every call that routes through Speakeasy.

Where the model stops being enough

Today’s models are not naive. Ask one to run DROP TABLE or to follow a blunt “ignore your previous instructions,” and it will usually refuse on its own. However, model refusal is a probability, not a guarantee, and the attacks worth worrying about are the ones designed to beat it i.e. injection buried in a tool’s output, or an encoded payload.

A refusal that lives in the model’s judgment also leaves nothing to audit. There is no rule to point to, no record of what it would stop next time, no way to show a reviewer that a given action cannot happen. Governance needs to be determistic and auditable.

What the policies catch

Risk policies evaluate tool calls against a set of sources. Three cover the most common ways an agent goes wrong.

Prompt injection. Speakeasy scans tool inputs and the text flowing back from tools for the patterns of an injection attempt: role hijacks, system-prompt leaks, delimiter and encoding tricks, instruction-override phrasing, jailbreak personas. The heuristic detector runs by default and is built to be cheap on the live request path. For teams that want a second opinion, an opt-in machine-learning classifier adds a model-based layer on top of the heuristics.

Destructive commands. Some calls are dangerous regardless of who makes them. A policy flags shell commands like rm -rf, dd, and fork bombs; git operations like push --force and reset --hard; database statements like DROP, TRUNCATE, and unguarded DELETE FROM; and cloud teardowns like aws ec2 terminate-instances or gcloud projects delete. The scan reads the actual arguments of every recorded tool call, so a native shell command and an MCP-routed one are held to the same standard.

Prompt-based policies. Not every rule fits a regex. For the judgment calls — “don’t let an agent share customer PII,” “don’t approve a refund over a threshold” — you describe the policy in plain language and an LLM judge evaluates each call against it. You get a rule you can read and a decision you can audit.

Scope a policy to the people it’s for

A blanket policy is a blunt instrument. Speakeasy lets you give every policy an audience: everyone, a specific role, or a named user. A policy targeted at a role only evaluates calls from the people in it; everyone else is unaffected.

That makes the difference between a guardrail and an obstruction. The on-call engineer running a production runbook can be exempt from a rule that still blocks everyone else. A finance team can carry stricter handling for payment tools than the rest of the org. The rule is precise, so it doesn’t get switched off the first time it gets in the way.

When a policy fires

A blocked call doesn’t have to be a dead end. Each policy can carry a custom message, so instead of an opaque failure the user sees the reason the call was stopped and what to do next — request an exception, reach an admin, or reword the request. And because policies are scoped and exclusions are explicit, an admin can grant a bypass for a specific case without dropping the rule for everyone.

Rolling out gradually

The heuristic injection detector and destructive-command flagging are generally available. The machine-learning injection classifier is opt-in for teams that want a model-based layer on top of the heuristics.

Get started

Risk policies live under the Policy Center in your Speakeasy dashboard. Start with a read-only posture: turn on destructive-command flagging and prompt-injection detection across the org, watch what they catch in the risk overview, then tighten from there — adding prompt-based policies and scoping them to the roles that need them.

Rolling out agents against systems you can’t afford to get wrong? Book time with our team and we’ll walk through a policy setup with you.

Last updated on June 16, 2026