Engineering

Reducing false positives in AI workflows

Vishal Gowda

June 17, 2026 - 3 min read

This is a companion to how we approach AI security, where we walked through the four points on an agent’s path where policy applies and the detection techniques behind each one. That piece covered one half of keeping a control switched on: speed. This one is about the other half: noise.

Noise matters because a policy that fires constantly on safe interactions trains its owners to ignore it, which is worse than not having the policy at all. We’re careful about calling these “false positives,” because the detector is usually doing its job correctly. The policy is just scoped more broadly than the situation calls for, so the fix is better scoping, not a blunter detector.

Fine-grained scopes

Beyond the four coarse points on the path, a policy can be narrowed with specific conditions: an individual MCP server, a specific tool function, or a pattern inside tool arguments, combined with operators and JSON path queries against the payload. A PII policy scoped only to user prompts won’t fire on an internal tool that legitimately returns customer records.
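
To make the idea concrete, here is a minimal sketch of condition-based scoping. It is not the product's policy language: the Condition class, the operator names ("eq", "contains", "matches"), the dotted-path lookup standing in for a JSON path query, and the payload fields ("point", "mcp_server", "tool") are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Condition:
    path: str   # hypothetical dotted path into the payload, e.g. "tool.name"
    op: str     # hypothetical operator: "eq", "contains", or "matches"
    value: str


def resolve(payload: dict, path: str) -> Any:
    """Walk a dotted path through nested dicts; None if a segment is missing."""
    current: Any = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def condition_holds(cond: Condition, payload: dict) -> bool:
    actual = resolve(payload, cond.path)
    if actual is None:
        return False
    if cond.op == "eq":
        return actual == cond.value
    if cond.op == "contains":
        return cond.value in str(actual)
    if cond.op == "matches":
        return re.search(cond.value, str(actual)) is not None
    raise ValueError(f"unknown operator: {cond.op}")


def in_scope(conditions: list[Condition], payload: dict) -> bool:
    """The policy applies only when every condition holds (logical AND)."""
    return all(condition_holds(c, payload) for c in conditions)


# A PII policy scoped only to user prompts (assumed payload shape).
pii_on_prompts = [Condition("point", "eq", "user_prompt")]

user_prompt = {"point": "user_prompt", "text": "please look up my account"}
crm_lookup = {
    "point": "tool_response",
    "mcp_server": "internal-crm",
    "tool": {"name": "get_customer", "arguments": {"id": "42"}},
}
```

With these definitions, in_scope(pii_on_prompts, user_prompt) is True, while in_scope(pii_on_prompts, crm_lookup) is False, so the detector never sees the internal tool's customer records.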

Role-scoped policies

Scope isn’t only about what an interaction touches. It’s also about who is behind it. Role-based access control (RBAC) lets a policy apply to specific users or groups instead of everyone at once. A strict rule that would be noise across the whole org can be scoped to the people it’s meant for: tighter controls for contractors, looser ones for an internal team that works with sensitive data by design. The same detector that fires constantly when it’s pointed at everyone gets quiet and useful when it only runs for the roles where the behavior is actually a concern.
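
As a rough sketch of role scoping, the check below applies a policy only when the user belongs to one of the groups it targets. The RoleScopedPolicy class, the policy_applies function, and the group names are hypothetical and stand in for whatever the RBAC system actually exposes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RoleScopedPolicy:
    name: str
    applies_to: frozenset[str]  # groups this policy targets (assumed model)


def policy_applies(policy: RoleScopedPolicy, user_groups: set[str]) -> bool:
    """True when the user is in at least one group the policy targets."""
    return not policy.applies_to.isdisjoint(user_groups)


# A strict rule scoped to contractors rather than the whole org.
strict_pii = RoleScopedPolicy("strict-pii", frozenset({"contractors"}))
```

Here policy_applies(strict_pii, {"contractors", "eng"}) is True, while a user who is only in a hypothetical "data-platform" group is never evaluated against the rule.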

Exemptions

Exemptions are the converse primitive. Where a scope says “apply the policy here,” an exemption says “when these conditions hold, don’t let this count as a violation.” That can happen before evaluation, where the interaction is short-circuited and the policy never runs, or after it, where a finding is suppressed once the conditions are known. Either way, the effect is the same: someone defines a deliberately loose policy covering everything, then carves out the interactions already known to be safe. Calls to MCP servers hosted on the platform are a natural candidate, and we can prepopulate exemptions for them so a policy author never has to think about it.
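
The two exemption timings can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the Policy class, the evaluate function, the toy detector, and the server and group names are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

Interaction = dict
Detector = Callable[[Interaction], list[str]]
PreExemption = Callable[[Interaction], bool]
PostExemption = Callable[[Interaction, str], bool]


@dataclass
class Policy:
    detector: Detector
    pre_exemptions: list[PreExemption] = field(default_factory=list)
    post_exemptions: list[PostExemption] = field(default_factory=list)


def evaluate(policy: Policy, interaction: Interaction) -> list[str]:
    # Pre-evaluation exemption: short-circuit so the detector never runs.
    if any(exempt(interaction) for exempt in policy.pre_exemptions):
        return []
    findings = policy.detector(interaction)
    # Post-evaluation exemption: the detector ran, but matching findings
    # are suppressed instead of counting as violations.
    return [
        finding
        for finding in findings
        if not any(exempt(interaction, finding) for exempt in policy.post_exemptions)
    ]


calls: list[Interaction] = []  # records detector invocations for the example


def toy_detector(interaction: Interaction) -> list[str]:
    calls.append(interaction)
    return ["email"] if "@" in interaction.get("text", "") else []


PLATFORM_SERVERS = {"platform-docs"}  # hypothetical platform-hosted MCP servers


def platform_hosted(interaction: Interaction) -> bool:
    return interaction.get("mcp_server") in PLATFORM_SERVERS


def support_team_email(interaction: Interaction, finding: str) -> bool:
    return finding == "email" and interaction.get("group") == "support"


policy = Policy(toy_detector, [platform_hosted], [support_team_email])
```

Evaluating an interaction from "platform-docs" returns no findings and leaves calls empty, because the policy never ran. An interaction from another server runs the detector, and the "email" finding is dropped only when the post-evaluation condition holds.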

Built-in detection rules you can test

We ship built-in detection rules so teams aren’t starting from a blank page, and each one can be tried before it goes live. Feed it a sample of text, or run it against a set of existing user sessions, and see what it would have flagged. Tuning scope against real traffic, rather than guessing, is how a policy gets quiet without getting weak.
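
A dry run of this kind can be approximated in a few lines. The sketch below is an assumption-laden stand-in, not a built-in rule: EMAIL_RULE is a deliberately simple regular expression, and the session shape (an "id" plus a list of "messages") is invented for the example.

```python
import re

# A deliberately simple email pattern standing in for a built-in rule.
EMAIL_RULE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def dry_run(rule: re.Pattern, sessions: list[dict]) -> list[dict]:
    """Report what the rule would have flagged, without enforcing anything."""
    hits = []
    for session in sessions:
        for index, message in enumerate(session["messages"]):
            matches = rule.findall(message)
            if matches:
                hits.append(
                    {
                        "session_id": session["id"],
                        "message_index": index,
                        "matches": matches,
                    }
                )
    return hits


def flag_rate(rule: re.Pattern, sessions: list[dict]) -> float:
    """Fraction of sessions with at least one hit; a rough noise estimate."""
    if not sessions:
        return 0.0
    flagged = {hit["session_id"] for hit in dry_run(rule, sessions)}
    return len(flagged) / len(sessions)


SAMPLE_SESSIONS = [
    {"id": "s1", "messages": ["Summarize this doc", "Send it to jane@example.com"]},
    {"id": "s2", "messages": ["What is our refund policy?"]},
]
```

Running dry_run(EMAIL_RULE, SAMPLE_SESSIONS) reports a single hit in session "s1", and flag_rate gives 0.5 for this sample. Comparing that rate before and after a scope change is one way to tune against real traffic rather than guessing.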

Where this fits

Scoping, role-based targeting, exemptions, and testing are the noise side of the same enforcement model we use across the AI control plane, where synchronous hooks inspect prompts, model responses, tool calls, and tool responses on the path between an agent and the systems it reaches. For the other half of the picture, where policy applies and how detection actually runs, see how we approach AI security.

Frequently asked questions

How do you reduce false positives in AI security policies?

Most false positives come from a policy scoped too broadly, not from a faulty detector. Four practices keep policies quiet without weakening them: narrowing the scope with specific conditions, using RBAC to scope policies to the users or groups they apply to, adding exemptions for known-safe interactions, and testing rules against real session history before enabling them.
