AI & MCP
Security first, capability second: Building virtual employees enterprises can actually trust
Nolan Sullivan
April 23, 2026 - 12 min read
Capability vs. security: the fundamental tension in AI assistants
The value of an AI assistant comes from its ability to act on a user's behalf without being directed. That is also what makes it dangerous.
Every AI assistant platform sits on that tension: the more independently an assistant can act, the more damage it can do. These two axes pull against each other, and the debate in the agent community has become surprisingly heated. One camp argues that a maximally capable agent, with unfettered shell access, persistent filesystems, and the ability to install whatever it needs on the fly, is the only way to deliver on the promise of truly autonomous AI. The other argues that capability without containment is a breach waiting to happen.
Both sides are right, within their context.
This isn't a new problem. It's the same tension enterprises and startups have always navigated with their human employees. At a 20-person startup, the new hire gets admin on everything day one. Velocity beats governance because the blast radius is small and the cost of a missed opportunity is existential. At a bank, the same hire waits three weeks for a laptop, signs four policies, and gets read-only access to staging. The blast radius is enormous, the regulators are watching, and a single compromised account can end careers.
Neither approach is wrong. They're answers to different questions. The same is true for AI assistants: a solo founder experimenting with an autonomous agent has a genuinely different threat model than a 10,000-person enterprise with regulatory responsibilities.
Technologies like OpenClaw are amazing for solo developers and indie hackers, and they deserve to keep being exactly that. But we felt the enterprise audience was being underserved, and that's the gap we set out to fill.
Built-in vs. bolted on
When we began experimenting with building a secure AI assistant, we started with OpenClaw. Our first instinct was to bolt a security layer on top, similar to NemoClaw's approach. But we ultimately decided that for an AI assistant to really be ready for use inside an enterprise, security had to be built into the foundations, not retrofitted after the fact.
We also had a head start most teams in this space don't. Speakeasy already gives us the primitives an enterprise virtual employee actually needs: hosted MCP servers with a governed tool catalogue, environments and encrypted secrets, RBAC, audit logs, and telemetry. We didn't have to invent any of that. Virtual employees inherit it. And because the stack is auditable end to end, down to the well-known open source dependencies it relies on, security teams can verify the controls rather than take them on faith.
The good news is that we found choosing security-first doesn't mean choosing capability-poor. The architecture below is more restrictive than OpenClaw in almost every dimension, but it can still do almost everything OpenClaw can do: draft emails, summarize fifty PDFs, respond to Slack threads, hit third-party APIs, maintain memory across conversations. The difference is in how those capabilities are delivered: through layers that fail closed, with authentication that traces back to real users, and with an ingress pipeline that can hold a suspicious message before any model sees it.
Six layers, each a deliberate inversion of an OpenClaw default
Layer 1: Ephemeral by default
Every assistant invocation runs as a stateless function (think Lambda). Each gets its own clean environment and dies when the task is done. Parallelism scales horizontally but stays governed: users set a per-assistant cap, admins set an org-wide ceiling, and effective concurrency is whichever limit is smaller. There's no persistent filesystem that an agent can scribble on, no accumulated state for a malicious skill to hide in, no installed binaries that activate themselves three weeks later.
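The concurrency rule above is simple enough to sketch. This is an illustrative model, not the platform's real API; the type and function names are invented for the example:

```typescript
// Effective concurrency is the stricter of two limits: the per-assistant cap
// the user sets and the org-wide ceiling the admin sets.
interface ConcurrencyPolicy {
  perAssistantCap: number; // set by the user
  orgCeiling: number;      // set by the admin
}

function effectiveConcurrency(policy: ConcurrencyPolicy): number {
  return Math.min(policy.perAssistantCap, policy.orgCeiling);
}

// A new invocation may only spawn while the active count is below the
// effective limit; anything else waits or is rejected.
function maySpawn(activeInstances: number, policy: ConcurrencyPolicy): boolean {
  return activeInstances < effectiveConcurrency(policy);
}
```

Because the limit is computed at spawn time from both settings, an admin tightening the org ceiling takes effect immediately, without touching any per-assistant configuration.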
Conversational continuity is handled through a warm pool for low-latency follow-ups and a separately managed memory layer for state that should outlive a single task. The same instance can stay live for a short window to handle follow-ups, but the runtime itself never persists. If the agent needs to remember something across conversations, it does so through an explicit, audited memory tool, not as a side effect of having had a computer.
This matters because persistence is where half of OpenClaw's real-world incidents live. A skill grabs a binary off the internet, installs it, harvests credentials, and phones home, and the user doesn't notice until the damage is weeks old. The NVIDIA AI Red Team's guidance on agent sandboxing makes this point explicitly: an ephemeral lifecycle is one of the most important structural defenses a platform can offer. The industry is converging on the same answer.
Layer 2: secure-exec instead of shell access
The agent has no native shell. No screenshot tool. No ability to reach into the host filesystem, install arbitrary software, or spawn processes it wasn't granted.
What it has instead is a tool called secure-exec, a security-first Node.js runtime. Arbitrary computation runs here, inside a confined environment with an in-memory filesystem that vanishes with the process. Need to download fifty PDFs and compile data from them? That works, because the capability exists, just not the permanence.
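The contract that matters here is lifecycle, not API surface. A minimal sketch of that contract, with the in-memory filesystem simplified to a Map and the `secureExec` name invented for illustration:

```typescript
// Hypothetical sketch of the secure-exec contract: arbitrary computation
// receives an in-memory filesystem that exists only for one task's lifetime.
// The host filesystem is never reachable from inside the task.
type InMemoryFs = Map<string, Uint8Array>;

async function secureExec<T>(task: (fs: InMemoryFs) => Promise<T>): Promise<T> {
  const fs: InMemoryFs = new Map(); // fresh and empty on every invocation
  try {
    return await task(fs);
  } finally {
    fs.clear(); // nothing written during the task survives it
  }
}
```

A task can download, write, and compile data freely inside that Map-backed filesystem; the capability exists, the permanence does not.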
Critically, capabilities are opt-in per workspace and assignable per role. A fresh install ships as a message bot: it can respond, nothing more. Admins enable tool categories as their security posture allows, and they can scope individual toolsets to specific roles, so the assistant the support team uses doesn't share an attack surface with the one finance uses. Want your assistant to read calendars? Turn on the calendar toolset. Want it to run code? Turn on secure-exec. Don't want it doing anything at all except replying in threads? That's a legitimate configuration.
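A policy like that can be modeled as two layers: a workspace-enabled set and an optional per-role narrowing. The toolset names and shapes below are illustrative, not the product's schema:

```typescript
// Toolsets are off by default; admins enable them per workspace and can
// optionally narrow them further per role. A role scope can only narrow,
// never widen, what the workspace has enabled.
type Toolset = "calendar" | "email" | "secure-exec" | "slack-reply";

interface WorkspacePolicy {
  enabled: Set<Toolset>;                 // admin-enabled categories
  roleScopes: Map<string, Set<Toolset>>; // optional per-role narrowing
}

function allowedToolsets(policy: WorkspacePolicy, role: string): Set<Toolset> {
  const roleScope = policy.roleScopes.get(role);
  if (!roleScope) return new Set(policy.enabled);
  return new Set([...policy.enabled].filter((t) => roleScope.has(t)));
}
```

The intersection semantics are the point: even a misconfigured role entry can never grant a toolset the workspace has not enabled.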
This matters because the industry has discovered the hard way that "click approve on each action" doesn't scale. Developers rubber-stamp permission prompts reflexively, making manual review useless. The boundary has to be structural, set once at admin level and enforced by the runtime, not renegotiated on every request.
Layer 3: Scoped, authenticated, traceable tool access
Every assistant is bound to a single initiating user. When it runs, it mints a short-lived token scoped to that user. Think of it as a personal access token that exists for the duration of one task. All third-party calls (Slack, calendar, email, internal MCP tools) flow through this token. The user's credentials, approved connections, and permissions define the entire universe of what the assistant can do, and admin policy can narrow that further but never widen it.
Two things fall out of that design. First, every action the agent takes is authenticated as the user, which means every action is traceable back to them in the same audit logs a human employee would produce. Second, the agent's third-party access is gated by the workspace's pre-configured toolsets. If the admin hasn't connected the calendar integration, the agent literally cannot check calendars. The capability isn't hidden. It doesn't exist in that runtime.
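The minting rule described above (user permissions intersected with admin policy, short expiry, traceable subject) can be sketched as follows. Field names, scope strings, and the TTL are all assumptions for the example:

```typescript
// A task token carries the initiating user's identity, the intersection of
// their scopes with admin policy, and a short expiry tied to one task window.
interface TaskToken {
  subject: string;   // the initiating user; every action traces back here
  scopes: string[];
  expiresAt: number; // epoch ms; the token dies with the task
}

function mintTaskToken(
  userId: string,
  userScopes: string[],
  adminAllowed: string[],
  ttlMs = 5 * 60 * 1000, // illustrative default
): TaskToken {
  // Admin policy can narrow the user's permissions but never widen them:
  // the resulting scope set is always a subset of what the user holds.
  const scopes = userScopes.filter((s) => adminAllowed.includes(s));
  return { subject: userId, scopes, expiresAt: Date.now() + ttlMs };
}
```

Because the subject is always the initiating user, the audit question "who did this" has the same answer for the assistant as it would for the human.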
This is the MCP gateway pattern the security industry has been converging on. The "lethal trifecta" that makes agent exploits so dangerous (private data access plus external network routing plus untrusted execution) gets broken by funneling every external call through one logged, policy-enforced choke point. We piggyback on the same observability primitives as the rest of our MCP infrastructure: the same search, the same alerting, the same log structure. A customer investigating "what did the assistant do last Tuesday" uses the same tools they already use to investigate what any of their agents did.
Layer 4: The trigger model
This is the layer most agent platforms underweight, and it's where we think the biggest structural win lives.
OpenClaw treats ingress as undifferentiated. A heartbeat, a Slack DM, an email, a webhook: all just "stuff that wakes the agent up." And because the agent is a non-deterministic model evaluating every message, anything that reaches its context is a potential prompt injection. That's how the original email-credentials exploit worked. The attacker didn't breach a server. They wrote an email, and the model decided to be helpful.
We split ingress and egress as first-class concepts. Ingress comes through named, pre-configured triggers: "when someone DMs me on Slack," "when this webhook fires," "on this schedule." Between the trigger and the agent runtime sits a pluggable middleware layer, and this is the part that matters: untrusted input never touches the model until policy has run.
We ship default plugins for the obvious risks (prompt injection detection, secret exfiltration, SIEM emission to the customer's existing observability stack), but the pipeline is open. Customers ship their own plugins for company-specific policy: classifiers, enrichers, routing rules, content filters, whatever their security team decides matters. Suspicious payloads can be intercepted before any LLM evaluates them. Suspicious messages can be held and surfaced to the user for explicit approval rather than silently executed. Spike detection and source visibility ("you're getting hammered with weird-looking DMs from a domain you've never seen") become a security signal, not just an ops one.
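The plugin contract implied above is small: each plugin inspects the raw payload and returns a verdict, and the pipeline fails closed on the strictest verdict it sees. The interface, verdict names, and the toy detector below are illustrative, not the shipped plugin API:

```typescript
// Every plugin sees the payload before any model does. "block" stops the
// pipeline immediately; "hold" parks the message for explicit user approval;
// only "allow" from every plugin lets the payload reach the model.
type Verdict = "allow" | "hold" | "block";

interface IngressPlugin {
  name: string;
  inspect(payload: string): Verdict;
}

function runIngress(payload: string, plugins: IngressPlugin[]): Verdict {
  let verdict: Verdict = "allow";
  for (const plugin of plugins) {
    const v = plugin.inspect(payload);
    if (v === "block") return "block"; // fail closed immediately
    if (v === "hold") verdict = "hold"; // escalate, but let later plugins run
  }
  return verdict;
}

// Toy example of a default plugin. Real prompt-injection detection would be
// a classifier, not a substring check; this only shows where it plugs in.
const injectionGuard: IngressPlugin = {
  name: "prompt-injection-detector",
  inspect: (p) => (/ignore previous instructions/i.test(p) ? "hold" : "allow"),
};
```

The design choice worth noting is that "hold" does not short-circuit: later plugins still run, so SIEM emission and enrichment happen even for messages that end up parked.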
This maps directly onto where the security research has landed. OWASP lists indirect prompt injection as the number-one risk for LLM applications. Microsoft's defense-in-depth guidance recommends isolating untrusted inputs before model invocation, because no model reliably distinguishes data from instructions, and none likely ever will. We're implementing that pattern at the platform level so customers don't have to invent it themselves.
As a bonus, thread-aware responses fall out of this design for free, something most current Slack bots lack because they treat each message in isolation.
Layer 5: Coordinated, not just spawned
The interesting consequence of the ephemeral runtime isn't isolation; it's what happens when multiple instances of the same assistant are working at once. Without coordination, parallelism gets you race conditions: two instances reading stale calendar state, two instances responding to overlapping Slack threads, two instances writing conflicting drafts.
Every assistant has a shared virtual filesystem that its instances coordinate through. Durable memory, scratchpads, summaries, recovery checkpoints, and resource leases all live there. Instances can see what their siblings are working on, claim a lease on a contested resource like a calendar or an inbox, and pick up from a checkpoint when an earlier instance dies mid-task.
Here's what that looks like in practice. Say you ask your assistant "when's my earliest meeting on Tuesday?" At the same moment, an inbound email asks for time on your calendar. Two instances spin up. Instance A reads the calendar and writes a scratch note that it's evaluating Tuesday availability. Instance B claims a lease on the calendar resource and slots the email sender into a 9:30 opening. Instance A sees the lease, waits for it to clear, re-reads the now-authoritative state, and answers:
"It was your 10am 1:1 with John, but Linda just asked for time and I slotted her in from 9:30, so that's your earliest meeting now."
The model isn't doing anything clever there. The execution model is. Treating concurrency and shared state as first-class concerns is what makes a virtual employee feel like an employee instead of a chatbot.
It's also the difference between graceful failure and silent corruption. If an instance dies halfway through a task, the next one resumes from the last durable checkpoint instead of either redoing the whole thing or quietly dropping the work.
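The lease-and-checkpoint flow above can be sketched with two small primitives. A real implementation would persist these on the shared virtual filesystem; this in-memory version, with invented class names, only shows the contract:

```typescript
// Leases serialize access to contested resources (a calendar, an inbox).
// An instance that fails to claim waits for the holder to release, then
// re-reads the now-authoritative state before acting.
class ResourceLeases {
  private holders = new Map<string, string>(); // resource -> instance id

  tryClaim(resource: string, instance: string): boolean {
    if (this.holders.has(resource)) return false; // a sibling holds it
    this.holders.set(resource, instance);
    return true;
  }

  release(resource: string, instance: string): void {
    // Only the current holder may release its own lease.
    if (this.holders.get(resource) === instance) this.holders.delete(resource);
  }
}

// Checkpoints let a successor instance resume a task from the last durable
// point instead of redoing the work or silently dropping it.
class Checkpoints {
  private last = new Map<string, string>(); // task id -> serialized progress

  save(task: string, state: string): void {
    this.last.set(task, state);
  }

  resume(task: string): string | undefined {
    return this.last.get(task);
  }
}
```

In the calendar example, Instance B's `tryClaim` succeeding is exactly why Instance A waits and re-reads rather than answering from stale state.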
Layer 6: Observability built in, not bolted on
Every prompt, tool call, and response is captured end to end through the same primitives that handle the rest of the Speakeasy MCP platform's traffic. A customer investigating "what did the assistant do last Tuesday" uses the same search, the same alerting, and the same log structure they already use for any other agent in their org. And because every action is authenticated as the initiating user, assistant activity lands in the same audit logs as the humans it acts for, with no custom correlation required.
This matters because observability is the feedback loop for everything else. Policy plugins are only useful if you can see what they caught. Identity scoping is only useful if you can trace what each token did. Ingress filters are only useful if you can measure what's getting through. Customers also get SIEM and Datadog emission out of the ingress pipeline, so assistant telemetry flows into the same stack that watches the rest of their infrastructure, instead of living in a separate vendor dashboard nobody checks.
Most OpenClaw-for-enterprise offerings need to bolt observability onto the outside. We already own the tool plane, so the audit trail is a consequence of the architecture, not a feature we had to go build.
The stack as a whole
What we're not claiming
We're not claiming "completely secure." That's a bold claim, and nobody deploying autonomous agents in 2026 should be making it with a straight face. Prompt injection is an unsolved problem. Researchers have shown that even sophisticated models fail the majority of the time under repeated attack. No single control fixes it.
What we are claiming is a structurally smaller blast radius, fewer assumptions about model judgment, and a platform where the controls the industry is still inventing have a natural place to plug in. When the next EchoLeak-class vulnerability drops, customers on our platform shouldn't need to rebuild their architecture to respond. They should be able to add a filter at the ingress layer, tighten a tool scope, or flip a capability off, and have that change propagate everywhere, immediately.
Security as the substrate, not the patch
Feature-first agent systems will keep losing to attackers, because the surface grows faster than the patches. Every new capability adds a new exploit path, and the defender is always one CVE behind. Security-first systems trade some open-ended capability for something more valuable to an enterprise: the ability to actually deploy, in production, under audit, without waking up to find that a crafted email drained the access tokens overnight.
For virtual employees to be useful at the scale we want them to be, the runtime has to fail closed by default, the ingress has to be inspectable before the model sees it, and the blast radius of a mistake has to be the task, not the company.
That's the bet we're making. So far, it looks like a good one.