
AI & MCP

Security first, capability second: Building virtual employees enterprises can actually trust

Nolan Sullivan


April 23, 2026 - 12 min read


Capability vs. security, the fundamental tension in AI assistants

The value of an AI assistant comes from its ability to act on a user's behalf without being directed. That is also what makes it dangerous.

Every AI assistant platform sits on that tension: the more independently an assistant can act, the more damage it can do. These two axes pull against each other, and the debate in the agent community has become surprisingly heated. One camp argues that a maximally capable agent, with unfettered shell access, persistent filesystems, and the ability to install whatever it needs on the fly, is the only way to deliver on the promise of truly autonomous AI. The other argues that capability without containment is a breach waiting to happen.

Both sides are right, within their context.

This isnโ€™t a new problem. Itโ€™s the same tension enterprises and startups have always navigated with their human employees. At a 20-person startup, the new hire gets admin on everything day one. Velocity beats governance because the blast radius is small and the cost of a missed opportunity is existential. At a bank, the same hire waits three weeks for a laptop, signs four policies, and gets read-only access to staging. The blast radius is enormous, the regulators are watching, and a single compromised account can end careers.

Neither approach is wrong. They're answers to different questions. The same is true for AI assistants: a solo founder experimenting with an autonomous agent has a genuinely different threat model than a 10,000-person enterprise with regulatory responsibilities.

Technologies like OpenClaw are amazing for solo developers and indie hackers, and they deserve to keep being exactly that. But we felt the enterprise audience was being underserved, and that's the gap we set out to fill.

Built-in vs. bolted on

When we began experimenting with building a secure AI assistant, we started with OpenClaw. Our first instinct was to bolt a security layer on top, similar to NemoClaw's approach. But we ultimately decided that for an AI assistant to really be ready for use inside an enterprise, security had to be built into the foundations, not retrofitted after the fact.

We also had a head start most teams in this space don't. Speakeasy already gives us the primitives an enterprise virtual employee actually needs: hosted MCP servers with a governed tool catalogue, environments and encrypted secrets, RBAC, audit logs, and telemetry. We didn't have to invent any of that. Virtual employees inherit it. And because the stack is auditable end to end, down to the well-known open source dependencies it relies on, security teams can verify the controls rather than take them on faith.

The good news is that we found choosing security-first doesn't mean choosing capability-poor. The architecture below is more restrictive than OpenClaw in almost every dimension, but it can still do almost everything OpenClaw can: draft emails, summarize fifty PDFs, respond to Slack threads, hit third-party APIs, maintain memory across conversations. The difference is in how those capabilities are delivered: through layers that fail closed, with authentication that traces back to real users, and with an ingress pipeline that can hold a suspicious message before any model sees it.

Six layers, each a deliberate inversion of an OpenClaw default

OpenClaw optimizes for capability. We invert each default so the system fails closed while keeping the same task surface.

| Layer | OpenClaw default | Our approach |
| --- | --- | --- |
| 01 · Runtime | Long-lived, stateful host: persistent FS, accumulated state | Stateless ephemeral functions: scale-to-zero, governed concurrency |
| 02 · Execution | Native shell and filesystem: arbitrary binaries, ambient capability | secure-exec with opt-in capabilities: in-memory FS, per-role toolsets |
| 03 · Identity & egress | Persistent host credentials: ambient auth, hard to trace | User-bound, short-lived tokens: MCP gateway, per-user audit |
| 04 · Ingress | Undifferentiated, model-evaluated: any message is prompt context | Named triggers + pre-model policy: pluggable middleware, plugin SDK |
| 05 · Coordination | Single process, no shared state: race conditions, no recovery | Shared VFS across instances: leases, journals, checkpoints |
| 06 · Observability | Bolt-on or third-party: separate dashboard, no user trace | Detailed session logs: end-to-end audit, SIEM emission |

Layer 1: Ephemeral by default

Every assistant invocation runs as a stateless function (think Lambda). Each gets its own clean environment and dies when the task is done. Parallelism scales horizontally but stays governed: users set a per-assistant cap, admins set an org-wide ceiling, and effective concurrency is whichever limit is smaller. There's no persistent filesystem that an agent can scribble on, no accumulated state for a malicious skill to hide in, no installed binaries that activate themselves three weeks later.
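
The concurrency rule is simple enough to state in code. A minimal sketch in TypeScript (the names here are illustrative, not our actual API):

```typescript
// Effective concurrency is the smaller of the per-assistant cap set by the
// assistant's owner and the org-wide ceiling set by an admin.
// All names in this sketch are hypothetical.
interface ConcurrencyPolicy {
  perAssistantCap: number; // set by the assistant's owner
  orgCeiling: number;      // set by a workspace admin
}

function effectiveConcurrency(policy: ConcurrencyPolicy): number {
  return Math.min(policy.perAssistantCap, policy.orgCeiling);
}

function canSpawn(running: number, policy: ConcurrencyPolicy): boolean {
  // A new ephemeral instance starts only below the effective limit;
  // otherwise the invocation queues instead of bursting past governance.
  return running < effectiveConcurrency(policy);
}
```

The point of the `min` is that neither party can widen the other's limit: a user cranking their cap to 100 still hits the org ceiling.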

Conversational continuity is handled through a warm pool for low-latency follow-ups and a separately managed memory layer for state that should outlive a single task. The same instance can stay live for a short window to handle follow-ups, but the runtime itself never persists. If the agent needs to remember something across conversations, it does so through an explicit, audited memory tool, not as a side effect of having had a computer.

This matters because persistence is where half of OpenClaw's real-world incidents live. A skill grabs a binary off the internet, installs it, scans credentials, and phones home; the user doesn't notice until the damage is weeks old. The NVIDIA AI Red Team's guidance on agent sandboxing makes this point explicitly: an ephemeral lifecycle is one of the most important structural defenses a platform can offer. The industry is converging on the same answer.

Layer 2: secure-exec instead of shell access

The agent has no native shell. No screenshot tool. No ability to reach into the host filesystem, install arbitrary software, or spawn processes it wasn't granted.

What it has instead is a tool called secure-exec, a security-first Node.js runtime. Arbitrary computation runs here, inside a confined environment with an in-memory filesystem that vanishes with the process. Need to download fifty PDFs and compile data from them? That works, because the capability exists, just not the permanence.
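
As a rough illustration of the confinement model (this is a hypothetical sketch, not secure-exec's real interface), the key property is that every write lands in a per-invocation in-memory filesystem that disappears with the process:

```typescript
// Hypothetical sketch of the confinement model: a filesystem scoped to one
// invocation. Nothing written here ever reaches a host disk or a sibling task.
class InMemoryFS {
  private files = new Map<string, string>();

  writeFile(path: string, data: string): void {
    this.files.set(path, data);
  }

  readFile(path: string): string {
    const data = this.files.get(path);
    if (data === undefined) throw new Error(`ENOENT: no such file ${path}`);
    return data;
  }

  list(): string[] {
    return [...this.files.keys()];
  }
}

// Each task gets a fresh FS; when the function returns, the files are gone
// along with the instance.
function runTask<T>(task: (fs: InMemoryFS) => T): T {
  const fs = new InMemoryFS();
  return task(fs);
}
```

Downloading fifty PDFs works the same way in this model: fetch, write into the in-memory FS, extract what you need, return the result. The capability is intact; the permanence is not.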

Critically, capabilities are opt-in per workspace and assignable per role. A fresh install ships as a message bot: it can respond, nothing more. Admins enable tool categories as their security posture allows, and they can scope individual toolsets to specific roles, so the assistant the support team uses doesn't share an attack surface with the one finance uses. Want your assistant to read calendars? Turn on the calendar toolset. Want it to run code? Turn on secure-exec. Don't want it doing anything at all except replying in threads? That's a legitimate configuration.
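
The two-gate check could look something like this sketch (the config shape is hypothetical): a toolset is usable only if the workspace has enabled it and the caller's role has been granted it.

```typescript
// Illustrative workspace config: capabilities are opt-in per workspace and
// further scoped per role. A fresh install enables nothing, so lookups
// fail closed. Names are hypothetical.
type Toolset = "calendar" | "secure-exec" | "email";

interface WorkspaceConfig {
  enabledToolsets: Set<Toolset>;           // admin opt-in, workspace-wide
  roleToolsets: Map<string, Set<Toolset>>; // further narrowing per role
}

function isAllowed(config: WorkspaceConfig, role: string, tool: Toolset): boolean {
  // Both gates must pass: the workspace enabled the toolset AND this role
  // was granted it. Unknown roles get nothing.
  return (
    config.enabledToolsets.has(tool) &&
    (config.roleToolsets.get(role)?.has(tool) ?? false)
  );
}
```

Because a role grant alone is not enough, flipping a toolset off at the workspace level revokes it everywhere at once.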

This matters because the industry has discovered the hard way that "click approve on each action" doesn't scale. Developers rubber-stamp permission prompts reflexively, making manual review useless. The boundary has to be structural, set once at admin level and enforced by the runtime, not renegotiated on every request.

Layer 3: Scoped, authenticated, traceable tool access

Every assistant is bound to a single initiating user. When it runs, it mints a short-lived token scoped to that user. Think of it as a personal access token that exists for the duration of one task. All third-party calls (Slack, calendar, email, internal MCP tools) flow through this token. The user's credentials, approved connections, and permissions define the entire universe of what the assistant can do, and admin policy can narrow that further but never widen it.

Two things fall out of that design. First, every action the agent takes is authenticated as the user, which means every action is traceable back to them in the same audit logs a human employee would produce. Second, the agent's third-party access is gated by the workspace's pre-configured tool sets. If the admin hasn't connected the calendar integration, the agent literally cannot check calendars. The capability isn't hidden. It doesn't exist in that runtime.
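
The scoping rule ("narrow, never widen") is just a set intersection. A hedged sketch, with illustrative claim names:

```typescript
// The invocation token's scopes are the intersection of what the initiating
// user can do and what admin policy allows. Policy can remove scopes but can
// never grant one the user lacks. Field names are hypothetical.
interface TokenClaims {
  sub: string;       // the initiating user: every action traces back to them
  scopes: string[];
  expiresAt: number; // short-lived: the token dies with the task
}

function mintInvocationToken(
  userId: string,
  userScopes: string[],
  adminAllowedScopes: string[],
  ttlMs = 5 * 60_000,
): TokenClaims {
  const allowed = new Set(adminAllowedScopes);
  return {
    sub: userId,
    scopes: userScopes.filter((scope) => allowed.has(scope)),
    expiresAt: Date.now() + ttlMs,
  };
}
```

Note that a scope in admin policy but not in the user's grants never survives the intersection, which is exactly the "never widen" property.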

This is the MCP gateway pattern the security industry has been converging on. The "lethal trifecta" that makes agent exploits so dangerous (private data access plus external network routing plus untrusted execution) gets broken by funneling every external call through one logged, policy-enforced choke point. We piggyback on the same observability primitives as the rest of our MCP infrastructure: the same search, the same alerting, the same log structure. A customer investigating "what did the assistant do last Tuesday" uses the same tools they already use to investigate what any of their agents did.

Layer 4: The trigger model

This is the layer most agent platforms underweight, and it's where we think the biggest structural win lives.

OpenClaw treats ingress as undifferentiated. A heartbeat, a Slack DM, an email, a webhook: all just "stuff that wakes the agent up." And because the agent is a non-deterministic model evaluating every message, anything that reaches its context is a potential prompt injection. That's how the original email-credentials exploit worked. The attacker didn't breach a server. They wrote an email, and the model decided to be helpful.

We split ingress and egress as first-class concepts. Ingress comes through named, pre-configured triggers: "when someone DMs me on Slack," "when this webhook fires," "on this schedule." Between the trigger and the agent runtime sits a pluggable middleware layer, and this is the part that matters: untrusted input never touches the model until policy has run.

We ship default plugins for the obvious risks (prompt injection detection, secret exfiltration, SIEM emission to the customer's existing observability stack), but the pipeline is open. Customers ship their own plugins for company-specific policy: classifiers, enrichers, routing rules, content filters, whatever their security team decides matters. Suspicious payloads can be intercepted before any LLM evaluates them. Suspicious messages can be held and surfaced to the user for explicit approval rather than silently executed. Spike detection and source visibility ("you're getting hammered with weird-looking DMs from a domain you've never seen") become a security signal, not just an ops one.
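
The pipeline contract can be sketched as a chain of plugins that each return a verdict, with anything other than "allow" short-circuiting before the model is ever invoked (the interface here is hypothetical, not our plugin SDK):

```typescript
// Sketch of the pre-model policy pipeline. Every plugin runs before any LLM
// sees the message; the first non-"allow" verdict stops the chain, so the
// pipeline fails closed.
type Verdict = "allow" | "hold" | "block";

interface IngressEvent {
  trigger: string; // a named trigger, e.g. "slack.dm" or "webhook.billing"
  source: string;  // who or what sent it
  body: string;    // the untrusted payload
}

type IngressPlugin = (event: IngressEvent) => Verdict;

function runIngressPipeline(event: IngressEvent, plugins: IngressPlugin[]): Verdict {
  for (const plugin of plugins) {
    const verdict = plugin(event);
    if (verdict !== "allow") return verdict; // model is never invoked
  }
  return "allow";
}

// Example plugin: a deliberately crude heuristic standing in for a real
// injection classifier. "hold" surfaces the message for user approval.
const injectionHeuristic: IngressPlugin = (event) =>
  /ignore (all )?previous instructions/i.test(event.body) ? "hold" : "allow";
```

A customer-written classifier, enricher, or router slots in as just another `IngressPlugin` in the array.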

This maps directly onto where the security research has landed. OWASP lists indirect prompt injection as the number-one risk for LLM applications. Microsoft's defense-in-depth guidance recommends isolating untrusted inputs before model invocation, because no model reliably distinguishes data from instructions, and none likely ever will. We're implementing that pattern at the platform level so customers don't have to invent it themselves.

As a bonus, thread-aware responses fall out of this design for free; most current Slack bots lack them because they treat each message in isolation.

Layer 5: Coordinated, not just spawned

The interesting consequence of the ephemeral runtime isn't isolation; it's what happens when multiple instances of the same assistant are working at once. Without coordination, parallelism gets you race conditions: two instances reading stale calendar state, two instances responding to overlapping Slack threads, two instances writing conflicting drafts.

Every assistant has a shared virtual filesystem that its instances coordinate through. Durable memory, scratchpads, summaries, recovery checkpoints, and resource leases all live there. Instances can see what their siblings are working on, claim a lease on a contested resource like a calendar or an inbox, and pick up from a checkpoint when an earlier instance dies mid-task.

Here's what that looks like in practice. Say you ask your assistant "when's my earliest meeting on Tuesday?" At the same moment, an inbound email asks for time on your calendar. Two instances spin up. Instance A reads the calendar and writes a scratch note that it's evaluating Tuesday availability. Instance B claims a lease on the calendar resource and slots the email sender into a 9:30 opening. Instance A sees the lease, waits for it to clear, re-reads the now-authoritative state, and answers:

"It was your 10am 1:1 with John, but Linda just asked for time and I slotted her in from 9:30, so that's your earliest meeting now."

The model isn't doing anything clever there. The execution model is. Treating concurrency and shared state as first-class concerns is what makes a virtual employee feel like an employee instead of a chatbot.

It's also the difference between graceful failure and silent corruption. If an instance dies halfway through a task, the next one resumes from the last durable checkpoint instead of either redoing the whole thing or quietly dropping the work.
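
The lease flow in the calendar example can be sketched in a few lines (the real coordination lives in the shared VFS; this hypothetical sketch only shows the claim/wait/release contract):

```typescript
// Sketch of lease-based coordination over a shared store. An instance must
// hold the lease on a contested resource (say, "calendar") before writing;
// siblings see the lease, wait for it to clear, then re-read authoritative
// state. API names are illustrative.
class ResourceLeases {
  private holders = new Map<string, string>(); // resource -> instance id

  acquire(resource: string, instanceId: string): boolean {
    if (this.holders.has(resource)) return false; // contested: caller waits
    this.holders.set(resource, instanceId);
    return true;
  }

  release(resource: string, instanceId: string): void {
    // Only the current holder may release; a stale sibling cannot.
    if (this.holders.get(resource) === instanceId) this.holders.delete(resource);
  }

  holder(resource: string): string | undefined {
    return this.holders.get(resource);
  }
}
```

In production this kind of lease would also carry an expiry so a crashed holder can't wedge the resource forever, with recovery checkpoints letting the next instance resume its work.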

Layer 6: Observability built in, not bolted on

Every prompt, tool call, and response is captured end to end through the same primitives that handle the rest of the Speakeasy MCP platform's traffic. A customer investigating "what did the assistant do last Tuesday" uses the same search, the same alerting, and the same log structure they already use for any other agent in their org. And because every action is authenticated as the initiating user, assistant activity lands in the same audit logs as the humans it acts for, with no custom correlation required.

This matters because observability is the feedback loop for everything else. Policy plugins are only useful if you can see what they caught. Identity scoping is only useful if you can trace what each token did. Ingress filters are only useful if you can measure what's getting through. Customers also get SIEM and Datadog emission out of the ingress pipeline, so assistant telemetry flows into the same stack that watches the rest of their infrastructure, instead of living in a separate vendor dashboard nobody checks.
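
Conceptually, each action becomes one structured, user-attributed record that flows to the same sinks as the rest of the org's telemetry. The field names below are illustrative, not our actual schema:

```typescript
// Illustrative audit record: because every call is authenticated as the
// initiating user, assistant actions land in the same per-user audit trail
// as human activity, with no correlation step bolted on afterward.
interface AuditEvent {
  timestamp: string; // ISO 8601
  actor: {
    userId: string;      // the human the token traces back to
    via: "assistant";    // marks the action as assistant-mediated
    assistantId: string;
  };
  tool: string;          // e.g. "calendar.read"
  outcome: "ok" | "denied" | "error";
}

// One JSON line per action, ready for SIEM/Datadog ingestion alongside the
// rest of the workspace's logs.
function toSIEMLine(event: AuditEvent): string {
  return JSON.stringify(event);
}
```

Investigating "what did the assistant do last Tuesday" then reduces to the same log query a customer would run for any human user.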

Most OpenClaw-for-enterprise offerings need to bolt observability onto the outside. We already own the tool plane, so the audit trail is a consequence of the architecture, not a feature we had to go build.

The stack as a whole

| Layer | OpenClaw default | Our approach |
| --- | --- | --- |
| Ingress | Undifferentiated, model-evaluated | Named triggers plus pluggable pre-model policy |
| Identity / egress | Persistent host credentials | User-bound, short-lived per-invocation tokens, MCP-gated |
| Execution | Native shell plus filesystem | secure-exec, in-memory FS, opt-in capabilities |
| Runtime | Long-lived, stateful host | Stateless ephemeral functions, scale-to-zero, governed concurrency |
| Coordination | Single process, no shared state | Shared VFS with leases, journals, and recovery checkpoints |
| Observability | Bolt-on or third-party | Inherited from the Speakeasy MCP platform, end-to-end |

What we're not claiming

We're not claiming "completely secure." That's a bold claim, and nobody deploying autonomous agents in 2026 should be making it with a straight face. Prompt injection is an unsolved problem. Researchers have shown that even sophisticated models fail the majority of the time under repeated attack. No single control fixes it.

What we are claiming is a structurally smaller blast radius, fewer assumptions about model judgment, and a platform where the controls the industry is still inventing have a natural place to plug in. When the next EchoLeak-class vulnerability drops, customers on our platform shouldn't need to rebuild their architecture to respond. They should be able to add a filter at the ingress layer, tighten a tool scope, or flip a capability off, and have that change propagate everywhere, immediately.

Security as the substrate, not the patch

Feature-first agent systems will keep losing to attackers, because the surface grows faster than the patches. Every new capability adds a new exploit path, and the defender is always one CVE behind. Security-first systems trade some open-ended capability for something more valuable to an enterprise: the ability to actually deploy, in production, under audit, without waking up to find that a crafted email drained the access tokens overnight.

For virtual employees to be useful at the scale we want them to be, the runtime has to fail closed by default, the ingress has to be inspectable before the model sees it, and the blast radius of a mistake has to be the task, not the company.

That's the bet we're making. So far, it looks like a good one.
