AI & MCP
Security first, capability second: Building virtual employees enterprises can actually trust
Nolan Sullivan
April 23, 2026 - 12 min read
Capability vs. security: the fundamental tension in AI assistants
The value of an AI assistant comes from its ability to act on a user's behalf without being directed. That is also what makes it dangerous.
Every AI assistant platform sits on that tension: the more independently an assistant can act, the more damage it can do. These two axes pull against each other, and the debate in the agent community has become surprisingly heated. One camp argues that a maximally capable agent, with unfettered shell access, persistent filesystems, and the ability to install whatever it needs on the fly, is the only way to deliver on the promise of truly autonomous AI. The other argues that capability without containment is a breach waiting to happen.
Both sides are right, within their context.
This isn't a new problem. It's the same tension enterprises and startups have always navigated with their human employees. At a 20-person startup, the new hire gets admin on everything day one. Velocity beats governance because the blast radius is small and the cost of a missed opportunity is existential. At a bank, the same hire waits three weeks for a laptop, signs four policies, and gets read-only access to staging. The blast radius is enormous, the regulators are watching, and a single compromised account can end careers.
Neither approach is wrong. They're answers to different questions. The same is true for AI assistants: a solo founder experimenting with an autonomous agent has a genuinely different threat model than a 10,000-person enterprise with regulatory responsibilities.
Technologies like OpenClaw are amazing for solo developers and indie hackers, and they deserve to keep being exactly that. But we felt the enterprise audience was being underserved, and that's the gap we set out to fill.
Built-in vs. bolted on
When we began experimenting with building a secure AI assistant, we started with OpenClaw. Our first instinct was to bolt a security layer on top, similar to NemoClaw's approach. But we ultimately decided that for an AI assistant to really be ready for use inside an enterprise, security had to be built into the foundations, not retrofitted after the fact.
We also had a head start most teams in this space don't. Speakeasy already gives us the primitives an enterprise virtual employee actually needs: hosted MCP servers with a governed tool catalogue, environments and encrypted secrets, RBAC, audit logs, and telemetry. We didn't have to invent any of that. Virtual employees inherit it. And because the stack is auditable end to end, down to the well-known open source dependencies it relies on, security teams can verify the controls rather than take them on faith.
The good news is that we found choosing security-first doesn't mean choosing capability-poor. The architecture below is more restrictive than OpenClaw in almost every dimension, but it can still do almost everything OpenClaw can do: draft emails, summarize fifty PDFs, respond to Slack threads, hit third-party APIs, maintain memory across conversations. The difference is in how those capabilities are delivered: through layers that fail closed, with authentication that traces back to real users, and with an ingress pipeline that can hold a suspicious message before any model sees it.
Six layers, each a deliberate inversion of an OpenClaw default
Layer 1: Ephemeral by default
Every assistant invocation runs as a stateless function (think Lambda). Each gets its own clean environment and dies when the task is done. Parallelism scales horizontally but stays governed: users set a per-assistant cap, admins set an org-wide ceiling, and effective concurrency is whichever limit is smaller. There's no persistent filesystem that an agent can scribble on, no accumulated state for a malicious skill to hide in, no installed binaries that activate themselves three weeks later.
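The concurrency rule above is simple enough to sketch. This is an illustrative model, not the platform's real API; the type and function names are invented for the example:

```typescript
// Effective concurrency is the stricter of two limits: the per-assistant cap
// the user sets and the org-wide ceiling the admin sets.
interface ConcurrencyPolicy {
  perAssistantCap: number; // set by the user
  orgCeiling: number;      // set by the admin
}

function effectiveConcurrency(policy: ConcurrencyPolicy): number {
  return Math.min(policy.perAssistantCap, policy.orgCeiling);
}

// A new invocation may only spawn while the active count is below the
// effective limit; anything else waits or is rejected.
function maySpawn(activeInstances: number, policy: ConcurrencyPolicy): boolean {
  return activeInstances < effectiveConcurrency(policy);
}
```

Because the limit is computed at spawn time from both settings, an admin tightening the org ceiling takes effect immediately, without touching any per-assistant configuration.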
Conversational continuity is handled through a warm pool for low-latency follow-ups and a separately managed memory layer for state that should outlive a single task. The same instance can stay live for a short window to handle follow-ups, but the runtime itself never persists. If the agent needs to remember something across conversations, it does so through an explicit, audited memory tool, not as a side effect of having had a computer.
This matters because persistence is where half of OpenClaw's real-world incidents live. A skill grabs a binary off the internet, installs it, harvests credentials, and phones home, and the user doesn't notice until the damage is weeks old. The NVIDIA AI Red Team's guidance on agent sandboxing makes this point explicitly: an ephemeral lifecycle is one of the most important structural defenses a platform can offer. The industry is converging on the same answer.
Layer 2: secure-exec instead of shell access
The agent has no native shell. No screenshot tool. No ability to reach into the host filesystem, install arbitrary software, or spawn processes it wasn't granted.
What it has instead is a tool called secure-exec, a security-first Node.js runtime. Arbitrary computation runs here, inside a confined environment with an in-memory filesystem that vanishes with the process. Need to download fifty PDFs and compile data from them? That works, because the capability exists, just not the permanence.
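The contract that matters here is lifecycle, not API surface. A minimal sketch of that contract, with the in-memory filesystem simplified to a Map and the `secureExec` name invented for illustration:

```typescript
// Hypothetical sketch of the secure-exec contract: arbitrary computation
// receives an in-memory filesystem that exists only for one task's lifetime.
// The host filesystem is never reachable from inside the task.
type InMemoryFs = Map<string, Uint8Array>;

async function secureExec<T>(task: (fs: InMemoryFs) => Promise<T>): Promise<T> {
  const fs: InMemoryFs = new Map(); // fresh and empty on every invocation
  try {
    return await task(fs);
  } finally {
    fs.clear(); // nothing written during the task survives it
  }
}
```

A task can download, write, and compile data freely inside that Map-backed filesystem; the capability exists, the permanence does not.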
Critically, capabilities are opt-in per workspace and assignable per role. A fresh install ships as a message bot: it can respond, nothing more. Admins enable tool categories as their security posture allows, and they can scope individual toolsets to specific roles, so the assistant the support team uses doesn't share an attack surface with the one finance uses. Want your assistant to read calendars? Turn on the calendar toolset. Want it to run code? Turn on secure-exec. Don't want it doing anything at all except replying in threads? That's a legitimate configuration.
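A policy like that can be modeled as two layers: a workspace-enabled set and an optional per-role narrowing. The toolset names and shapes below are illustrative, not the product's schema:

```typescript
// Toolsets are off by default; admins enable them per workspace and can
// optionally narrow them further per role. A role scope can only narrow,
// never widen, what the workspace has enabled.
type Toolset = "calendar" | "email" | "secure-exec" | "slack-reply";

interface WorkspacePolicy {
  enabled: Set<Toolset>;                 // admin-enabled categories
  roleScopes: Map<string, Set<Toolset>>; // optional per-role narrowing
}

function allowedToolsets(policy: WorkspacePolicy, role: string): Set<Toolset> {
  const roleScope = policy.roleScopes.get(role);
  if (!roleScope) return new Set(policy.enabled);
  return new Set([...policy.enabled].filter((t) => roleScope.has(t)));
}
```

The intersection semantics are the point: even a misconfigured role entry can never grant a toolset the workspace has not enabled.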
This matters because the industry has discovered the hard way that "click approve on each action" doesn't scale. Developers rubber-stamp permission prompts reflexively, making manual review useless. The boundary has to be structural, set once at admin level and enforced by the runtime, not renegotiated on every request.
Layer 3: Scoped, authenticated, traceable tool access
Every assistant is bound to a single initiating user. When it runs, it mints a short-lived token scoped to that user. Think of it as a personal access token that exists for the duration of one task. All third-party calls (Slack, calendar, email, internal MCP tools) flow through this token. The user's credentials, approved connections, and permissions define the entire universe of what the assistant can do, and admin policy can narrow that further but never widen it.
Two things fall out of that design. First, every action the agent takes is authenticated as the user, which means every action is traceable back to them in the same audit logs a human employee would produce. Second, the agent's third-party access is gated by the workspace's pre-configured toolsets. If the admin hasn't connected the calendar integration, the agent literally cannot check calendars. The capability isn't hidden. It doesn't exist in that runtime.
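The minting rule described above (user permissions intersected with admin policy, short expiry, traceable subject) can be sketched as follows. Field names, scope strings, and the TTL are all assumptions for the example:

```typescript
// A task token carries the initiating user's identity, the intersection of
// their scopes with admin policy, and a short expiry tied to one task window.
interface TaskToken {
  subject: string;   // the initiating user; every action traces back here
  scopes: string[];
  expiresAt: number; // epoch ms; the token dies with the task
}

function mintTaskToken(
  userId: string,
  userScopes: string[],
  adminAllowed: string[],
  ttlMs = 5 * 60 * 1000, // illustrative default
): TaskToken {
  // Admin policy can narrow the user's permissions but never widen them:
  // the resulting scope set is always a subset of what the user holds.
  const scopes = userScopes.filter((s) => adminAllowed.includes(s));
  return { subject: userId, scopes, expiresAt: Date.now() + ttlMs };
}
```

Because the subject is always the initiating user, the audit question "who did this" has the same answer for the assistant as it would for the human.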
This is the MCP gateway pattern the security industry has been converging on. The "lethal trifecta" that makes agent exploits so dangerous (private data access plus external network routing plus untrusted execution) gets broken by funneling every external call through one logged, policy-enforced choke point. We piggyback on the same observability primitives as the rest of our MCP infrastructure: the same search, the same alerting, the same log structure. A customer investigating "what did the assistant do last Tuesday" uses the same tools they already use to investigate what any of their agents did.
Layer 4: The trigger model
This is the layer most agent platforms underweight, and it's where we think the biggest structural win lives.
OpenClaw treats ingress as undifferentiated. A heartbeat, a Slack DM, an email, a webhook: all just "stuff that wakes the agent up." And because the agent is a non-deterministic model evaluating every message, anything that reaches its context is a potential prompt injection. That's how the original email-credentials exploit worked. The attacker didn't breach a server. They wrote an email, and the model decided to be helpful.
We split ingress and egress as first-class concepts. Ingress comes through named, pre-configured triggers: "when someone DMs me on Slack," "when this webhook fires," "on this schedule." Between the trigger and the agent runtime sits a pluggable middleware layer, and this is the part that matters: untrusted input never touches the model until policy has run.
We ship default plugins for the obvious risks (prompt injection detection, secret exfiltration, SIEM emission to the customer's existing observability stack), but the pipeline is open. Customers ship their own plugins for company-specific policy: classifiers, enrichers, routing rules, content filters, whatever their security team decides matters. Suspicious payloads can be intercepted before any LLM evaluates them. Suspicious messages can be held and surfaced to the user for explicit approval rather than silently executed. Spike detection and source visibility ("you're getting hammered with weird-looking DMs from a domain you've never seen") become a security signal, not just an ops one.
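The plugin contract implied above is small: each plugin inspects the raw payload and returns a verdict, and the pipeline fails closed on the strictest verdict it sees. The interface, verdict names, and the toy detector below are illustrative, not the shipped plugin API:

```typescript
// Every plugin sees the payload before any model does. "block" stops the
// pipeline immediately; "hold" parks the message for explicit user approval;
// only "allow" from every plugin lets the payload reach the model.
type Verdict = "allow" | "hold" | "block";

interface IngressPlugin {
  name: string;
  inspect(payload: string): Verdict;
}

function runIngress(payload: string, plugins: IngressPlugin[]): Verdict {
  let verdict: Verdict = "allow";
  for (const plugin of plugins) {
    const v = plugin.inspect(payload);
    if (v === "block") return "block"; // fail closed immediately
    if (v === "hold") verdict = "hold"; // escalate, but let later plugins run
  }
  return verdict;
}

// Toy example of a default plugin. Real prompt-injection detection would be
// a classifier, not a substring check; this only shows where it plugs in.
const injectionGuard: IngressPlugin = {
  name: "prompt-injection-detector",
  inspect: (p) => (/ignore previous instructions/i.test(p) ? "hold" : "allow"),
};
```

The design choice worth noting is that "hold" does not short-circuit: later plugins still run, so SIEM emission and enrichment happen even for messages that end up parked.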
This maps directly onto where the security research has landed. OWASP lists indirect prompt injection as the number-one risk for LLM applications. Microsoft's defense-in-depth guidance recommends isolating untrusted inputs before model invocation, because no model reliably distinguishes data from instructions, and none likely ever will. We're implementing that pattern at the platform level so customers don't have to invent it themselves.
As a bonus, thread-aware responses fall out of this design for free, something most current Slack bots lack because they treat each message in isolation.
Layer 5: Coordinated, not just spawned
The interesting consequence of the ephemeral runtime isn't isolation; it's what happens when multiple instances of the same assistant are working at once. Without coordination, parallelism gets you race conditions: two instances reading stale calendar state, two instances responding to overlapping Slack threads, two instances writing conflicting drafts.
Every assistant has a shared virtual filesystem that its instances coordinate through. Durable memory, scratchpads, summaries, recovery checkpoints, and resource leases all live there. Instances can see what their siblings are working on, claim a lease on a contested resource like a calendar or an inbox, and pick up from a checkpoint when an earlier instance dies mid-task.
Here's what that looks like in practice. Say you ask your assistant "when's my earliest meeting on Tuesday?" At the same moment, an inbound email asks for time on your calendar. Two instances spin up. Instance A reads the calendar and writes a scratch note that it's evaluating Tuesday availability. Instance B claims a lease on the calendar resource and slots the email sender into a 9:30 opening. Instance A sees the lease, waits for it to clear, re-reads the now-authoritative state, and answers:
"It was your 10am 1:1 with John, but Linda just asked for time and I slotted her in from 9:30, so that's your earliest meeting now."
The model isn't doing anything clever there. The execution model is. Treating concurrency and shared state as first-class concerns is what makes a virtual employee feel like an employee instead of a chatbot.
It's also the difference between graceful failure and silent corruption. If an instance dies halfway through a task, the next one resumes from the last durable checkpoint instead of either redoing the whole thing or quietly dropping the work.
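The lease-and-checkpoint flow above can be sketched with two small primitives. A real implementation would persist these on the shared virtual filesystem; this in-memory version, with invented class names, only shows the contract:

```typescript
// Leases serialize access to contested resources (a calendar, an inbox).
// An instance that fails to claim waits for the holder to release, then
// re-reads the now-authoritative state before acting.
class ResourceLeases {
  private holders = new Map<string, string>(); // resource -> instance id

  tryClaim(resource: string, instance: string): boolean {
    if (this.holders.has(resource)) return false; // a sibling holds it
    this.holders.set(resource, instance);
    return true;
  }

  release(resource: string, instance: string): void {
    // Only the current holder may release its own lease.
    if (this.holders.get(resource) === instance) this.holders.delete(resource);
  }
}

// Checkpoints let a successor instance resume a task from the last durable
// point instead of redoing the work or silently dropping it.
class Checkpoints {
  private last = new Map<string, string>(); // task id -> serialized progress

  save(task: string, state: string): void {
    this.last.set(task, state);
  }

  resume(task: string): string | undefined {
    return this.last.get(task);
  }
}
```

In the calendar example, Instance B's `tryClaim` succeeding is exactly why Instance A waits and re-reads rather than answering from stale state.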
Layer 6: Observability built in, not bolted on
Every prompt, tool call, and response is captured end to end through the same primitives that handle the rest of the Speakeasy MCP platform's traffic. A customer investigating "what did the assistant do last Tuesday" uses the same search, the same alerting, and the same log structure they already use for any other agent in their org. And because every action is authenticated as the initiating user, assistant activity lands in the same audit logs as the humans it acts for, with no custom correlation required.
This matters because observability is the feedback loop for everything else. Policy plugins are only useful if you can see what they caught. Identity scoping is only useful if you can trace what each token did. Ingress filters are only useful if you can measure what's getting through. Customers also get SIEM and Datadog emission out of the ingress pipeline, so assistant telemetry flows into the same stack that watches the rest of their infrastructure, instead of living in a separate vendor dashboard nobody checks.
Most OpenClaw-for-enterprise offerings need to bolt observability onto the outside. We already own the tool plane, so the audit trail is a consequence of the architecture, not a feature we had to go build.
The stack as a whole
What we're not claiming
We're not claiming "completely secure." That's a bold claim, and nobody deploying autonomous agents in 2026 should be making it with a straight face. Prompt injection is an unsolved problem. Researchers have shown that even sophisticated models fail the majority of the time under repeated attack. No single control fixes it.
What we are claiming is a structurally smaller blast radius, fewer assumptions about model judgment, and a platform where the controls the industry is still inventing have a natural place to plug in. When the next EchoLeak-class vulnerability drops, customers on our platform shouldn't need to rebuild their architecture to respond. They should be able to add a filter at the ingress layer, tighten a tool scope, or flip a capability off, and have that change propagate everywhere, immediately.
Security as the substrate, not the patch
Feature-first agent systems will keep losing to attackers, because the surface grows faster than the patches. Every new capability adds a new exploit path, and the defender is always one CVE behind. Security-first systems trade some open-ended capability for something more valuable to an enterprise: the ability to actually deploy, in production, under audit, without waking up to find that a crafted email drained the access tokens overnight.
For virtual employees to be useful at the scale we want them to be, the runtime has to fail closed by default, the ingress has to be inspectable before the model sees it, and the blast radius of a mistake has to be the task, not the company.
That's the bet we're making. So far, it looks like a good one.