There’s a debate happening in the AI tooling world right now: should you use skills or MCP servers to extend your agents?
Block’s Goose team published a post called “Did Skills Kill MCP?” Armin Ronacher (the creator of Flask, now building Earendil) wrote about preferring skills over MCP for daily coding work. LlamaIndex, on the other hand, tested both approaches and found that their documentation MCP server made skills mostly redundant. That’s a use case where MCP naturally subsumes skills, though, and it doesn’t tell you much about workflows, governance, or anything beyond documentation lookup.
Everyone’s picking sides. I think they’re asking the wrong question.
After spending the past year building MCP infrastructure, I’ve landed on a middle ground: skills and MCP servers aren’t competing, they’re different layers of the same architecture. And if you’re building an AI tooling strategy for your org, you need both.
The core distinction
Skills teach agents how to do things. MCP servers give agents the ability to do things.
A skill is a static artifact: a markdown file, a set of instructions, a curated pattern. It tells the agent: “When you encounter this type of problem, here’s the approach.” An MCP server is a live application: it has auth and infrastructure behind it, and it returns real-time data.
Block’s Goose team put it well: “Skills describe the workflow. MCP provides the runner.” The YAML doesn’t execute anything, the runner does.
Why this matters for engineering leaders
If you’re a VP of Eng or CTO thinking about how to roll out AI tooling across your org, the skills-vs-MCP framing leads you to a bad place. You end up trying to choose one approach, and either:
You go all-in on skills, and your agents have great playbooks but no live access to your services. They know how to query your API but can’t actually authenticate and call it.
You go all-in on MCP, and you end up with a sprawl of MCP servers, many of which are just bad API wrappers that shouldn’t exist. Armin Ronacher has noted this: “many MCP servers don’t need to exist. They’re either bad API wrappers or they’re actually truly replaceable by a skill.”
The right answer is a two-layer architecture where each layer does what it’s good at.
Layer 1: Skills for curated execution patterns
Skills shine when you have complex, nuanced workflows that your team has refined over time. The kind of institutional knowledge that lives in a senior engineer’s head.
Think about things like:
How your team does code review (what to check, in what order, what’s a blocker vs. a nit)
The specific pattern for deploying to your staging environment
How to interpret and act on data from a particular internal dashboard
The approved workflow for handling a customer escalation
These are patterns of doing something. They’re curated. They’re opinionated. And they’re mostly static — they don’t need to hit a live service to be valuable.
```markdown
---
name: triage-production-incident
description: >
  Use this skill when triaging a production incident.
  Guides the agent through the team's runbook.
---

# Production incident triage

## Step 1: Assess severity
- Check the #alerts channel for context
- Query the metrics dashboard for error rate and p99 latency
- Classify as P0 (customer-facing outage), P1 (degraded), or P2 (internal)

## Step 2: Notify
- P0: Page the on-call and post in #incidents
- P1/P2: Post in #incidents with severity tag

## Step 3: Investigate
- Pull recent deploys from the deployment tool
- Check logs for the affected service
- Correlate with upstream dependency status
```
This is just a markdown file. No schema, no server, no auth. The agent reads it and knows exactly how your team handles incidents.
Skills work well here because:
They’re lightweight (just prompt content, no schema overhead)
They’re version-controlled like any other code artifact
They encode domain expertise that an LLM wouldn’t otherwise have
They can reference tools (including MCP tools) without being tools themselves
The key insight: a skill is a pattern of doing something, and doing it over and over creates a skill. The name is actually more descriptive than people give it credit for.
Layer 2: MCP servers for live service access
MCP servers are the right choice when you need to interact with live systems. Any time there’s real-time data, authentication, or a need for centralized governance, MCP is the answer.
Here’s what an MCP tool definition looks like for the deployment tool that the skill above references:
```yaml
tools:
  - name: list_recent_deploys
    description: List recent deployments for a service
    inputSchema:
      type: object
      properties:
        service:
          type: string
          description: The service name
        limit:
          type: integer
          default: 5
      required: [service]
```
This isn’t a playbook — it’s a live connection. The agent calls list_recent_deploys, authenticates through the MCP server, and gets back real data from your deployment system. The skill told the agent when and why to check recent deploys. The MCP tool gives it the ability to actually do it.
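To make the flow concrete, here is a minimal sketch of what the server side of that tool call might look like. This is plain Python, not a real MCP SDK, and the names in it (`fetch_deploys`, `DEPLOY_API_TOKEN`) are hypothetical; the point it illustrates is that the credential lives on the server, never in the agent’s context.

```python
# Hypothetical sketch of a server-side handler for the list_recent_deploys
# tool. Not a real MCP SDK -- the schema dispatch and auth handling are
# simplified to show where the credential lives.
import os

def fetch_deploys(service: str, limit: int, token: str) -> list[dict]:
    # Stand-in for a real HTTP call to your deployment system,
    # authenticated with the server-held token.
    return [{"service": service, "deploy": i} for i in range(limit)]

def handle_tool_call(name: str, arguments: dict) -> list[dict]:
    if name != "list_recent_deploys":
        raise ValueError(f"unknown tool: {name}")
    service = arguments["service"]      # required by the inputSchema
    limit = arguments.get("limit", 5)   # schema default
    # The agent never sees this token; the server authenticates on its behalf.
    token = os.environ.get("DEPLOY_API_TOKEN", "server-side-secret")
    return fetch_deploys(service, limit, token)
```

The agent supplies only `service` and optionally `limit`; everything sensitive stays server-side.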
MCP servers work better for live access because:
Auth: You can authorize users the same way you authorize access to your app. Clear identity, clear permissions.
Observability: An agent making a tool call is an external call. You can log it, monitor it, rate-limit it, and alert on it — using your existing stack (Datadog, etc.).
Governance: Telling a skill invocation apart from an ad-hoc user prompt requires visibility all the way down into the local execution environment. An MCP tool call is explicit and auditable.
Real-time data: Skills are beholden to whatever was written when the file was created. MCP servers can put a vector store, a RAG pipeline, or a live database behind the tool.
Here’s the governance point that I think gets underappreciated: a skill runs locally. An MCP server is a central service.
If you’re an engineering leader who cares about understanding what agents are doing across your org — who’s calling what, how often, with what data — you need those interactions flowing through infrastructure you can observe. Skills, by their nature, are invisible to your central systems.
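A minimal sketch of what that central choke point can look like. The `call_tool` entry point and log format here are hypothetical; in practice you would ship these records to Datadog or similar rather than keep them in a list.

```python
# Sketch: every agent tool call flows through one audited entry point.
# The in-memory AUDIT_LOG stands in for a real log pipeline.
import json
import time

AUDIT_LOG: list[str] = []

def call_tool(user: str, tool: str, arguments: dict) -> dict:
    record = {"ts": time.time(), "user": user, "tool": tool, "args": arguments}
    AUDIT_LOG.append(json.dumps(record))  # ship to your observability stack
    # ... rate-limit checks and dispatch to the real tool handler go here ...
    return {"ok": True}
```

Because every call passes through this one function, answering “who called what, how often, with what data” is a log query, not an archaeology project.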
The decision framework
When we advise companies on this, we use a simple heuristic:
Skills vs. MCP: when to use which

| Criterion | Skill | MCP server |
| --- | --- | --- |
| What you're providing | A pattern or process | Access to a live service |
| Content nature | Static, curated by your team | Real-time data |
| Auth needed | No | Yes |
| Monitoring/audit | Not observable centrally | Logged, rate-limited, auditable |
| Primary users | Technical (local dev environment) | Technical and non-technical |
| Infrastructure | None (just a file) | Hosted service |
Use both when a skill describes the workflow and an MCP server provides the tools the workflow references. The incident triage example above shows this: the skill defines the process, and MCP tools like list_recent_deploys give the agent the ability to execute each step.
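The heuristic is simple enough to encode directly. A toy sketch (the predicate names are illustrative, not an API):

```python
# Toy encoding of the decision table: any live-service concern
# (auth, real-time data, central audit) points at MCP; a purely
# static, curated pattern belongs in a skill file.
def choose_layer(needs_auth: bool, realtime_data: bool, needs_audit: bool) -> str:
    if needs_auth or realtime_data or needs_audit:
        return "mcp"
    return "skill"
```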
“What about skills + code mode, though?”
There’s a counterargument to this framework that’s worth addressing head-on: skills + a code sandbox = MCP.
The logic goes like this: modern agents like Claude Code and Goose have code execution built in. If you give an agent a skill that describes an API (the endpoints, the auth pattern, the data model) and the agent can write and execute code, then it can just curl the API directly. No MCP server needed. Why maintain a separate hosted service when the agent can do the same thing from a script?
For an individual developer doing ad-hoc work, this is roughly true. And it’s the strongest version of the argument: many MCP servers really are thin wrappers around APIs that a skilled agent with code execution could call directly.
But “roughly true for one developer” and “true for your organization” are very different claims.
Credentials have to live somewhere. In the sandbox model, the agent needs raw API keys and tokens to write into the code it executes. It has to know your secrets to use them. An MCP server handles auth server-side. The agent makes a structured tool call, and the server authenticates on its behalf. The agent never sees the credentials. At org scale, the difference between “every agent has access to raw secrets” and “no agent ever touches a credential” is the difference between a security model and a security incident.
Arbitrary code is a black box. When an agent makes an MCP tool call, you know exactly what happened: which tool, what inputs, what outputs, who initiated it. When an agent writes and executes a Python script that makes HTTP requests inside a sandbox, you see “agent ran code.” You’d have to parse the generated code to reconstruct what services it touched, with what data, and how often. Your existing observability stack (Datadog, etc.) can monitor MCP tool calls the same way it monitors any API. Monitoring arbitrary sandbox code requires a fundamentally different approach.
Structured tools constrain the blast radius. An MCP tool has a defined schema: specific inputs, specific outputs, specific capabilities. A code sandbox with network access can do anything. If a prompt injection convinces your agent to exfiltrate data, the MCP layer limits what data the agent can access through well-defined tool boundaries. A sandbox with raw credentials and network access has no such constraints.
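A minimal sketch of how a declared schema constrains that blast radius, using a hand-rolled validator against a hypothetical schema (a real server would use proper JSON Schema validation):

```python
# Sketch: the tool rejects anything outside its declared inputs, so even a
# compromised agent can only express requests the schema allows.
SCHEMA = {
    "properties": {"service": str, "limit": int},
    "required": ["service"],
}

def validate(arguments: dict) -> dict:
    for key in SCHEMA["required"]:
        if key not in arguments:
            raise ValueError(f"missing required argument: {key}")
    for key, value in arguments.items():
        expected = SCHEMA["properties"].get(key)
        if expected is None:
            # No smuggling extra parameters past the declared interface.
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    return arguments
```

Contrast this with a sandbox: code with network access and raw credentials has no declared interface to validate against.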
Centralized updates vs. distributed drift. When an API changes, you update one MCP server and every agent across your org gets the new behavior immediately. With skills + sandbox, you’re relying on every developer’s skill file getting updated, and on the agent generating correct code against the new contract. At scale, this becomes the same configuration drift problem that infrastructure-as-code was designed to solve.
The honest version of the equation is: skills + code sandbox ≈ MCP for a single developer’s workflow. But the gap between “approximately equivalent” and “actually equivalent” is exactly where security, observability, and organizational governance live. The sandbox gives you capability. MCP gives you capability with guardrails.
What this looks like in practice
Say you’re the CTO of a mid-size SaaS company with a dozen internal services.
Your MCP layer: Curate a set of MCP servers — roughly one per service or per use case. Your CRM tools, your deployment tools, your analytics queries. These are hosted centrally, monitored through your existing observability stack, and accessible to both technical and non-technical teams via whatever MCP client they prefer.
Your skills layer: Build skills for the complex, opinionated workflows that your team has developed. How to triage a production incident using your specific runbooks. How to prepare a customer QBR by pulling from the right data sources in the right order. How to onboard a new engineer to your codebase.
The MCP servers give agents access. The skills give agents judgment.
The bigger picture
The skills-vs-MCP debate mirrors a pattern we’ve seen before in software. It’s like arguing whether you need documentation or APIs. Obviously, you need both. Documentation tells you how to think about a system. APIs give you the ability to interact with it.
What’s new is that both layers are now consumed by the same entity — an AI agent — rather than a human. And that means the architecture of how you deliver these capabilities to agents matters in ways it didn’t before. Token budgets, schema overhead, context window management — these are real constraints that should inform which layer you use for what.
The fundamental architecture isn’t complicated: teach agents how to think about your domain (skills), and give them the tools to act on it (MCP). The companies getting the most out of AI tooling aren’t choosing between these layers — they’re building both.
But it’s worth being honest about the trajectory. Skills exist today because agents aren’t yet good enough at consuming raw documentation. A 40-page API reference doesn’t fit in a context window, and even if it did, an LLM can’t reliably extract the implicit knowledge — the “how your team actually does this” — from technical prose written for humans. Skills bridge that gap by pre-digesting expertise into a form agents can act on reliably.
That gap is closing. Context windows are expanding. Agents are getting better at reasoning over large documents. RAG pipelines are improving. There’s a version of the future — maybe not distant — where agents can consume your product documentation, your runbooks, your architectural decision records directly, and infer the workflows themselves. In that world, the line between “skill” and “well-written documentation” disappears. The documentation is the skill.
This doesn’t change the architecture described here — it just means the skills layer eventually converges with how you write docs. And if anything, it makes the MCP layer more important, not less. MCP was never about documentation. It’s about live access, authentication, and governance — things that no amount of better documentation can replace. Even in a world where agents consume docs natively, they still need structured, governed access to your running systems.
At Speakeasy, we build tools for both layers. Gram is the fastest way to build and deploy production MCP servers, and we’ve released a collection of agent skills for OpenAPI and SDK workflows. If you’re building out your AI tooling strategy, we’d love to help.