AI & MCP
AI transformation is the new digital transformation. Your vendor stack isn't ready.
Thomas Rooney
February 24, 2026 - 14 min read
Your team spent six months evaluating a new tool. You scored it on features, pricing, integrations, security posture, and support SLAs. You ran a proof of concept. You chose a winner.
Six months later, your AI platform team is trying to automate the workflow that tool supports. They can’t. The core operations require a human to log into a web UI, click through a wizard, and approve changes manually. The tool works great for humans. It’s a dead end for agents.
This is happening across the enterprise software stack right now — and the pattern should feel familiar.
For two decades, enterprises funded “Digital Transformation” programs. New CRM tools, analytics stacks, collaboration suites, workflow tooling. The specifics varied, but the pattern was the same: a mandate came from the top, budgets were allocated, and teams evaluated vendors against a new set of criteria that didn’t exist five years prior.
We’re at the front of the next wave. Call it AI Transformation.
We’re seeing AI Transformation teams emerge across the Fortune 500 — dedicated headcount, dedicated budget, executive sponsorship. And the demand is already creating new tooling categories: context management and retrieval layers, agent communication and coordination tooling, agent configuration and operations platforms, agent-to-API boundary tooling that handles auth, policy, and data segregation.
But while the demand has shifted, the vendor evaluation criteria haven’t caught up. Feature checklists. UI walkthroughs. Integration matrices. Pricing tiers. All still optimized for a world where humans are the operators.
The question these teams should be asking — and mostly aren’t — is: can agents operate this product?
Not “does the vendor use AI.” Not “is there a chatbot in the docs.” The question is whether the product’s fundamental operating model — how you configure it, run it, customize it, and scale it — is something an agent can own end-to-end.
Why this question matters now
LLMs are remarkably good at translating human intent into API calls. That’s useful on its own, but the real unlock is what it enables: engineering teams operating infrastructure at a scale that wasn’t possible when every workflow required a human in the loop — and, eventually, non-developers building software-like things to solve their specific domain problems.
Consider the concrete version. Your platform team maintains SDKs for internal APIs. An API team ships a change on Tuesday. The SDK needs to be regenerated, tested, and published before downstream consumers — human or agent — hit errors.
In the old world, that’s a ticket. An engineer picks it up, runs the generator, reviews the diff, fixes conflicts with any custom code, runs tests, and publishes. Maybe it takes a day. Maybe a week if the engineer is busy.
In an agent-operated world, that’s a pipeline. The agent detects the spec change, runs generation, resolves conflicts, validates tests, opens a PR, and publishes on approval. Minutes, not days. But only if the generation tool is something the agent can actually operate.
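The shape of that pipeline is easy to sketch. The following is a minimal, hypothetical orchestrator: the step names, the hash-based change check, and the state file are illustrative, not any particular vendor's CLI.

```python
import hashlib
from pathlib import Path

def spec_changed(spec_path: Path, state_path: Path) -> bool:
    """Detect a spec change by comparing content hashes (illustrative)."""
    current = hashlib.sha256(spec_path.read_bytes()).hexdigest()
    previous = state_path.read_text().strip() if state_path.exists() else ""
    return current != previous

def run_pipeline(steps: list) -> dict:
    """Run ordered steps; stop at the first failure so the agent
    (or a reviewing human) knows exactly where the loop broke."""
    for name, step in steps:
        if not step():
            return {"status": "failed", "step": name}
    return {"status": "published", "step": None}

# Stubs standing in for real generate / test / publish commands:
steps = [
    ("generate", lambda: True),
    ("test", lambda: True),
    ("publish", lambda: True),
]
print(run_pipeline(steps))
```

The point of the gated structure is that every outcome is machine-readable: the agent can retry, escalate, or open a PR based on which step failed, without a human reading a web dashboard.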
The blocker isn’t agent capability. Coding agents today can run CLIs, resolve git conflicts, write tests, and manage package publishing. The blocker is whether the tool was designed for that operating model — or whether it assumes a human is driving.
Now multiply that by every tool in your stack. Your SDK generator. Your CI/CD platform. Your observability tooling. Your infrastructure provisioning. Your documentation pipeline. Each one is either ready for agent operators, or it’s a bottleneck waiting to surface.
Three tiers of agent readiness
We’ve found it useful to think about products along a spectrum. Not every tool needs to be in the same tier, and different organizations will land in different places. But understanding where a product sits helps you predict how well it will fit the operating model you’re building toward.
Agent-powered
The vendor uses AI inside their product to improve outcomes. Maybe they use LLMs to generate better code, or ML to optimize their pipeline. You benefit, but passively.
The interaction model is still fundamentally human: you review what the service produces, you approve changes through their UI, you manage releases through their workflow. The AI is the vendor’s capability, not yours.
This works well when you want a managed service and don’t need to extend the workflow. The failure mode is lock-in on intelligence. When you want to apply similar patterns to adjacent problems — docs generation, compliance checks, multi-repo refactors — you can’t, because the AI lives inside their service. You bought an outcome, not a capability.
Agent-friendly
The product wasn’t built with agents in mind, but it doesn’t fight them either. It has a solid CLI, deterministic outputs, readable diffs, and reasonable APIs. Agents can wrap it.
Think of well-designed Unix tools. jq is incredibly agent-friendly: deterministic, scriptable, composable. But nobody designed jq anticipating that an LLM-powered agent would be the primary operator. It just has good engineering principles that happen to work.
The failure mode is integration burden. You build and maintain the agent loop yourself. The vendor isn’t investing in making the agent experience better over time — it just works because the underlying design is clean. When the CLI changes or the output format shifts, your automation breaks and you fix it.
Agent-native
The product is designed so that agents are the primary operators, not an afterthought. This applies to both sides: agents operating the production pipeline (generation, testing, publishing) and agents consuming the output (using the SDKs, MCP servers, and interfaces the tool produces).
The full workflow is automatable — whether that’s a CLI, a CI action, or an API. Customization happens through scriptable primitives — hooks, overlays, config files — that map directly to actions an agent can take. Conflicts surface as standard git markers, not proprietary resolution flows. The vendor provides building blocks that compose into agent-operated pipelines.
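Standard git markers matter because an agent can parse them with a few lines of code, no proprietary API required. A simplified sketch (real conflicts can nest or contain literal `=======` lines; the sample content is illustrative):

```python
def conflicted_hunks(text: str):
    """Return (ours, theirs) line-lists for each standard conflict hunk."""
    hunks, ours, theirs, state = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            ours, theirs, state = [], [], "ours"
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            hunks.append((ours, theirs))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return hunks

sample = """\
def greet():
<<<<<<< ours
    return "hello"
=======
    return "hi"
>>>>>>> generated
"""
print(conflicted_hunks(sample))
```

A proprietary resolution flow hides exactly this information behind a UI; standard markers make it a parsing problem any agent already knows how to solve.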
The operating model is “agent executes a pipeline,” not “human approves vendor PRs.”
The failure mode is ownership. You own the loop: policy gates, approval workflows, security scanning, release management. Agent-native doesn’t mean no governance. It means governance-as-code. That requires internal platform maturity.
What this isn’t
This isn’t a value judgment. Agent-powered products can deliver excellent outcomes. Agent-friendly tools can be the backbone of reliable infrastructure. The taxonomy is a planning tool: it helps you match products to the operating model your organization is building toward.
Most enterprises are somewhere in between — and moving toward the agent-native end of the spectrum faster than their procurement processes reflect.
When configuration costs approach zero
Coding agents are making routine software maintenance effectively free. Writing a config file. Setting up a CI pipeline. Resolving a merge conflict. Updating a dependency. Fixing a type error after a regeneration. These are tasks agents handle reliably today.
That changes what matters when you’re picking a vendor.
The old question: “How easy is this for our engineers to set up and maintain?” This favored products with polished UIs, guided workflows, and managed services. Configuration difficulty was a real cost, and products that reduced it won deals.
The new question: “Can an agent operate this product end-to-end without a human in the loop?” This favors products with deterministic CLIs, local execution, scriptable customization, and standard interfaces. The difficulty of configuration is no longer the differentiator — agents handle that. The differentiator is whether the product’s operating model supports autonomous operation at all.
In practice, two products might have equivalent capabilities, even similar underlying mechanisms: three-way merges for custom code preservation, git-native conflict markers, reproducible generation.
What differs is the operating model. Who initiates the loop? Where does state live? What’s the “unit of work” for the operator? If the operator is an agent, these questions have different optimal answers than if the operator is a human clicking “approve” in a web UI.
What agent-native looks like concretely
When evaluating whether a product is agent-native, these are the properties that matter:
| Property | What to look for | Why it matters for agents |
|---|---|---|
| Automation-first execution | Full workflow runs from CLI, CI, or API — no browser required | Agents operate in terminals and pipelines, not web UIs |
| Deterministic output | Same inputs + version pins = same output, every time | Agents need predictable behavior to validate results |
| Scriptable customization | Hooks, overlays, config files — not UI wizards | Agents can read, modify, and test config files programmatically |
| Pipeline-composable | Fits into CI/CD as a build step, chainable with other tools | Agents can own the full chain: generate, test, publish |
| Loop throughput | Fast enough that agents can run tight loops across many repos | A 30-second tool call iterated 200 times is nearly two hours; a 2-second run is under seven minutes |
| Local data | Specs and artifacts stay inside your security boundary | No sensitive API definitions leaving your perimeter |
A product that hits most of these can be operated by an agent as reliably as by a human — and at a scale no human team can match.
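The determinism property in particular is cheap to verify: run generation twice with pinned inputs and compare a content hash of everything produced. A sketch, with `fake_generate` standing in for a real generator invocation:

```python
import hashlib

def artifact_hash(artifacts: dict) -> str:
    """Stable hash over sorted (path, content) pairs."""
    h = hashlib.sha256()
    for path in sorted(artifacts):
        h.update(path.encode())
        h.update(artifacts[path].encode())
    return h.hexdigest()

def fake_generate(spec: str, version: str) -> dict:
    # Stand-in for a deterministic generator: output depends
    # only on the inputs and the pinned version.
    return {"client.py": f"# generated from {spec} with v{version}\n"}

run1 = artifact_hash(fake_generate("openapi.yaml", "1.2.3"))
run2 = artifact_hash(fake_generate("openapi.yaml", "1.2.3"))
assert run1 == run2  # same inputs + version pins = same output
```

If a vendor's tool fails this two-run check (timestamps in output, nondeterministic ordering, network-dependent results), agents cannot validate its results and every downstream diff becomes noise.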
The platform this enables
Agent-native products aren’t just individually better for automation. They enable a platform architecture where agents operate the entire software supply chain.
An enterprise with hundreds of internal APIs needs four layers to make this work:
| Layer | Role |
|---|---|
| OpenAPI | The contract. Machine-readable source of truth for what each API does. Versionable, diffable, lintable. |
| Generated artifacts | The execution layer. SDKs, Terraform providers, MCP servers, CLI tools, docs — all generated from the same API contract. Each makes the API accessible in a different context. |
| Catalog / MCP | The discovery layer. Agents query what APIs exist, what SDKs are available, what capabilities the org exposes. Turns a sprawling internal landscape into something navigable. |
| Agents | The operators. They generate artifacts from specs, keep everything in sync, resolve conflicts, run tests, publish packages, and update consuming code. |
Without this, each agent rebuilds API plumbing from scratch every time. With it, agents consume stable typed interfaces and focus on business logic. That difference compounds fast.
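The discovery layer can start as something very simple: flatten each API contract into a queryable operation list. A minimal sketch using an inline, illustrative OpenAPI fragment (real specs would be loaded from YAML or JSON):

```python
# Illustrative OpenAPI fragment; paths and operationIds are made up.
spec = {
    "paths": {
        "/users": {
            "get": {"operationId": "listUsers", "summary": "List users"},
            "post": {"operationId": "createUser", "summary": "Create a user"},
        },
        "/users/{id}": {
            "get": {"operationId": "getUser", "summary": "Fetch one user"},
        },
    }
}

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}

def catalog(spec: dict) -> list:
    """Flatten an OpenAPI spec into a queryable operation list."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS:  # skip parameters, summary, etc.
                ops.append({"method": method.upper(), "path": path,
                            "id": op.get("operationId")})
    return ops

for op in catalog(spec):
    print(op["method"], op["path"], op["id"])
```

An MCP-exposed catalog is this same flattening, served as a tool agents can query instead of a script they run locally.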
“But can’t agents just write the API calls directly?”
An agent can read an OpenAPI spec, construct HTTP requests, and parse responses. It works. Once. In one language. For one agent. Right now.
The problems show up the second time. Every invocation is a fresh generation. The agent writes different error handling, different retry logic, different pagination. There’s no stable baseline, so when something changes — and it will — you can’t diff “what the agent wrote last time” against “what it wrote this time.” You can’t review what you can’t compare.
Then your API evolves. A backend team adds a new enum value behind a feature flag on Tuesday. A field becomes nullable. A response shape changes. A generated SDK encodes decisions about how to handle the unknown — accept unexpected fields, degrade gracefully on new values, don’t crash on schema drift (strict in development, lax in production). Those decisions are made once, tested once, and applied everywhere. An agent writing raw integration code makes them ad hoc, differently each time, and often wrong in ways that surface at 2am.
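Here is one such decision encoded once, sketched in Python: unknown enum values degrade to a sentinel instead of raising. The enum name and values are hypothetical; the `_missing_` hook is standard library behavior.

```python
from enum import Enum

class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    UNKNOWN = "__unknown__"

    @classmethod
    def _missing_(cls, value):
        # A new value shipped behind a feature flag lands here
        # instead of raising ValueError at 2am.
        return cls.UNKNOWN

assert OrderStatus("shipped") is OrderStatus.SHIPPED
assert OrderStatus("archived") is OrderStatus.UNKNOWN  # new server value
```

A generated SDK bakes this choice into every client in every language. An agent writing raw integration code has to remember to make it, every time, in every file.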
Now multiply across languages. The same API consumed from Python, Go, TypeScript, and Java. Agent-written code will diverge. Different timeout defaults. Different retry strategies. Different type mappings. That divergence becomes a class of bugs you can barely even diagnose.
The pattern is the same one that made infrastructure-as-code win over shell scripts. An agent can do it. Doing it reliably across an organization means encoding those decisions once in a deterministic system, not hoping the agent makes the same choices next time.
The second-order effect: non-developers become builders
The biggest upside isn’t “developers code faster.” It’s that non-technical teams can build software-like workflows against internal APIs.
When internal APIs are well-described and agent-native tooling can generate typed interfaces automatically, you unlock something bigger than engineering efficiency. A compliance team querying employee data through a generated interface. An operations team building dashboards from internal APIs without filing engineering tickets. A support team automating workflows across services they don’t have to understand at the protocol level.
The concrete forms are already emerging: spreadsheet plugins backed by internal APIs, admin tools generated from API contracts, chat and ticketing integrations for operational actions. These aren’t hypothetical — they’re the natural consequence of making APIs discoverable and consumable through agent-native tooling.
Once organizations see non-engineering productivity gains, tooling decisions move higher in the org and budgets get larger. But only if the generation pipeline itself is agent-native. If producing a new SDK or MCP server requires a human to log into a web platform and click through an approval flow, you’ve bottlenecked the entire system on human availability.
The honest trade-offs
Agent-native isn’t free. Each tier has real costs.
Agent-powered products let you outsource complexity. The vendor handles the hard parts. But when you need to extend the workflow — apply the same generation patterns to internal tooling, integrate with your observability stack, enforce org-specific conventions — you’re constrained by what the vendor exposes. You bought an outcome. You didn’t build a capability.
Agent-friendly products give you flexibility without lock-in. But you’re building and maintaining the automation layer yourself. When the product’s interface changes, you absorb the maintenance cost.
Agent-native products give you full control and composability. But you accept the responsibility of operating the pipeline. Policy-as-code, approval gates, security scanning, release management — these become your platform team’s domain. Agent-native doesn’t mean lower complexity or fewer engineers. It means your engineers build for a much wider consumer base: huge numbers of agent operators supported by fewer human operators.
How to evaluate your stack
Run this during your next procurement cycle, or retroactively against your existing stack.
- **Can an agent execute the full workflow without a browser?** If any critical step requires a human in a web UI, that step is a bottleneck. CLI and API automation should be complete, not partial.
- **Is behavior deterministic and reproducible?** Given the same inputs and version pins, does the system produce the same result every time? If not, agents cannot validate or safely automate it.
- **Where do customizations live?** In agent-actionable assets (config, code, policy files), or in UI toggles and manual approval paths? If customization is not machine-operable, it will not scale.
- **Can this run consistently across teams, projects, and environments?** If each team needs bespoke setup or ongoing manual intervention, the operating model will not scale.
- **What is the failure and conflict surface?** When automation collides with existing behavior, does it fail in standard, diagnosable ways (diffs, logs, merge conflicts), or through proprietary black-box flows?
- **Can we observe and audit agent actions end-to-end?** Every action should be attributable, reviewable, and reversible with clear logs and policy context.
- **Does data stay inside your boundary?** Can sensitive configs, business data, and operational metadata remain inside enterprise controls with explicit egress paths?
Prove it with a POC
Debates are cheap. Measured loops are not.
Pick one representative workflow and run an adversarial loop:
- Have an agent execute the full workflow end-to-end.
- Add realistic production customizations on top.
- Change the inputs in ways that collide with those customizations.
- Re-run, publish, and measure what needed human help.
Track automation rate, conflict count, time-to-deployed-artifact, and rollback speed. Two weeks of this produces more signal than months of feature comparison. You get operational truth, not demo truth.
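Scoring the POC is trivial once runs are logged. A sketch with illustrative data; the field names are assumptions, not a prescribed schema:

```python
# Each entry records one end-to-end workflow attempt during the POC.
runs = [
    {"workflow": "regen-sdk", "human_steps": 0, "minutes_to_artifact": 4},
    {"workflow": "regen-sdk", "human_steps": 2, "minutes_to_artifact": 41},
    {"workflow": "regen-sdk", "human_steps": 0, "minutes_to_artifact": 5},
    {"workflow": "regen-sdk", "human_steps": 1, "minutes_to_artifact": 18},
]

def automation_rate(runs) -> float:
    """Fraction of runs that needed no human intervention."""
    unattended = sum(1 for r in runs if r["human_steps"] == 0)
    return unattended / len(runs)

print(f"automation rate: {automation_rate(runs):.0%}")
```

Whatever the exact schema, the discipline is the same: log every run, count every human touch, and let the numbers settle the vendor debate.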
Our bet
If you ran the scorecard above against our own product, you’d see why we wrote this post.
We’ve been thinking about this future at Speakeasy for a while. We wrote internal strategy docs in 2024 predicting that AI Transformation teams would emerge across the Fortune 500, that enterprise MCP servers would become a real category, and that the market would shift toward tooling designed for agent operators. Those predictions have largely come true.
We built Speakeasy around the thesis that agents will be the primary operators of the API developer experience lifecycle.
CLI-first, local-first generation. Hooks, overlays, and skills as composable primitives. Deterministic, reproducible output. The fastest API tooling we can build, because agent loops across hundreds of repos punish every wasted millisecond. And not just SDKs — Terraform providers, MCP servers, CLIs, documentation — the entire developer experience surface generated from your API contract.
We believe the next wave of enterprise software evaluation will look fundamentally different from the last one. Not feature checklists. Not UI walkthroughs. The questions that matter: do we own the pipeline, or does the vendor? Can we scale it without human bottlenecks? Can agents operate it safely?
Teams that evaluate vendors through this lens will build compounding internal capability. Teams that don’t will keep buying isolated tools and calling it transformation.