AI & MCP
The AI subsidy is ending, and waste is about to become a line item
Cameron McClellan
June 12, 2026 - 11 min read

Both frontier labs filed for IPOs this month, and their pricing pages had already announced what the prospectuses will confirm: the subsidized era of AI is over. Anthropic submitted a confidential draft S-1 on June 1, OpenAI followed on June 8, and companies preparing to face public markets price their products to make money, not to win adoption.
The repricing is already starting. OpenAI’s newest flagship costs exactly twice its predecessor per token. So does Anthropic’s. GitHub Copilot retired flat-rate premium requests entirely this month, and the multiplier on Opus-class models went from 7.5x to 27x between March and June. Enterprises that planned 2026 budgets on 2024 economics are discovering the difference one invoice at a time.
For three years, subsidized compute has meant that companies didn’t need to distinguish between AI spend that produced value and AI spend that produced nothing. An agent that burned 100,000 tokens re-reading the same context, or loaded 142 tool definitions to use six of them, cost roughly what a well-run one did. Metered honestly, those two agents have very different price tags, and the gap between them is about to show up as a line item. The companies that keep scaling AI through this shift will be the ones that can see that line item and act on it, which is an infrastructure problem before it is a budgeting one.
Why are AI prices rising?
The prices that are rising are the ones enterprises actually pay: frontier models, premium capacity, and the formerly flat-rate plans that wrapped them.
Frontier list prices doubled in a generation
For the first time since the GPT-4 era, successor models launched at higher prices than the models they replaced:
- GPT-5.5 launched at $5 per million input tokens and $30 per million output tokens, exactly 2x GPT-5.4’s $2.50 and $15. GPT-5.5-Pro sits at $30 and $180, the most expensive mainstream frontier API on the market.
- Claude Fable 5 launched at $10 and $50, 2x its predecessor Opus 4.8 at $5 and $25.
- Premium paths multiplied on top of list price: Anthropic’s fast mode runs up to $30 and $150 (a 6x premium over standard Opus), OpenAI charges 5x for priority processing, and both bill surcharges for long context and data residency.
There are quieter increases too. Anthropic’s own pricing documentation notes that the tokenizer introduced with Opus 4.7 “may use up to 35% more tokens for the same fixed text”. The rate card stays flat while the cost per task rises.
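A minimal sketch of that effect, assuming the worst case of the quoted 35% figure. The prompt size is a hypothetical value chosen for illustration, and the $5 rate is the Opus-class input price listed above.

```python
def task_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a task that consumes `tokens` at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


PRICE_PER_MILLION = 5.00  # list price stays unchanged (Opus-class input rate quoted above)
OLD_TOKENS = 100_000      # hypothetical prompt size under the old tokenizer
INFLATION = 0.35          # upper bound from the quoted documentation

# Same text, more tokens: the rate card is flat but the task costs more.
new_tokens = round(OLD_TOKENS * (1 + INFLATION))
old_cost = task_cost(OLD_TOKENS, PRICE_PER_MILLION)
new_cost = task_cost(new_tokens, PRICE_PER_MILLION)

print(f"old: ${old_cost:.3f}  new: ${new_cost:.3f}  increase: {new_cost / old_cost - 1:.0%}")
```

The per-token price never moves in this sketch. The whole increase comes from the token count.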
Flat-rate AI plans are disappearing
Every major AI coding vendor abandoned all-you-can-eat pricing within roughly 12 months:
- GitHub Copilot moved to usage-based billing on June 1, 2026, replacing flat premium requests with token-metered AI Credits. The model multipliers tell the story in one table: Opus-class models launched at a promotional 7.5x, moved to 15x in May, and sit at 27x today.
- Cursor replaced its 500 fast requests per month with a usage allowance tied to actual API costs in June 2025, citing the rising cost of frontier models.
- Replit moved its agent to effort-based pricing, where a task’s price scales with the compute it consumes.
- Anthropic added weekly rate limits to Claude subscriptions in August 2025 after some users ran Claude Code continuously, then began throttling more aggressively during peak hours in March 2026.
Each of these vendors sat between an enterprise and a frontier lab, absorbing variable token costs under a fixed subscription. That position became unsustainable for all of them in the same year, which is what the end of a subsidy looks like from downstream.
The labs can’t subsidize usage forever
The economics behind the repricing are stark even through the keyhole of public filings. Microsoft’s quarterly SEC filing implied that OpenAI lost on the order of $10 billion in the quarter ending September 2025, based on Microsoft’s share of those losses. Anthropic, for its part, is reported to be approaching its first profitable quarter heading into its listing, which is what investor scrutiny does to pricing discipline. As Delphi Labs’ Kevin Simback put it in AFP’s reporting on soaring enterprise AI bills, the industry is exiting its era of “subsidized intelligence.”
If tokens keep getting cheaper, why is the AI bill going up?
The paradox at the center of AI budgeting is that both of these are true at once: per-token prices have collapsed, and total bills have never been higher.
The deflation is real and well measured. Epoch AI found that the price to reach a fixed capability level falls between 9x and 900x per year depending on the task, with a median around 50x. Andreessen Horowitz measured a 1,000x decline in the cost of GPT-3-level output over three years. A token of 2024-quality intelligence is nearly free in 2026.
Bills rose anyway. Menlo Ventures’ survey of enterprise buyers found model API spend hit $8.4 billion in the first half of 2025, more than doubling in six months. Gartner raised its 2026 worldwide AI spending forecast to $2.59 trillion, up 47% in a year. Gartner analyst Will Sommer resolved the paradox in one sentence: token costs are coming down, but the applications they unlock “are going to be more expensive, not less”.
Three things are rising fast enough to swamp the per-token deflation:
- The frontier premium. Workloads migrate to the newest, most capable model, and that model now launches at 2x the old one. The deflation applies to last year’s capability, not this year’s.
- Token volume. Inference is now the dominant AI workload. Menlo found 74% of startups and 49% of enterprises run majority-inference workloads, and Deloitte estimates inference will be roughly two-thirds of AI compute in 2026, up from one-third in 2023. Derek Thompson reports that average business token consumption grew 13x between January 2025 and May 2026.
- The end of flat rate. Costs that vendors used to absorb under subscriptions now pass through to the customer, metered.
Falling token prices were never going to save a budget from those three forces. The only durable lever is consuming fewer wasted tokens, which requires knowing which tokens are wasted.
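The arithmetic behind that claim can be sketched as bill = volume × price. The 13x volume growth and the 2x frontier premium come from the figures above. The baseline volume, the baseline price, and the 50% price-cut scenario are hypothetical values chosen only to make the numbers concrete.

```python
def monthly_bill(tokens_millions: float, price_per_million: float) -> float:
    """Monthly spend in dollars: token volume (in millions) times price per million."""
    return tokens_millions * price_per_million


BASELINE_TOKENS_M = 1_000.0  # hypothetical: one billion tokens per month
BASELINE_PRICE = 5.00        # hypothetical blended price in dollars per million tokens
VOLUME_GROWTH = 13.0         # reported growth in average business token consumption
FRONTIER_MULTIPLIER = 2.0    # new flagship launches at twice the old price

before = monthly_bill(BASELINE_TOKENS_M, BASELINE_PRICE)

# Scenario 1: the workload follows the frontier model and volume grows 13x.
on_frontier = monthly_bill(BASELINE_TOKENS_M * VOLUME_GROWTH,
                           BASELINE_PRICE * FRONTIER_MULTIPLIER)

# Scenario 2 (assumed): the price halves instead, but volume still grows 13x.
with_price_cut = monthly_bill(BASELINE_TOKENS_M * VOLUME_GROWTH, BASELINE_PRICE * 0.5)

print(f"frontier migration: {on_frontier / before:.1f}x the original bill")
print(f"50% price cut:      {with_price_cut / before:.1f}x the original bill")
```

Under these assumptions the bill grows 26x when the workload follows the frontier, and still 6.5x when the per-token price halves. Volume is the term that dominates.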
Where does agentic AI spend actually go?
Agents changed the shape of AI cost. Gartner estimates an agentic query consumes 5 to 30 times the tokens of a chatbot query. An academic study of agent economics coauthored by Erik Brynjolfsson and Alex Pentland measured agentic coding tasks at roughly 1,000x the tokens of a comparable chat interaction, found that identical tasks vary up to 30x in token consumption from run to run, and found that accuracy often peaks at intermediate spend, meaning the extra tokens frequently buy nothing.
The structural reason is that agents pay for their context repeatedly. Every step in a loop re-sends the accumulated conversation, so a 20-step task pays for its early context roughly 20 times. The same study found input tokens, not output, are the primary cost driver.
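A rough model of that re-sending, assuming no prompt caching and a fixed number of tokens appended per step. Both are simplifications, and the token counts are hypothetical.

```python
def total_input_tokens(base_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Input tokens billed across an agent loop that re-sends its full context each step."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                  # the whole accumulated context is sent again
        context += tokens_added_per_step  # this step's output and tool results join the context
    return total


BASE_CONTEXT = 5_000  # hypothetical system prompt plus task description
PER_STEP = 1_000      # hypothetical tokens appended by each step
STEPS = 20            # loop length from the example above

billed = total_input_tokens(BASE_CONTEXT, PER_STEP, STEPS)
unique = BASE_CONTEXT + PER_STEP * STEPS  # tokens that ever existed in the conversation

print(f"billed input tokens: {billed:,}")
print(f"unique tokens:       {unique:,}")
print(f"re-send multiple:    {billed / unique:.1f}x")
```

With these numbers the loop bills 290,000 input tokens for 25,000 unique ones. The base context alone accounts for 100,000 of them, because it is paid for on each of the 20 steps.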
Tool catalogs make that multiplication worse, because every connected tool definition rides along as input tokens on every single call:
- The Model Context Protocol community has measured tool schema overhead at roughly 1,000 tokens per tool.
- GitHub’s official MCP server exposes more than 90 tools, and independent benchmarks put its definitions at 17,000 to 55,000 tokens per request before the agent has done anything.
- Anthropic’s own engineering team showed a workflow dropping from 150,000 tokens to 2,000, a 98.7% reduction, once tool definitions and intermediate results stopped flowing through the model unnecessarily.
An agent that needs six tools and loads 142 is the canonical form of AI waste: invisible at subsidized prices, expensive at metered ones, and fixable with infrastructure rather than discipline.
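As a back-of-the-envelope sketch, the cost of that catalog can be estimated from the roughly 1,000 tokens per tool cited above. The 20-step loop, the $5 input rate, and the assumption that definitions are re-sent uncached on every call are illustrative choices, not measurements.

```python
TOKENS_PER_TOOL = 1_000         # rough per-tool schema overhead cited above
INPUT_PRICE_PER_MILLION = 5.00  # assumed input rate in dollars (the Opus 4.8 price quoted earlier)
STEPS = 20                      # assumed loop length, matching the 20-step example


def catalog_overhead_tokens(tools_loaded: int, steps: int = STEPS) -> int:
    """Input tokens spent purely on tool definitions across one task."""
    return tools_loaded * TOKENS_PER_TOOL * steps


def dollars(tokens: int) -> float:
    """Convert input tokens to dollars at the assumed rate."""
    return tokens / 1_000_000 * INPUT_PRICE_PER_MILLION


full = catalog_overhead_tokens(142)  # the agent loads every tool it is connected to
lean = catalog_overhead_tokens(6)    # the agent loads only the tools the task needs

print(f"142 tools: {full:,} tokens (${dollars(full):.2f} per task)")
print(f"  6 tools: {lean:,} tokens (${dollars(lean):.2f} per task)")
print(f"avoidable share of catalog overhead: {1 - lean / full:.1%}")
```

Under these assumptions the full catalog costs about $14.20 per task in definitions alone, against $0.60 for the six tools actually used.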
This waste is starting to register in industry data. Flexera’s 2026 State of the Cloud report found self-estimated cloud waste rose to 29%, the first increase in five years, and attributes the reversal to the influx of AI workloads. That figure measures cloud spend broadly, not AI spend specifically, because no one yet has an authoritative number for AI waste alone. That absence is itself the finding: most organizations cannot measure what their agents waste, and Gartner predicts more than 40% of agentic AI projects will be canceled by the end of 2027, with escalating costs a leading cause.
What does FinOps for AI look like?
The discipline that tamed cloud spend is already turning to AI. The FinOps Foundation’s State of FinOps 2026 found that 98% of FinOps teams now manage AI spend, up from 63% in 2025 and 31% in 2024, and that AI cost management is the field’s top priority and largest skills gap. Practitioners report being asked to fund new AI investment out of optimization savings.
Cloud FinOps had a decade to build its tooling. AI FinOps inherits the questions without the instruments, because the answers live in a layer most organizations haven’t built. Three controls do most of the work:
- Attribution. Per-team and per-agent accounting of every model call and tool call. A provider invoice shows one number for the whole API key. It cannot say which team’s agent, which workflow, or which retry loop spent the money.
- Routing. Anthropic’s published prices span $1 per million input tokens for Haiku 4.5 to $10 for Fable 5, a 10x range for tasks that often don’t need the top of it. RouteLLM demonstrated that routing easy queries to cheaper models preserves about 95% of frontier quality at 75% to 85% lower cost. The savings are real, but only for organizations that can route centrally instead of letting every team hardcode the flagship.
- Context efficiency. Tool filtering and dynamic toolsets, so an agent loads the six tools its task needs instead of all 142 it might theoretically use, plus caps on runaway loops and caching where context repeats.
None of these are policies a finance team can enforce from a spreadsheet. Attribution requires seeing every call with an identity attached. Routing requires sitting between agents and model providers. Tool filtering requires sitting between agents and their tools. All three describe the same piece of infrastructure: a control point on the path every agent shares.
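The three controls can be sketched as one toy control point. Everything here is hypothetical and for illustration only: the `ControlPoint` class, the model labels, and the caller-supplied `difficulty` flag are invented names, not any vendor's API. A real router would classify tasks itself, and a real meter would also count output tokens. The prices and the per-tool overhead are the figures quoted above.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Input prices in dollars per million tokens, from the figures above; keys are illustrative labels.
INPUT_PRICE = {"haiku-4.5": 1.00, "fable-5": 10.00}
TOKENS_PER_TOOL = 1_000  # rough per-tool schema overhead


@dataclass
class ControlPoint:
    """Hypothetical gateway on the path that every agent call shares."""
    catalog: set[str]
    spend: dict[tuple[str, str], float] = field(default_factory=lambda: defaultdict(float))

    def route(self, difficulty: str) -> str:
        # Routing: only tasks flagged as hard reach the flagship model.
        return "fable-5" if difficulty == "hard" else "haiku-4.5"

    def filter_tools(self, needed: set[str]) -> set[str]:
        # Context efficiency: expose only the tools this task asked for.
        return self.catalog & needed

    def call(self, team: str, agent: str, difficulty: str,
             prompt_tokens: int, needed: set[str]) -> float:
        model = self.route(difficulty)
        tools = self.filter_tools(needed)
        input_tokens = prompt_tokens + len(tools) * TOKENS_PER_TOOL
        cost = input_tokens / 1_000_000 * INPUT_PRICE[model]
        # Attribution: every call is recorded against the identity behind it.
        self.spend[(team, agent)] += cost
        return cost


gateway = ControlPoint(catalog={f"tool_{i}" for i in range(142)})
needed = {f"tool_{i}" for i in range(6)}

gateway.call("payments", "refund-bot", "easy", 4_000, needed)
gateway.call("payments", "refund-bot", "hard", 4_000, needed)

for (team, agent), total in gateway.spend.items():
    print(f"{team}/{agent}: ${total:.4f}")
```

The point of the design is that all three controls hang off the same `call` path. Anything that bypasses it is neither routed, filtered, nor attributed.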
The AI Control Plane is the FinOps layer for AI
An AI Control Plane is the governing layer between every agent in an organization and every model and system it can reach. The same single path that makes AI governable is what makes it meterable, and the diagram below shows the same agent workload billed both ways.

Speakeasy is building this layer as a product. The AI Control Plane puts an LLM gateway and an MCP gateway on that shared path, so the cost controls arrive as a byproduct of routing rather than a reporting project:
- Every model call and tool call carries the identity of the agent and team behind it, so spend attribution is a query, not an investigation.
- Policy at the gateway decides which model serves which workload, so routing economics apply across the organization instead of one well-run team.
- Tool filtering and dynamic toolsets trim the catalog each agent carries, attacking the input-token bloat that dominates agentic cost.
- The same audit log that satisfies governance requirements doubles as the meter: who spent what, on which model, doing which task.
The wrong response to rising AI prices is to ration AI, because the competitive cost of using less is higher than the invoice. The right response is to run it deliberately, with the waste visible and the spend attributed, the way every other serious line item in the company is run. The organizations that built control planes for security and governance reasons are discovering they already own the FinOps layer for AI. The ones that built nothing are about to meet their agents’ true cost of ownership, one metered token at a time.
Further reading
- The AI Control Plane: the reference architecture for governing AI across an organization, function by function.
- AI gateway vs MCP gateway vs AI Control Plane: which layer does what, and why the names matter.
- 2026 is the year of enterprise AI governance: why boards, CISOs, and platform teams are treating AI governance as infrastructure.
- How Uber built the enterprise AI security playbook: the gateway and identity stack one enterprise built before scaling AI deployment.
Per-token prices for a fixed capability level keep falling, but three forces outweigh the deflation: workloads migrate to frontier models that now launch at higher prices than their predecessors, agentic workloads consume 5 to 30 times the tokens of chatbot queries, and vendors have replaced flat-rate subscriptions with usage-based pricing that passes variable costs through to the customer.
FinOps for AI applies cloud cost-management discipline to AI spend: attributing every model and tool call to a team and workload, routing each task to the cheapest model that can handle it, and eliminating structural waste like oversized tool catalogs and runaway agent loops. The FinOps Foundation found 98% of FinOps teams now manage AI spend, up from 31% in 2024.
Gartner estimates an agentic query consumes 5 to 30 times the tokens of a chatbot query, and academic measurements put agentic coding tasks at roughly 1,000 times the tokens of a comparable chat interaction. The main driver is input tokens: agents re-send their accumulated context and full tool catalogs on every step of a loop.
An AI Control Plane sits on the path between every agent and every model or tool it calls. From that position it attributes spend to the team and agent responsible, routes each task to the most economical capable model, and filters tool catalogs so agents only load the definitions they need. The audit log it keeps for governance doubles as the cost meter.