The AI subsidy is ending, and waste is about to become a line item

Cameron McClellan

June 12, 2026 - 11 min read

Both frontier labs filed for IPOs this month, and their pricing pages had already announced what the prospectuses will confirm: the subsidized era of AI is over. Anthropic submitted a confidential draft S-1 on June 1, OpenAI followed on June 8, and companies preparing to face public markets price their products to make money, not to win adoption.

The repricing is already starting. OpenAI’s newest flagship costs exactly twice its predecessor per token. So does Anthropic’s. GitHub Copilot retired flat-rate premium requests entirely this month, and the multiplier on Opus-class models went from 7.5x to 27x between March and June. Enterprises that planned 2026 budgets on 2024 economics are discovering the difference one invoice at a time.

For three years, subsidized compute has meant that companies didn’t need to distinguish between AI spend that produced value and AI spend that produced nothing. An agent that burned 100,000 tokens re-reading the same context, or loaded 142 tool definitions to use six of them, cost roughly what a well-run one did. Metered honestly, those two agents have very different price tags, and the gap between them is about to show up as a line item. The companies that keep scaling AI through this shift will be the ones that can see that line item and act on it, which is an infrastructure problem before it is a budgeting one.


Why are AI prices rising?

The prices that are rising are the ones enterprises actually pay: frontier models, premium capacity, and the formerly flat-rate plans that wrapped them.

Frontier list prices doubled in a generation

For the first time since the GPT-4 era, successor models launched at higher prices than the models they replaced:

There are quieter increases too. Anthropic’s own pricing documentation notes that the tokenizer introduced with Opus 4.7 “may use up to 35% more tokens for the same fixed text”. The rate card stays flat while the cost per task rises.
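As a rough illustration of how a tokenizer change moves the bill without touching the rate card, the sketch below prices one task before and after. The 200,000-token task size and the $10 per million token rate are assumptions chosen for the arithmetic, and 35% is the upper bound quoted above, not a typical value.

```python
def cost_per_task(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a task that consumes `tokens` at a flat per-million rate."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical task size and list price (assumptions, not published figures).
old_tokens = 200_000
price = 10.0

# Worst case from the pricing note: the same text becomes up to 35% more tokens.
new_tokens = round(old_tokens * 1.35)

old_cost = cost_per_task(old_tokens, price)
new_cost = cost_per_task(new_tokens, price)

print(f"old: ${old_cost:.2f}  new: ${new_cost:.2f}  "
      f"increase: {new_cost / old_cost - 1:.0%}")
```

The per-token price is identical in both calls; only the token count changes, which is why this kind of increase never appears on a pricing page.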

Flat-rate AI plans are disappearing

Every major AI coding vendor abandoned all-you-can-eat pricing within roughly 12 months:

Each of these vendors sat between an enterprise and a frontier lab, absorbing variable token costs under a fixed subscription. That position became unsustainable for all of them in the same year, which is what the end of a subsidy looks like from downstream.

The labs can’t subsidize usage forever

The economics behind the repricing are stark even through the keyhole of public filings. Microsoft’s quarterly SEC filing implied that OpenAI lost on the order of $10 billion in the quarter ending September 2025, based on Microsoft’s share of those losses. Anthropic, for its part, is reported to be approaching its first profitable quarter heading into its listing, which is what investor scrutiny does to pricing discipline. As Delphi Labs’ Kevin Simback put it in AFP’s reporting on soaring enterprise AI bills, the industry is exiting its era of “subsidized intelligence.”


If tokens keep getting cheaper, why is the AI bill going up?

The paradox at the center of AI budgeting is that both of these are true at once: per-token prices have collapsed, and total bills have never been higher.

The deflation is real and well measured. Epoch AI found that the price to reach a fixed capability level falls between 9x and 900x per year depending on the task, with a median around 50x. Andreessen Horowitz measured a 1,000x decline in the cost of GPT-3-level output over three years. A token of 2024-quality intelligence is nearly free in 2026.

Bills rose anyway. Menlo Ventures’ survey of enterprise buyers found model API spend hit $8.4 billion in the first half of 2025, more than doubling in six months. Gartner raised its 2026 worldwide AI spending forecast to $2.59 trillion, up 47% in a year. Gartner analyst Will Sommer resolved the paradox in one sentence: token costs are coming down, but the applications they unlock “are going to be more expensive, not less”.

Three things are rising fast enough to swamp the per-token deflation:

Falling token prices were never going to save a budget from those three forces. The only durable lever is consuming fewer wasted tokens, which requires knowing which tokens are wasted.


Where does agentic AI spend actually go?

Agents changed the shape of AI cost. Gartner estimates an agentic query consumes 5 to 30 times the tokens of a chatbot query. An academic study of agent economics coauthored by Erik Brynjolfsson and Alex Pentland measured agentic coding tasks at roughly 1,000x the tokens of a comparable chat interaction, found that identical tasks vary up to 30x in token consumption from run to run, and found that accuracy often peaks at intermediate spend, meaning the extra tokens frequently buy nothing.

The structural reason is that agents pay for their context repeatedly. Every step in a loop re-sends the accumulated conversation, so a 20-step task pays for its early context roughly 20 times. The same study found input tokens, not output, are the primary cost driver.
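The repeated-context effect can be sketched in a few lines. The function below assumes a simplified loop in which every step re-sends the full accumulated context and then appends a fixed number of new tokens; the 5,000-token starting context and 1,000 tokens added per step are hypothetical, and caching is ignored.

```python
def cumulative_input_tokens(initial_context: int,
                            tokens_added_per_step: int,
                            steps: int) -> int:
    """Total input tokens billed across a loop that re-sends its whole context."""
    total = 0
    context = initial_context
    for _ in range(steps):
        total += context                  # the full accumulated context is billed as input
        context += tokens_added_per_step  # tool results and replies grow the context
    return total


# Hypothetical 20-step task: 5,000 tokens of starting context, 1,000 added per step.
billed = cumulative_input_tokens(5_000, 1_000, 20)
print(billed)  # 290000
```

The starting context alone accounts for 100,000 of those tokens (5,000 billed 20 times), even though it was written once.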

Tool catalogs make that multiplication worse, because every connected tool definition rides along as input tokens on every single call:

An agent that needs six tools and loads 142 is the canonical form of AI waste: invisible at subsidized prices, expensive at metered ones, and fixable with infrastructure rather than discipline.
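To put a rough number on that overhead, the sketch below uses the figures from the control-plane comparison later in this article (142 tools at roughly 55,000 tokens, a frontier model at $10 per million tokens). It assumes every tool definition is the same size and that a task makes 20 model calls; both are simplifications for illustration.

```python
CATALOG_TOKENS = 55_000   # approximate size of the full 142-tool catalog
CATALOG_TOOLS = 142
PRICE_PER_MILLION = 10.0  # frontier-model input price used in the comparison

# Assumption: definitions are equal in size, so the average stands in for each tool.
TOKENS_PER_TOOL = CATALOG_TOKENS / CATALOG_TOOLS


def catalog_overhead_cost(tools_loaded: int, calls: int) -> float:
    """Dollars spent re-sending tool definitions across `calls` model calls."""
    tokens = tools_loaded * TOKENS_PER_TOOL * calls
    return tokens / 1_000_000 * PRICE_PER_MILLION


full = catalog_overhead_cost(tools_loaded=142, calls=20)
filtered = catalog_overhead_cost(tools_loaded=6, calls=20)
print(f"full catalog: ${full:.2f} per task, six tools: ${filtered:.2f} per task")
```

Under these assumptions the full catalog costs about $11 per task before the agent does any work, and the six-tool version costs under 50 cents.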

This waste is starting to register in industry data. Flexera’s 2026 State of the Cloud report found self-estimated cloud waste rose to 29%, the first increase in five years, and attributes the reversal to the influx of AI workloads. That figure measures cloud spend broadly, not AI spend specifically, because no one yet has an authoritative number for AI waste alone. That absence is itself the finding: most organizations cannot measure what their agents waste, and Gartner predicts more than 40% of agentic AI projects will be canceled by the end of 2027, with escalating costs a leading cause.


What does FinOps for AI look like?

The discipline that tamed cloud spend is already turning to AI. The FinOps Foundation’s State of FinOps 2026 found that 98% of FinOps teams now manage AI spend, up from 63% in 2025 and 31% in 2024, and that AI cost management is the field’s top priority and largest skills gap. Practitioners report being asked to fund new AI investment out of optimization savings.

Cloud FinOps had a decade to build its tooling. AI FinOps inherits the questions without the instruments, because the answers live in a layer most organizations haven’t built. Three controls do most of the work:

None of these are policies a finance team can enforce from a spreadsheet. Attribution requires seeing every call with an identity attached. Routing requires sitting between agents and model providers. Tool filtering requires sitting between agents and their tools. All three describe the same piece of infrastructure: a control point on the path every agent shares.


The AI Control Plane is the FinOps layer for AI

An AI Control Plane is the governing layer between every agent in an organization and every model and system it can reach. The same single path that makes AI governable is what makes it meterable, and the diagram below shows the same agent workload billed both ways.

Comparison of one agent workload, ungoverned versus through the control plane:

  • Tool definitions: 142 tools loaded, roughly 55,000 tokens before any work, becomes 6 filtered tools, just the definitions a task needs.
  • Model selection: every task hardcoded to the frontier model at $10 per million tokens becomes routing by task across $1 to $10, the cheapest capable model.
  • Context: full history re-sent on every step, where input tokens dominate the bill, becomes cached and trimmed context paid for once.
  • Spend: one unattributable number on the invoice becomes spend attributed per team and agent, so waste shows up with a name on it.
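The "cheapest capable model" rule in that comparison can be sketched as a small routing function. Everything named here is hypothetical: the model names, the integer capability tiers, and the $3 middle price are assumptions for illustration, with only the $1 and $10 endpoints taken from the comparison. A real gateway would need a far richer notion of capability than a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_million: float  # dollars per million tokens
    capability: int           # hypothetical tier: higher handles harder tasks


# Illustrative catalog, not any vendor's real lineup.
MODELS = [
    Model("small", 1.0, 1),
    Model("mid", 3.0, 2),
    Model("frontier", 10.0, 3),
]


def route(required_capability: int, models=MODELS) -> Model:
    """Return the cheapest model whose tier meets the task's requirement."""
    capable = [m for m in models if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.price_per_million)


print(route(1).name)  # small
print(route(3).name)  # frontier
```

Hardcoding every task to the frontier model is the same function with the capability check removed, which is why the ungoverned column always pays $10.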

Speakeasy is building this layer as a product. The AI Control Plane puts an LLM gateway and an MCP gateway on that shared path, so the cost controls arrive as a byproduct of routing rather than a reporting project:

  • Every model call and tool call carries the identity of the agent and team behind it, so spend attribution is a query, not an investigation.
  • Policy at the gateway decides which model serves which workload, so routing economics apply across the organization instead of one well-run team.
  • Tool filtering and dynamic toolsets trim the catalog each agent carries, attacking the input-token bloat that dominates agentic cost.
  • The same audit log that satisfies governance requirements doubles as the meter: who spent what, on which model, doing which task.
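To make "attribution is a query" concrete, here is a minimal sketch of an audit log doubling as a meter. The record fields, team and agent names, and dollar amounts are all made up for illustration and do not describe Speakeasy's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    team: str       # identity attached at the gateway (hypothetical field)
    agent: str
    model: str
    cost_usd: float


def spend_by(records, field: str) -> dict:
    """Sum cost over the audit log, grouped by one record field."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, field)] += record.cost_usd
    return dict(totals)


# Made-up audit log entries.
AUDIT_LOG = [
    CallRecord("payments", "refund-bot", "frontier", 0.41),
    CallRecord("payments", "refund-bot", "small", 0.01),
    CallRecord("search", "indexer", "frontier", 0.92),
]

by_team = spend_by(AUDIT_LOG, "team")
by_model = spend_by(AUDIT_LOG, "model")
print(by_team)
print(by_model)
```

Because every record already carries an identity, the same log answers "which team", "which agent", and "which model" by changing one argument.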

The wrong response to rising AI prices is to ration AI, because the competitive cost of using less is higher than the invoice. The right response is to run it deliberately, with the waste visible and the spend attributed, the way every other serious line item in the company is run. The organizations that built control planes for security and governance reasons are discovering they already own the FinOps layer for AI. The ones that built nothing are about to meet their agents’ true cost of ownership, one metered token at a time.


Frequently asked questions

If tokens keep getting cheaper, why is the AI bill going up?

Per-token prices for a fixed capability level keep falling, but three forces outweigh the deflation: workloads migrate to frontier models that now launch at higher prices than their predecessors, agentic workloads consume 5 to 30 times the tokens of chatbot queries, and vendors have replaced flat-rate subscriptions with usage-based pricing that passes variable costs through to the customer.

What is FinOps for AI?

FinOps for AI applies cloud cost-management discipline to AI spend: attributing every model and tool call to a team and workload, routing each task to the cheapest model that can handle it, and eliminating structural waste like oversized tool catalogs and runaway agent loops. The FinOps Foundation found 98% of FinOps teams now manage AI spend, up from 31% in 2024.

Why do AI agents cost more than chatbots?

Gartner estimates an agentic query consumes 5 to 30 times the tokens of a chatbot query, and academic measurements put agentic coding tasks at roughly 1,000 times the tokens of a comparable chat interaction. The main driver is input tokens: agents re-send their accumulated context and full tool catalogs on every step of a loop.

What is an AI Control Plane?

An AI Control Plane sits on the path between every agent and every model or tool it calls. From that position it attributes spend to the team and agent responsible, routes each task to the most economical capable model, and filters tool catalogs so agents only load the definitions they need. The audit log it keeps for governance doubles as the cost meter.
