
Product Updates

MCP

How we reduced MCP token usage 100x with dynamic toolsets

Chase Crumbaugh


November 13, 2025 - 7 min read


We shipped a major improvement to Gram that reduces token usage for MCP servers by 100x (or more) while maintaining consistent performance as toolset size grows. With dynamic toolsets, you can now build MCP servers with hundreds of tools without overwhelming the LLM’s context window.

Our benchmarks show that traditional MCP servers with 400 tools consume over 400,000 tokens before the LLM processes a single query. That alone is a non-starter: Claude Code’s maximum context window is 200,000 tokens. With dynamic toolsets, that same server uses just a few thousand tokens initially, with tools discovered only as needed. More importantly, this efficiency remains constant even as toolsets scale from 40 to 400+ tools.

The problem with static toolsets

When you connect an MCP server to an AI agent like Claude, every tool’s schema is loaded into the context window immediately. For small servers with 10-20 tools, this works fine. But as your tool count grows, token usage explodes.

Consider a general-purpose MCP server for a large enterprise product with hundreds of tools. With a static approach, you’re looking at 405,000 tokens consumed before any work begins. Since Claude’s context window is 200,000 tokens, this server simply won’t work. Even if it did fit, you’d be wasting most of your context on tools the LLM will never use for a given task.
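A rough back-of-envelope check using the benchmark numbers later in this post shows why: each tool’s schema averages on the order of a thousand tokens, so the initial context cost of a static server grows roughly linearly with tool count.

```
 43,300 tokens /  40 tools ≈ 1,080 tokens per tool
405,100 tokens / 400 tools ≈ 1,010 tokens per tool
```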

Dynamic toolsets: Two approaches

We’ve implemented two experimental approaches to dynamic tool discovery, both of which dramatically reduce token usage while maintaining full functionality.

Progressive search uses a hierarchical discovery approach. Instead of exposing all 400 tools at once, we compress them into three meta-tools that the LLM can use to progressively discover what it needs:

list_tools allows the LLM to discover available tools using prefix-based lookup. For example, list_tools(/hubspot/deals/*) returns only tools related to HubSpot deals. The tool description includes the structure of available sources and tags, creating a hierarchy that guides discovery.

describe_tools provides detailed information about specific tools, including input schemas. We keep this separate from list_tools because schemas represent a significant portion of token usage. This separation optimizes context management at the cost of requiring an additional tool call.

execute_tool runs the discovered and described tools as needed.
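To make the mechanics concrete, here’s a minimal sketch of how these three meta-tools could be wired up. It assumes the TypeScript MCP SDK (@modelcontextprotocol/sdk) and a hypothetical toolCatalog of operations kept server-side; it illustrates the pattern, not Gram’s actual implementation.

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical catalog: the full set of underlying operations stays server-side,
// so schemas never enter the context window until explicitly requested.
interface CatalogEntry {
  name: string; // e.g. "/hubspot/deals/list"
  description: string;
  inputSchema: object; // JSON Schema for the operation
  handler: (args: unknown) => Promise<unknown>;
}

const toolCatalog: CatalogEntry[] = [
  {
    name: "/hubspot/deals/list",
    description: "List deals in HubSpot",
    inputSchema: { type: "object", properties: { limit: { type: "number" } } },
    handler: async (args) => ({ deals: [], args }), // placeholder
  },
  // ...hundreds more, typically generated from your API spec
];

const server = new McpServer({ name: "dynamic-toolset-sketch", version: "0.1.0" });

// 1. list_tools: prefix-based lookup that returns names and short descriptions only.
server.tool(
  "list_tools",
  "Discover available tools by prefix, e.g. /hubspot/deals/*",
  { prefix: z.string() },
  async ({ prefix }) => {
    const stem = prefix.replace(/\*$/, "");
    const matches = toolCatalog
      .filter((t) => t.name.startsWith(stem))
      .map((t) => `${t.name}: ${t.description}`);
    return { content: [{ type: "text", text: matches.join("\n") }] };
  }
);

// 2. describe_tools: return full input schemas only for the tools the LLM asks about.
server.tool(
  "describe_tools",
  "Get input schemas for specific tools",
  { names: z.array(z.string()) },
  async ({ names }) => {
    const described = toolCatalog
      .filter((t) => names.includes(t.name))
      .map((t) => ({ name: t.name, inputSchema: t.inputSchema }));
    return { content: [{ type: "text", text: JSON.stringify(described) }] };
  }
);

// 3. execute_tool: dispatch to the underlying operation by name.
server.tool(
  "execute_tool",
  "Run a previously discovered tool",
  { name: z.string(), args: z.record(z.unknown()) },
  async ({ name, args }) => {
    const tool = toolCatalog.find((t) => t.name === name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    const result = await tool.handler(args);
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  }
);

await server.connect(new StdioServerTransport());
```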

With progressive search, our 400-tool server uses just 3,000 tokens initially and 3,000 additional tokens to complete a simple query like “List 3 HubSpot deals.” That’s a total of 6,000 tokens compared to 405,000 with the static approach.

Semantic search provides an embeddings-based approach to tool discovery. We create embeddings for all tools in advance, then search over them to find relevant tools for a given task.

find_tools executes semantic search over embeddings created from all tools in the toolset. The LLM describes what it wants to accomplish in natural language, and the search returns relevant tools. This is generally faster than progressive search, especially for large toolsets, but coverage may be less complete because the LLM never gets a broad view of which tools exist.

execute_tool runs the tools found through semantic search.
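Here’s a comparable sketch of the find_tools side. The embed function and embeddedCatalog are placeholders for whatever embedding model and precomputed vectors you use; the ranking shown is plain cosine similarity, and Gram’s actual ranking may differ.

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

declare const server: McpServer; // same server instance as in the sketch above

// Hypothetical: each tool carries an embedding precomputed from its name + description.
interface EmbeddedTool {
  name: string;
  description: string;
  inputSchema: object;
  embedding: number[];
}
declare const embeddedCatalog: EmbeddedTool[];
declare function embed(text: string): Promise<number[]>; // your embedding model of choice

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// find_tools: embed the natural-language intent, rank the catalog, return the top matches
// (with their schemas, so no separate describe step is needed in this mode).
server.tool(
  "find_tools",
  "Describe what you want to accomplish; returns the most relevant tools",
  { query: z.string(), limit: z.number().default(5) },
  async ({ query, limit }) => {
    const queryEmbedding = await embed(query);
    const ranked = embeddedCatalog
      .map((tool) => ({ tool, score: cosineSimilarity(queryEmbedding, tool.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map(({ tool }) => ({
        name: tool.name,
        description: tool.description,
        inputSchema: tool.inputSchema,
      }));
    return { content: [{ type: "text", text: JSON.stringify(ranked, null, 2) }] };
  }
);
```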

With semantic search, our 400-tool server uses just 2,000 tokens initially and 3,000 additional tokens to complete the same query. That’s a total of 5,000 tokens, making it slightly more efficient than progressive search for this use case.

Performance comparison

We conducted preliminary performance testing across toolsets of varying sizes (40, 100, 200, and 400 tools) with both simple and complex tasks. The results demonstrate that dynamic toolsets not only reduce initial token usage but also maintain consistent performance as toolset size grows.

Token usage and costs

We used Claude Code (Sonnet 4) to test the performance of dynamic toolsets.

| Toolset size | Strategy | Initial tokens | Simple task tokens | Simple task cost | Complex task tokens | Complex task cost |
| --- | --- | --- | --- | --- | --- | --- |
| 40 tools | Static | 43,300 | 1,000 | $0.045 | 1,300 | $0.146 |
| 40 tools | Progressive | 1,600 | 1,800 | $0.072 | 2,800 | $0.051 |
| 40 tools | Semantic | 1,300 | 2,700 | $0.050 | 22,600 | $0.065 |
| 100 tools | Static | 128,900 | 1,300 | $0.373 | 1,300 | $0.155 |
| 100 tools | Progressive | 2,400 | 2,700 | $0.076 | 6,600 | $0.096 |
| 100 tools | Semantic | 1,300 | 4,300 | $0.053 | 12,200 | $0.134 |
| 200 tools | Static | 261,700 | — | — | — | — |
| 200 tools | Progressive | 2,500 | 2,900 | $0.077 | 5,300 | $0.098 |
| 200 tools | Semantic | 1,300 | 4,000 | $0.071 | 26,300 | $0.126 |
| 400 tools | Static | 405,100 | — | — | — | — |
| 400 tools | Progressive | 2,500 | 2,700 | $0.078 | 5,700 | $0.099 |
| 400 tools | Semantic | 1,300 | 3,400 | $0.069 | 8,300 | $0.160 |

Note: Static toolsets with 200 and 400 tools exceeded Claude Code’s 200k context window limit and could not complete tasks.

Key insights

The data reveals several critical advantages of dynamic toolsets:

Consistent scaling: Dynamic toolsets maintain relatively constant token usage and costs even as toolset size quadruples from 100 to 400 tools. Progressive search uses ~2,500 initial tokens regardless of toolset size, while semantic search remains at just 1,300 tokens.

Cost escalation with static toolsets: For smaller toolsets where static approaches work, costs escalate dramatically with multiple tool calls. The base cost is similar, but static toolsets send the entire tool context with every completion request.

Task complexity handling: Both dynamic approaches handle complex multi-step tasks effectively, with progressive search showing particularly stable token usage patterns across different task complexities.

Scaling behavior

One of the most significant findings is how differently static and dynamic toolsets scale:

Static toolsets: Initial token usage grows linearly with toolset size (405k tokens for 400 tools vs 43k for 40 tools). Beyond 200 tools, they exceed context window limits entirely.

Dynamic toolsets: Initial token usage remains essentially flat. Progressive search uses 1,600-2,500 tokens regardless of whether the toolset has 40 or 400 tools. Semantic search is even more consistent at just 1,300 tokens across all sizes.

This scaling difference means the efficiency gap only widens as APIs grow larger, making dynamic toolsets essential for comprehensive API coverage.

Key benefits

Dynamic toolsets unlock several important capabilities:

Support for very large APIs: You can now build MCP servers for APIs with hundreds or thousands of operations without hitting context limits. This makes it practical to expose comprehensive API functionality to AI agents.

Efficient context usage: Only the tools actually needed for a task consume tokens. If an agent needs to work with HubSpot deals, it doesn’t need to load schemas for Dub links or any other unrelated tools.

Faster response times: Less context to process means faster initial responses. Semantic search is particularly fast since it requires fewer tool calls than progressive search.

Predictable costs: While static toolsets show dramatic cost increases with multiple operations (up to 8x higher for complex tasks), dynamic toolsets maintain consistent pricing regardless of toolset size.

Better scaling: As your API grows, dynamic toolsets scale naturally. Adding 100 new endpoints doesn’t increase initial token usage at all.

Technical implementation

Both approaches work by exposing meta-tools that handle discovery rather than exposing individual API operations directly. The key insight is that tool schemas represent the bulk of token usage, so we defer loading them until the LLM explicitly requests them.
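Concretely, the difference shows up in what an MCP client sees when it lists the server’s tools. The sketch below is illustrative only (the static tool names are hypothetical), not actual Gram output:

```ts
// What a client sees from the MCP tools/list call in each mode (illustrative only).

// Static: every operation is a top-level tool, each carrying its full JSON Schema.
const staticTools = ["hubspot_deals_list", "hubspot_deals_create" /* ...398 more, ~400k tokens */];

// Progressive search: three meta-tools; schemas stay server-side until describe_tools is called.
const progressiveTools = ["list_tools", "describe_tools", "execute_tool"];

// Semantic search: two meta-tools; relevance comes from embedding search rather than browsing.
const semanticTools = ["find_tools", "execute_tool"];
```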

Progressive search provides complete visibility into available tools through hierarchical navigation. The LLM can explore the tool structure systematically, making it ideal when you need comprehensive coverage or when tool names follow clear patterns.

Semantic search trades complete visibility for speed and natural language discovery. It works best when tool descriptions are high-quality and when the LLM’s intent can be captured in a natural language query.

Getting started

To enable dynamic toolsets, head to the MCP tab in your Gram dashboard and switch your toolset to either progressive search or semantic search mode.

Note that this setting only applies to MCP and won’t affect how your toolset is used in the playground, where static tool exposure remains useful for testing and development.

Important caveats

This performance data represents preliminary results from a handful of test runs with unoptimized dynamic implementations. While the trends are clear and consistent, more extensive testing is needed to establish median performance characteristics across different scenarios.

The relatively simple tasks used in this benchmarking achieved very high success rates (nearly 100%), but tool call accuracy for more complex workflows requires further validation. Static toolsets with 200 and 400 tools could not be tested due to Claude Code’s 200k context window limit.

What’s next

Dynamic toolsets are currently experimental as we continue to validate tool selection accuracy across different use cases. We’re particularly interested in understanding how they perform with:

  • Very large toolsets (1,000+ tools)
  • Complex multi-step workflows
  • Domain-specific APIs with specialized terminology
  • Long-running agent sessions

We’re also exploring hybrid approaches that combine the strengths of both progressive and semantic search, as well as intelligent caching strategies to further optimize token usage for repeated queries.

If you’re building MCP servers for large APIs, dynamic toolsets make it possible to expose comprehensive functionality without overwhelming the LLM’s context window. Try them out and let us know how they work for your use case.
