What is MCP Sampling?
MCP Sampling lets servers ask for LLM completions through the client. This means your server can send a request to the client, have it forwarded to the LLM, and receive a completion it can use to continue solving a task.
MCP Sampling is unique in that it flips the typical flow: Rather than clients always initiating requests, servers can ask for AI help when needed. This enables an MCP server to:
- Use AI to make smart decisions based on available information.
- Create structured data outputs in specific formats.
- Complete multi-step workflows that require thinking.
- Analyze and respond to external data.
Sampling also keeps a human in the loop: the user can review and approve both the request before it's sent to the LLM and the completion before it's returned to the server.
How sampling works
Here’s how MCP Sampling works:
- Your server sends a `sampling/createMessage` request to the client while fulfilling a task.
- Human checkpoint one: The client shows the user the exact prompt and context that will be sent to the LLM, and the user may edit, approve, or reject this request.
- If the user approves the request, the client requests a completion from the LLM.
- Human checkpoint two: The client shows the user the full LLM response, which the user may edit, approve, or reject.
- If the user approves the response, the client sends the approved response to the MCP server.
This human-in-the-loop design ensures users maintain control over what the LLM sees and generates, while still letting the server use AI to enhance the task it's trying to complete: the server only ever receives a response the user has approved.
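To make the flow concrete, here's a minimal sketch of the server side, assuming the official TypeScript SDK (`@modelcontextprotocol/sdk`) and its `createMessage()` helper, which issues the `sampling/createMessage` request. Both checkpoints and the actual LLM call happen in the client, so the server simply awaits the approved result:

```typescript
import { Server } from "@modelcontextprotocol/sdk/server/index.js";

// Sampling is a *client* capability, so the server declares none of its own;
// it only needs a connected client that supports sampling. (Transport wiring
// is omitted from this sketch.)
const server = new Server(
  { name: "example-server", version: "1.0.0" },
  { capabilities: {} }
);

// Inside any task, ask the client for a completion. createMessage() sends
// sampling/createMessage; the promise resolves only after the user has
// approved both the outgoing request and the returned response.
async function summarize(text: string): Promise<string> {
  const result = await server.createMessage({
    messages: [
      { role: "user", content: { type: "text", text: `Summarize:\n${text}` } },
    ],
    maxTokens: 200,
  });
  return result.content.type === "text" ? result.content.text : "";
}
```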
Request format
Sampling requests use a standardized message format:
```typescript
{
  messages: [
    {
      role: "user" | "assistant",
      content: {
        type: "text" | "image",
        text?: string,
        data?: string,   // base64 encoded
        mimeType?: string
      }
    }
  ],
  modelPreferences?: {
    hints?: [
      {
        name?: string    // Suggested model name/family
      }
    ],
    costPriority?: number,        // 0-1, importance of minimizing cost
    speedPriority?: number,       // 0-1, importance of low latency
    intelligencePriority?: number // 0-1, importance of capabilities
  },
  systemPrompt?: string,
  includeContext?: "none" | "thisServer" | "allServers",
  temperature?: number,
  maxTokens: number,
  stopSequences?: string[],
  metadata?: Record<string, unknown>
}
```
Here’s what each field does:
- `messages`: Conversation messages with `role` and `content` (which can be text or an image)
- `modelPreferences`: Hints and priorities for model selection, for example, cost, speed, or intelligence
- `systemPrompt`: An optional directive for model behavior
- `includeContext`: Additional MCP context to include
- Sampling parameters: Controls like `temperature` and token limits
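To see how the fields combine in practice, here's a hypothetical params object that favors a cheap, fast model for a simple classification task (the hint name and weights are illustrative):

```typescript
// Hypothetical sampling/createMessage params; the priorities are advisory
// weights (0-1) that help the client pick a model, not hard requirements.
const params = {
  messages: [
    {
      role: "user",
      content: { type: "text", text: "Classify this ticket: ..." },
    },
  ],
  modelPreferences: {
    hints: [{ name: "claude-3" }], // a suggestion; the client makes the final choice
    costPriority: 0.8,             // minimizing cost matters most here
    speedPriority: 0.6,            // low latency is nice to have
    intelligencePriority: 0.3,     // raw capability matters least for this task
  },
  systemPrompt: "You are a support triage assistant.",
  includeContext: "thisServer",    // share this server's context with the LLM
  temperature: 0.2,
  maxTokens: 100,
};
```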
Response format
The client returns a completion result from the LLM:
```typescript
{
  model: string,  // Name of the model used
  stopReason?: "endTurn" | "stopSequence" | "maxTokens" | string,
  role: "user" | "assistant",
  content: {
    type: "text" | "image",
    text?: string,
    data?: string,
    mimeType?: string
  }
}
```
The response includes:
- `model`: The LLM that was used (for example, `"claude-3.7-sonnet"` or `"gpt-4o-mini"`)
- `stopReason`: Why the generation stopped
- `role`: Typically `"assistant"` for model output
- `content`: The actual response content (which can be text or an image)
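On the server side, the two things worth checking are `stopReason` and the content type. Here's a minimal sketch of that handling (the function name and error handling are illustrative):

```typescript
// Hypothetical guard for a sampling result shaped like the schema above.
interface SamplingResult {
  model: string;
  stopReason?: string;
  role: "user" | "assistant";
  content: { type: "text" | "image"; text?: string; data?: string; mimeType?: string };
}

function extractText(result: SamplingResult): string {
  if (result.stopReason === "maxTokens") {
    // The completion was truncated; a retry with a higher limit may help.
    console.warn(`Output from ${result.model} hit the token limit`);
  }
  if (result.content.type !== "text" || result.content.text === undefined) {
    throw new Error("Expected a text completion");
  }
  return result.content.text;
}
```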
Example: Code review with MCP
Here’s an example of using sampling with an MCP-enabled dev tool integrated with an IDE. We want the language model to help us analyze and improve a code snippet. The server sends a sampling request containing the code and a prompt asking for refactoring suggestions.
How it works:
- Server request: The MCP server sends a sampling request with a prompt asking for a code review.
- LLM sampling: The client forwards the prompt to the LLM, which returns detailed suggestions as completions.
- Returned completion usage: The returned completion can be used by the MCP server to apply approved suggestions automatically.
Here’s the sampling request from the server:
{"method": "sampling/createMessage","params": {"messages": [{"role": "user","content": {"type": "text","text": "codeToReview" // The code to review}}],"systemPrompt": "You are a senior code reviewer with expertise in JavaScript. Provide detailed suggestions for refactoring the code.","includeContext": "thisServer","maxTokens": 150,"temperature": 0.4,"stopSequences": ["\n"]}}
```javascript
// Simple function with a common pattern to improve
function greet(name) {
  if (name) {
    return "Hello, " + name + "!";
  } else {
    return "Hello, guest!";
  }
}
```
Here’s the returned completion:
{"model": "claude-3.7-sonnet","stopReason": "endTurn","role": "assistant","content": {"type": "text","text": "suggestedCode" // The LLM's suggested code}}
```javascript
// Improved version with template literals and default parameter
function greet(name = 'guest') {
  return `Hello, ${name}!`;
}
```
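Putting the pieces together, here's a sketch of what the server side of this exchange could look like as an MCP tool, again assuming the TypeScript SDK (the `review_code` tool name and wiring are illustrative, not part of the protocol):

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const mcp = new McpServer({ name: "code-review-server", version: "1.0.0" });

// Hypothetical tool that relays a snippet to the client's LLM for review.
mcp.tool("review_code", { code: z.string() }, async ({ code }) => {
  const result = await mcp.server.createMessage({
    messages: [{ role: "user", content: { type: "text", text: code } }],
    systemPrompt:
      "You are a senior code reviewer with expertise in JavaScript. " +
      "Provide detailed suggestions for refactoring the code.",
    includeContext: "thisServer",
    maxTokens: 150,
    temperature: 0.4,
  });
  const text = result.content.type === "text" ? result.content.text : "";
  return { content: [{ type: "text", text }] };
});
```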
What else can you do with sampling?
Sampling enables some powerful agentic patterns:
Decision making: Sampling can generate structured outputs for conditional logic in your applications. An agent could analyze available data to recommend the next steps in a complex workflow or evaluate user input to determine which process to trigger.
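For example, a server can ask for a strictly formatted verdict and branch on it. Here's a hedged sketch reusing the SDK's `createMessage()` helper from the earlier examples (the labels and routing are illustrative):

```typescript
import { Server } from "@modelcontextprotocol/sdk/server/index.js";

// Hypothetical decision step: the prompt constrains the model to one of a
// few labels, and the server branches on the parsed verdict. Temperature 0
// keeps the label stable across runs.
async function triageTicket(server: Server, ticket: string): Promise<string> {
  const decision = await server.createMessage({
    messages: [
      {
        role: "user",
        content: {
          type: "text",
          text:
            'Reply with exactly one word: "refund", "escalate", or "close".\n\n' +
            ticket,
        },
      },
    ],
    temperature: 0,
    maxTokens: 5,
  });
  const verdict =
    decision.content.type === "text" ? decision.content.text.trim() : "";
  // Conditional logic driven by the model's structured output:
  return verdict === "escalate" ? "routed-to-human" : verdict;
}
```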
Multi-step tasks: With sampling, agents can chain multiple calls for complex workflows. This involves breaking down a complex problem into sequential steps and solving each, using the results of one step as input for the next while keeping the user in the loop.
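Here's a sketch of that chaining pattern, with each sampling call feeding the next (the two-step report pipeline is illustrative; every call still passes through the client's approval checkpoints):

```typescript
import { Server } from "@modelcontextprotocol/sdk/server/index.js";

// Hypothetical two-step pipeline: outline first, then expand the outline.
// Each await is a separate sampling/createMessage round trip, so the user
// gets a checkpoint before and after every LLM call.
async function draftReport(server: Server, topic: string): Promise<string> {
  const ask = async (prompt: string, maxTokens: number): Promise<string> => {
    const result = await server.createMessage({
      messages: [{ role: "user", content: { type: "text", text: prompt } }],
      maxTokens,
    });
    return result.content.type === "text" ? result.content.text : "";
  };

  const outline = await ask(`Outline a short report on: ${topic}`, 200);
  return ask(`Write the report following this outline:\n\n${outline}`, 800);
}
```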