

The OpenAI ecosystem: A developer’s guide to building agents with OpenAI

OpenAI ships something new almost every month. Models get renamed, preview features appear, and entire product lines arrive with little warning. If you’re building agents, you need a clear picture of which pieces expect code, which are visual builders, and how they all fit together.

This guide walks developers through OpenAI’s ecosystem with a focus on building agents. We compare code-first APIs with no-code builders, explain what each product does, and where relevant, demonstrate how to integrate them with external tools using MCP.

Code or no-code

OpenAI offers two main paths for building agents:

  • The code-first route gives developers full control over prompts, state, and infrastructure.
  • The no-code route provides visual builders and managed hosting that business users can operate.

You’ll likely start with one and migrate to the other, or run both in parallel. Here’s how they compare:

| | Code-first APIs | No-code builders |
| --- | --- | --- |
| Control level | Total control over prompts, state, and infrastructure | Configuration only; OpenAI hosts |
| Who maintains it | Developers | Product managers, support staff, operations (and developers) |
| Tool integration | Call any external service via code | Use built-in connectors or workspace-approved integrations |
| When to use | Custom apps, compliance-driven workloads, advanced copilots | Fast pilots, embedded assistants, business-run automations |
| Key products | Responses API, Realtime API, Agents SDK | AgentKit, ChatGPT GPT Builder, Workflows, Operator |

The rest of this guide digs into each path and explains how the products within them work.

Building agents with code

This is the best place to start if you want full control over the prompts, latency, and infrastructure of your agent. The following APIs all sit behind the same pay-as-you-go account.

Responses API: The foundation

The Responses API replaces the old Chat Completions endpoint and the legacy Assistants API. It accepts multimodal input, streams outputs, handles tool calling and JSON mode, and supports the newer reasoning models (o1 and o3). Compared to the legacy Assistants API, the Responses API is stateless by default, so you decide where conversation history lives. Similarly, it improves on the old Chat Completions API by adding first-class JSON Schema validation and tool definitions.

Use the Responses API when you need a single request-response cycle: send a prompt, get a completion. The API supports function calling, so your agent can invoke tools mid-generation. You define tools using JSON Schema, the model decides when to call them, and you execute the tool logic in your code before sending the result back.

The following basic example calls the Responses API directly:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const response = await openai.responses.create({
    model: "gpt-5",
    input: "What's the weather in San Francisco?",
    tools: [
      {
        type: "function",
        name: "get_weather",
        description: "Get current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string" },
          },
          required: ["location"],
        },
      },
    ],
  });

  console.log(response.output_text);
}

main();
```
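
That request covers only the first leg of the loop described above: if the model decides to use get_weather, the response contains a function call item instead of final text. Below is a hedged sketch of the second leg, assuming the standard Responses API follow-up pattern of answering function_call items with function_call_output items — verify the field names against the current API reference:

```typescript
// Continues the example above, inside main(): run the tool and send its output back.
const toolCall = response.output.find((item) => item.type === "function_call");

if (toolCall?.type === "function_call") {
  const args = JSON.parse(toolCall.arguments);
  // Your real tool logic goes here; this stub is illustrative.
  const weather = { location: args.location, tempC: 18, conditions: "Foggy" };

  const followUp = await openai.responses.create({
    model: "gpt-5",
    previous_response_id: response.id, // keeps the original tool call in context
    input: [
      {
        type: "function_call_output",
        call_id: toolCall.call_id,
        output: JSON.stringify(weather),
      },
    ],
  });

  console.log(followUp.output_text); // natural-language answer that uses the tool result
}
```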

Realtime API: Voice and streaming

The Realtime API handles WebRTC or WebSocket sessions with sub-second latency. You stream audio, tool results, or screen events, and the model responds in the same channel. Compared to plain Responses API calls, you trade stateless requests for a session object that handles turn-taking.

Use the Realtime API for voice conversations, live transcription, or any scenario that requires continuous bidirectional communication. The API manages the session state, handles interruptions (when a user speaks over the model), and supports tool calling throughout the conversation.
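
To make the shape of a session concrete, here is a minimal server-side sketch that opens a Realtime session over WebSocket from Node.js using the ws package. The endpoint, model name, and event types follow the Realtime API docs but vary across API versions, so treat them as assumptions to verify:

```typescript
import WebSocket from "ws";

// Connect with your API key server-side; browsers should use WebRTC with an ephemeral token instead.
const ws = new WebSocket("wss://api.openai.com/v1/realtime?model=gpt-realtime", {
  headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
});

ws.on("open", () => {
  // Configure the session, then ask the model to start a response.
  ws.send(JSON.stringify({
    type: "session.update",
    session: { instructions: "You are a concise voice assistant." },
  }));
  ws.send(JSON.stringify({ type: "response.create" }));
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  // Audio and text arrive as streamed delta events; tool calls show up as function call
  // items that you execute and answer before the conversation continues.
  console.log(event.type);
});
```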

The diagram below shows the Realtime API architecture:

  • Your application establishes a WebSocket or WebRTC connection.
  • Audio streams in both directions, and the model can invoke tools mid-conversation.
  • When a tool call happens, the session pauses, your code executes the tool, and the result streams back into the conversation.
Realtime API architecture diagram showing WebSocket/WebRTC connection between application and model with bidirectional audio streaming and tool calling flow

This architecture enables natural voice interactions. The model can ask clarifying questions, call tools to fetch data, and respond with synthesized speech in real time without breaking the conversation flow.

Check out our MCP use case guide for Realtime agents to learn how to build a voice assistant with the Realtime API.

Agents SDK: Orchestration in code

The Agents SDK is for developers who want to define agent logic in code rather than configuration. It works with both the Responses API and the Realtime API, allowing you to programmatically define workflows in Python or TypeScript. The SDK is open source, so you can version control your agent definitions and test them locally before deployment.

How it works: You define agents with instructions, tool catalogs, and models in code. The Agents SDK handles multi-step workflows, persistent state, and evaluation traces. Unlike the stateless Responses API, the SDK manages conversation history, tool outputs, and run metadata for you.

Here’s a basic example:

```python
import asyncio
from typing import Annotated

from pydantic import BaseModel, Field
from agents import Agent, Runner, function_tool


class Weather(BaseModel):
    city: str = Field(description="The city name")
    temperature_range: str = Field(description="The temperature range in Celsius")
    conditions: str = Field(description="The weather conditions")


@function_tool
def get_weather(city: Annotated[str, "The city to get the weather for"]) -> Weather:
    """Get the current weather information for a specified city."""
    print("[debug] get_weather called")
    return Weather(city=city, temperature_range="14-20C", conditions="Sunny with wind.")


agent = Agent(
    name="Hello world",
    instructions="You are a helpful agent.",
    tools=[get_weather],
)


async def main():
    result = await Runner.run(agent, input="What's the weather in Tokyo?")
    print(result.final_output)
    # The weather in Tokyo is sunny.


if __name__ == "__main__":
    asyncio.run(main())
```
  • Run steps and tool calls: The SDK exposes run steps, which show you exactly what the agent did during execution — which tools it called, which parameters it used, and what the tools returned. This is critical for debugging and evaluation. You can inspect failed runs to see which tool timed out or which LLM call hallucinated.

    Run steps also let you forward tool calls to MCP servers. When the agent decides to call a tool, you read the tool name and parameters, call your MCP server (via stdio or HTTP), and submit the tool output back. The agent resumes and continues the workflow.

  • Multi-model runs: The Agents SDK can switch models mid-run. For example, it may start with GPT-5-mini for simple questions, but if the agent determines it needs deeper reasoning, it escalates to GPT-5. This saves costs on routine queries while maintaining quality for complex cases.

  • Evaluation traces: The SDK logs every decision the agent makes — which tools it considered, which it rejected, what prompts it sent to the model, and what responses it received. These traces help you benchmark agent performance, spot regressions, and fine-tune instructions.

  • MCP integration: The Agents SDK uses the same tool schema as MCP, so you can define your tool catalog once and reference it from both your SDK agents and your MCP servers. When an agent needs to call a tool, you can delegate to an MCP server via stdio or HTTP, keeping your agent logic portable across runtimes.
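
To ground that last point, here is a hedged sketch of wiring an MCP server into an agent using the TypeScript flavor of the Agents SDK. The MCPServerStdio class and its options are taken from the SDK docs but may differ across versions, and the filesystem server is only an example:

```typescript
import { Agent, run, MCPServerStdio } from "@openai/agents";

async function main() {
  // Launch a local MCP server over stdio; its tools become part of the agent's catalog.
  const mcpServer = new MCPServerStdio({
    name: "Filesystem tools",
    fullCommand: "npx -y @modelcontextprotocol/server-filesystem ./docs",
  });
  await mcpServer.connect();

  const agent = new Agent({
    name: "Docs assistant",
    instructions: "Answer questions about the local docs, using tools when needed.",
    mcpServers: [mcpServer],
  });

  const result = await run(agent, "Summarize the README in ./docs.");
  console.log(result.finalOutput);

  await mcpServer.close();
}

main();
```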

Note: OpenAI also offers the Assistants API, a REST API for building assistants with threads and runs. The Assistants API is being deprecated in 2026 in favor of the Responses API, which will incorporate all assistant features. For new projects, use the Agents SDK for code-based workflows or the Responses API for stateless interactions.

Codex: AI for software development

Codex is OpenAI’s specialized agent for coding tasks. It runs the GPT-5-Codex family of models and integrates directly into your development workflow. Codex handles code generation, editing, review, and infrastructure automation. It’s included with the ChatGPT Plus, Team, and Enterprise subscriptions.

You can install the Codex CLI globally with the command:

```bash
npm i -g @openai/codex
```

When you run the Codex CLI in your terminal, it watches your repository, responds to natural language prompts, and executes file edits, Git operations, and shell commands. It maintains context across your codebase, so you can ask it to refactor a function and it will find all the call sites.

Codex CLI in action showing terminal interface with code analysis and file operations

The Codex CLI logs all operations and asks for confirmation before destructive actions (such as deleting files or force-pushing to Git). You can integrate it into CI/CD pipelines or use it for one-off migrations. You can also run it inside Docker containers for sandboxed code generation.

Codex maintains context across multiple exchanges: It reads files, proposes changes, and asks for approval before applying edits. The colored terminal output distinguishes between analysis (understanding your request), planning (what it intends to do), and actual file operations. Codex references specific files and line numbers when explaining changes, making it easy to spot when it misunderstands your intent.

Codex also includes a web UI and IDE extensions. The web dashboard mirrors the CLI but adds a visual diff viewer and approval workflow. The interface splits into two panels, with conversation on the left and side-by-side diffs on the right. You can approve all changes at once or review each file individually. Shell commands (such as tests, builds, and Git operations) require explicit approval before execution.

Codex web interface showing split panel with conversation on left and code diffs on right

This visual approval workflow works well for code reviews. A reviewer can ask Codex to implement feedback, inspect the proposed changes, and approve or request modifications without switching to a terminal.

The Codex Slack and GitHub integrations allow it to respond to coding questions in Slack channels, generate pull requests from feature requests, or review PRs when tagged. When hooked into PR workflows in GitHub, Codex comments on PRs with suggestions, runs static analysis, and autogenerates changelog entries based on commits.

Similarly, Codex itself can act as an MCP client and call external tools. When you register MCP servers in your Codex config (CLI or IDE), Codex can invoke them whenever it needs domain-specific functionality. For example, if you register an MCP server that queries documentation, Codex can look up endpoint schemas before generating integration code.
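
For reference, registration happens in the Codex configuration file rather than in code. The sketch below assumes the config.toml location and key names used by recent Codex CLI releases, and the docs server itself is hypothetical — check the Codex documentation for the exact schema:

```toml
# ~/.codex/config.toml — hypothetical MCP server entry for a docs-lookup server
[mcp_servers.docs]
command = "npx"
args = ["-y", "my-docs-mcp-server"]   # replace with your actual MCP server package
```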

Building agents without code

Reach for these tools when you want distribution, guardrails, or quick experiments. The following no-code products are ideal for non-technical users who want to launch agents without touching source code.

AgentKit: Visual agent building

AgentKit launched at OpenAI DevDay 2025 as a complete toolkit for building, deploying, and optimizing agents without writing orchestration code. It includes four main components.

Agent Builder

Agent Builder is a visual canvas for building multi-agent workflows using drag-and-drop nodes. Think of it as “Canva for agents” — you drag nodes onto a canvas and connect them with arrows to define logic.

Agent Builder includes the following node types:

  • Agent nodes are LLM processing steps that reason over inputs and generate outputs.
  • Logic nodes add conditionals (if/else), loops, and branching to a workflow.
  • Tool connectors add MCP servers, API calls, and database queries.
  • User approval nodes pause workflows for human review.
  • Guardrail nodes enforce safety constraints on outputs.
  • Data transformation nodes format, filter, or restructure data between workflow steps.

The Agent Builder canvas includes versioning, preview runs, and templates for common patterns (such as customer service, data enrichment, and document comparison). When you build a workflow, you test it live in a preview panel that shows which nodes execute and which branches the agent takes.

Agent Builder compiles workflows to Agents SDK runs, so you can copy your workflows as code that is fully compatible with the Agents SDK. You can also connect MCP servers as nodes to extend your workflows with your own custom logic and data.

AgentKit Agent Builder interface showing drag-and-drop canvas with connected workflow nodes

ChatKit

ChatKit is an embeddable chat interface that brings AI assistants into web and mobile apps. It provides a production-ready chat UI with built-in tool integration, conversation state, and streaming responses.

ChatKit diagram showing how the embeddable chat interface integrates with web and mobile applications

Embed ChatKit into your app using the JavaScript library (React, Vue, Angular, and vanilla JS are all supported):

```tsx
import { ChatKit } from "@openai/chatkit-react";

function App() {
  return (
    <ChatKit
      appId="your-app-id"
      userId={currentUser.id}
      theme="light"
      position="bottom-right"
    />
  );
}
```

ChatKit handles user identity automatically. Just pass a userId, and the assistant remembers context across sessions. The widget includes typing indicators, file uploads (PDFs, images), voice input, and conversation history. Users can scroll through past conversations or start new threads.

ChatKit Studio

ChatKit Studio is a visual interface for building and configuring ChatKit instances without code. It lets you configure models, instructions, data sources, tools, and UI themes, with a live preview panel for testing the assistant before deployment.

ChatKit Studio interface showing configuration options and live preview panel

We recommend using an OpenAI-hosted backend for ChatKit, where OpenAI handles hosting and scaling, but a self-hosted option is also available if you want to run ChatKit on your own infrastructure using the ChatKit Python SDK.

ChatGPT MCP connectors

ChatGPT supports MCP connectors across the Plus, Pro, Team, Enterprise, and Edu tiers. Register your MCP servers through the Connectors interface, and ChatGPT can call them during conversations. Unlike earlier limitations, MCP connectors now support both read and write operations, so ChatGPT can update tickets in your issue tracker, trigger workflows in your automation system, write to databases, and call internal APIs.

MCP servers connect to ChatGPT via remote endpoints using Server-Sent Events (SSE) or streaming HTTP protocols. OAuth handles authentication when needed. ChatGPT displays confirmation modals before executing write or modify actions, and workspace admins control which users can register custom connectors.
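
For a sense of what sits on the other end of a connector, here is a minimal sketch of a remote MCP server built with the official TypeScript MCP SDK and Express, exposing one hypothetical lookup_ticket tool over streamable HTTP. OAuth and session management are omitted, and the tool's backend is invented for illustration:

```typescript
import express from "express";
import { z } from "zod";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

// Build an MCP server exposing one hypothetical tool.
function buildServer() {
  const server = new McpServer({ name: "ticket-tools", version: "1.0.0" });
  server.registerTool(
    "lookup_ticket",
    {
      description: "Fetch a support ticket by ID",
      inputSchema: { ticketId: z.string() },
    },
    async ({ ticketId }) => ({
      // Replace with a real lookup against your ticketing system.
      content: [{ type: "text", text: `Ticket ${ticketId}: status=open` }],
    })
  );
  return server;
}

const app = express();
app.use(express.json());

// Stateless streamable HTTP: handle each POST /mcp request independently.
app.post("/mcp", async (req, res) => {
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);
```

Register the server's public /mcp URL in the Connectors interface, and its tools become callable from ChatGPT conversations.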

ChatGPT MCP connectors interface showing registered servers and connector management

Atlas Browser

In October 2025, OpenAI launched ChatGPT Atlas, a Chromium-based browser with ChatGPT built into the browsing experience.

Agent mode (available for Plus, Pro, and Business users) lets ChatGPT work autonomously within the browser, researching topics, automating tasks, planning events, or booking appointments while you browse.

Atlas also has first-class support for ChatGPT connectors as well as for Apps SDK-based integrations.

ChatGPT Atlas browser interface showing integrated AI assistance while browsing

Widget Builder and Apps SDK: From limited tools to full dev support

ChatGPT’s development story started with /search and /fetch as the only tools available for deep research. Today, these limited tools have evolved into full developer support through the Apps SDK and Widget Builder.

Apps SDK

The Apps SDK is the foundation for building ChatGPT integrations. It bridges your backend services and ChatGPT, letting you define tools, handle authentication, and manage state across conversations. The SDK handles the protocol details so you focus on exposing your API’s capabilities.

At its core, the Apps SDK is an MCP server that runs in your infrastructure. You define tools using the standard MCP schema (with tool names, descriptions, input schemas, and handler functions). ChatGPT discovers these tools and calls them when users ask for something your app can do.

The SDK adds features beyond basic MCP, including OAuth flows for user authentication, session persistence so your tools remember context across messages, and rate limiting per user or per workspace. It handles error recovery automatically. If your tool times out or returns an error, the SDK retries or prompts the user for clarification.

Here’s how you define a tool with the Apps SDK:

```typescript
async function loadKanbanBoard() {
  const tasks = [
    { id: "task-1", title: "Design empty states", assignee: "Ada", status: "todo" },
    { id: "task-2", title: "Wireframe admin panel", assignee: "Grace", status: "in-progress" },
    { id: "task-3", title: "QA onboarding flow", assignee: "Lin", status: "done" }
  ];

  return {
    columns: [
      { id: "todo", title: "To do", tasks: tasks.filter((task) => task.status === "todo") },
      { id: "in-progress", title: "In progress", tasks: tasks.filter((task) => task.status === "in-progress") },
      { id: "done", title: "Done", tasks: tasks.filter((task) => task.status === "done") }
    ],
    tasksById: Object.fromEntries(tasks.map((task) => [task.id, task])),
    lastSyncedAt: new Date().toISOString()
  };
}

server.registerTool(
  "kanban-board",
  {
    title: "Show Kanban Board",
    _meta: {
      "openai/outputTemplate": "ui://widget/kanban-board.html",
      "openai/toolInvocation/invoking": "Displaying the board",
      "openai/toolInvocation/invoked": "Displayed the board"
    },
    inputSchema: { tasks: z.string() }
  },
  async () => {
    const board = await loadKanbanBoard();
    return {
      structuredContent: {
        columns: board.columns.map((column) => ({
          id: column.id,
          title: column.title,
          tasks: column.tasks.slice(0, 5) // keep payload concise for the model
        }))
      },
      content: [
        {
          type: "text",
          text: "Here's your latest board. Drag cards in the component to update status."
        }
      ],
      _meta: {
        tasksById: board.tasksById, // full task map for the component only
        lastSyncedAt: board.lastSyncedAt
      }
    };
  }
);
```

The context object gives you access to the authenticated user, their workspace, and any state you’ve stored in previous tool calls. This lets you build stateful agents that remember preferences, cache API responses, or track multi-step workflows.

The Apps SDK supports multiple authentication patterns.

  • For workspace integrations: Use OAuth to let users connect their accounts (for example, Slack, Google Calendar, or Salesforce). The SDK stores the OAuth tokens securely and refreshes them automatically. Your tool handlers receive the user’s token via the context, so you make API calls on their behalf without exposing credentials to ChatGPT.
  • For internal tools: Skip OAuth and rely on ChatGPT’s workspace auth. If an admin registers your app in a Team or Enterprise workspace, the SDK trusts that all users in that workspace are authorized.

Widget Builder

Built on top of the Apps SDK, Widget Builder lets you return rich UI components from your MCP tools, not just plain text. When ChatGPT calls your tool, it responds with interactive widgets, such as data tables, forms, charts, or custom visualizations. The user sees these rendered directly in the ChatGPT interface.

Widget Builder interface showing rich UI component creation for MCP tools

Widgets bind to data using placeholders like {{task.title}} that fill in when your tool returns results. You can rapidly develop widgets using the low-code Widget Builder in ChatKit Studio.

With Widgets, your MCP server does more than answer questions — it becomes interactive. A task management tool returns a widget with checkboxes to complete tasks. A CRM tool returns a customer card with click-to-call buttons. A metrics tool returns live charts that update when the user asks follow-up questions.

Your MCP server defines tools with a _meta field that points to a widget URI. When ChatGPT calls the tool, your server returns data and the widget template. ChatGPT renders the widget using your template and the returned data.

```typescript
server.registerResource(
  "html",
  "ui://widget/widget.html",
  {},
  async (req) => ({
    contents: [
      {
        uri: "ui://widget/widget.html",
        mimeType: "text/html",
        text: `
<div id="kitchen-sink-root"></div>
<link rel="stylesheet" href="https://persistent.oaistatic.com/ecosystem-built-assets/kitchen-sink-2d2b.css">
<script type="module" src="https://persistent.oaistatic.com/ecosystem-built-assets/kitchen-sink-2d2b.js"></script>
        `.trim(),
        _meta: {
          "openai/widgetCSP": {
            connect_domains: [],
            resource_domains: ["https://persistent.oaistatic.com"]
          }
        }
      }
    ]
  })
);
```

Workflows and Operator

Workflows let you pin deterministic steps around model calls. You define stages that can branch, await human review, or trigger tools. Operator is OpenAI’s research preview agent that controls a browser and handles long-running tasks.

Both use the same tool definition format as the Responses API. Register your MCP tools once and make them available across these runtimes. Workflows is currently invite-only for enterprise customers; Operator is also on a waitlist and expects clear guardrails in your application.

The supporting cast: Models, creative tools, and infrastructure

These services share billing with the APIs but introduce their own review processes and constraints. We’ll cover them briefly so you know where they fit.

Models: Naming and routing

OpenAI’s model names shift frequently. As of late 2025, the main families are:

  • GPT-5: The flagship model for complex reasoning and multi-step tasks
  • GPT-5-mini: A lightweight variant optimized for speed and cost that is good for high-volume tasks
  • GPT-5-nano: The smallest, cheapest model in the family, which is best for fast responses for simple queries
  • GPT-5-Codex (high), GPT-5-Codex (medium), GPT-5-Codex (low): Specialized models for code generation and software development tasks, where high offers the most accuracy and low prioritizes speed

All these models are available through the Responses API and ChatGPT (depending on your tier). The Codex variants power the Codex developer tools. Pricing varies by model, so we recommend routing expensive models through a dedicated MCP tool to audit usage.

Creative endpoints: Images, audio, and video

DALL-E 3 image generation: DALL-E 3 handles prompt-to-image generation, inpainting, and image variation. The responses return URLs or Base64 blobs. When using MCP, you wrap the call in a tool and return a resource with the download link so the client can show a thumbnail or fetch the file later.

Voice Engine and the Audio API endpoint: With these tools, OpenAI covers both text-to-speech and speech-to-text. When these capabilities are paired with the Realtime API, latency is low enough for real-time agents. A common pattern is an MCP tool that takes text, calls the audio API, uploads the result to object storage, and returns a resource URI for the client to stream. This is the same pattern OpenAI uses in ChatGPT’s mobile apps when you ask about a photo and expect a spoken answer.

Sora video: OpenAI’s text-to-video model, Sora, generates short, high-fidelity clips. Access is currently limited to creative partners and enterprise pilots. Rendering jobs take minutes, so treat them as asynchronous tasks: expose an MCP tool that submits the prompt, return a completion handle, and stream progress updates as the job advances.

Infrastructure and retrieval

Embeddings and vector stores: Embeddings endpoints (text-embedding-3-large, text-embedding-3-small, and domain variants) power retrieval pipelines. Vector stores add metadata filtering and chunk management. These services are regional, so double-check availability if your MCP servers run in another region. Treat long-running indexing jobs asynchronously: queue the work, return a job handle, and let the client poll or subscribe.
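
For reference, generating an embedding is a single call; the model name matches the families listed above, and the dimensionality comment is specific to text-embedding-3-small:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: "How do I rotate my API keys?",
  });

  // Each input string comes back as one vector; store it in your vector store
  // alongside metadata for filtering.
  console.log(response.data[0].embedding.length); // 1536 for text-embedding-3-small
}

main();
```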

Fine-tuning: The fine-tuning endpoint adapts a base model to your tone or domain, but you still call the resulting model through the Responses API. You create fine-tuning jobs against uploaded training data and then reference the resulting fine-tuned model IDs in subsequent calls.

Batch jobs: Batch processing lets you send tens of thousands of prompts at a lower cost. It pairs nicely with MCP. Expose a submit_batch tool and emit status updates as resources.

Files API: The Files endpoint stores training data or retrieval corpora and is the only sanctioned way to share larger documents with OpenAI services.
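
A hedged sketch of how the Files API and fine-tuning fit together with the Node SDK: upload training data, start a job, then call the fine-tuned model through the Responses API once the job finishes. The base model name is illustrative — check which models accept fine-tuning on your account:

```typescript
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function main() {
  // Upload JSONL training data through the Files API.
  const file = await openai.files.create({
    file: fs.createReadStream("training.jsonl"),
    purpose: "fine-tune",
  });

  // Kick off the fine-tuning job against that file.
  const job = await openai.fineTuning.jobs.create({
    training_file: file.id,
    model: "gpt-4.1-mini", // illustrative base model
  });

  // Poll the job; once it succeeds, job.fine_tuned_model is the ID to use
  // in subsequent openai.responses.create calls.
  console.log(job.id, job.status);
}

main();
```

The Batch API follows the same file-first flow: upload a JSONL file of requests, submit a batch job, and collect the output file when it completes.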

Connecting it all: MCP integration patterns

MCP gives you one adapter surface as OpenAI’s products evolve. Here are the main patterns for integrating your MCP servers with OpenAI’s ecosystem.

Direct integration: MCP server wraps OpenAI API

This is the pattern we showed earlier with the Responses API. Your MCP server keeps your OpenAI API key server-side and exposes a tool that calls the API. The client (Claude, Cursor, or another MCP host) calls your tool, and you forward the request to OpenAI.

This keeps secrets off the client and lets you audit all OpenAI usage in one place. Add rate limiting, cost tracking, or custom logging before forwarding calls.
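
A hedged sketch of such a wrapper using the TypeScript MCP SDK; the ask_openai tool name and the logging line are illustrative, and you would add your own rate limiting and cost controls around the forwarded call:

```typescript
import OpenAI from "openai";
import { z } from "zod";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY }); // key stays server-side

const server = new McpServer({ name: "openai-proxy", version: "1.0.0" });

// Hypothetical tool that forwards prompts to the Responses API with basic auditing.
server.registerTool(
  "ask_openai",
  {
    description: "Send a prompt to OpenAI and return the model's answer",
    inputSchema: { prompt: z.string() },
  },
  async ({ prompt }) => {
    const started = Date.now();
    const response = await openai.responses.create({ model: "gpt-5-mini", input: prompt });

    // Central place for cost tracking, rate limiting, or redaction before returning.
    console.log(JSON.stringify({ tool: "ask_openai", ms: Date.now() - started, usage: response.usage }));

    return { content: [{ type: "text", text: response.output_text }] };
  }
);
```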

Reusable catalogs: Call from multiple runtimes

The biggest MCP benefit is consistency. Once you model an internal API as an MCP server, you call it from Claude, ChatGPT, Cursor, and any other MCP-compatible runtime.

Define your tool schema once, register it in ChatGPT Team workspaces and Claude projects, and both runtimes use it. When OpenAI adds a new agent product (like Workflows or Operator), you don’t rebuild integrations — you just point it at your existing MCP catalog.

Web search: Built-in tools and Deep Research

Responses API web search: The Responses API includes a web_search tool that you enable per account. Pass the tool in your request, and the model decides when to use it. When triggered, the API returns responses with URL citations and source annotations. The limitation is that you don’t control the search provider, citation format, or result ranking.
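
Enabling it is a one-line change to the request (reusing the client from the first example); the exact tool type string is an assumption — some accounts expose it as web_search_preview:

```typescript
const response = await openai.responses.create({
  model: "gpt-5",
  tools: [{ type: "web_search" }], // may be "web_search_preview" depending on rollout
  input: "What changed in the latest MCP specification release?",
});

// output_text contains the synthesized answer; citations arrive as URL annotations
// on the response's output items.
console.log(response.output_text);
```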

OpenAI built-in search and fetch tools interface showing web search capabilities

Deep Research: ChatGPT’s Deep Research feature (launched February 2025) is an autonomous agent that conducts multi-step web research. Give it a prompt, and it searches, analyzes, and synthesizes hundreds of sources to produce a comprehensive report. Deep Research takes 5-30 minutes and uses a version of the o4 model optimized for web browsing. It’s available in the ChatGPT Plus, Team, Enterprise, and Pro tiers, with API access launched in June 2025.

Deep research with MCP integration showing comprehensive research workflow

Custom retrieval with MCP: Instead of conducting generic web searches, Deep Research supports MCP tool queries that can fetch internal documentation, compliance databases, or proprietary data sources. This returns structured results (document titles, relevance scores, snippets, and metadata) that the model can reason over. Custom retrieval gives you control and auditability — you can log every query and view citations you can verify.

Plans, pricing, and requirements

OpenAI’s offerings cluster into several pricing tiers. Here’s a snapshot as of late 2025 to help you navigate what you need.

API pay-as-you-go

Pay-as-you-go is required for Responses API, Realtime API, Assistants API, embeddings, vector stores, fine-tuning, batch jobs, and creative endpoints. You pay per token or per request, and the price varies by model.

Most regions require a verified billing method and a $5 prepayment to activate the account. Reasoning models (o1 and o3) cost significantly more per token, so we recommend routing them through a dedicated MCP tool to audit usage.

Add-on tools

Adding web search capability enables web_search and url_get tools in the Responses API. Web search is still rolling out; Plus, Team, and Enterprise customers can usually flip it on through the dashboard, while some organizations need to email support to request access.

Pricing is per search query (roughly $10 per 1,000 searches). To avoid this cost, build your own search MCP tool instead.

Search Tools pricing table showing cost breakdown for different usage tiers

Next steps for builders

If you’re a developer, start by building MCP servers that expose your own tools and data sources. Create servers that query your databases, call your internal APIs, access your file systems, or integrate with services you use (like GitHub, Stripe, and CRMs). Define each tool using JSON Schema, handle authentication, and test locally before deploying. This gives you a reusable catalog of tools that works across Claude, Cursor, ChatGPT, and any other MCP-compatible runtime.

If you’re building a platform, decide whether to centralize tools using the Agents SDK or expose them directly through ChatGPT workspaces with the Apps SDK. A common approach is to use both: maintain one MCP catalog that powers both SDK-driven agents and workspace GPTs.

For ChatGPT Team or Enterprise plans, register your MCP servers at the workspace level so every GPT inherits them automatically. This means you approve tools once, and they become available everywhere, including in ChatKit embeds and the desktop apps.

Keep an eye on OpenAI’s release notes. This ecosystem shifts fast, but MCP gives you one adapter surface as the products evolve. The more you express your systems as MCP tools, the easier it is to evaluate whatever OpenAI launches next without rebuilding integrations.
