
Engineering

Making your CLI agent-friendly

Speakeasy Team

February 10, 2026 - 7 min read

Two years ago, we wrote about why and how we built our CLI. Our design principles were clear: assume no prior context, make it conversational, keep it simple. We built interactive prompts using Huh, visual feedback with spinners and progress tables, and guided workflows that walked users through complex operations step by step.

We were proud of that work. The CLI felt polished, approachable, and distinctly Speakeasy.

Then AI agents started using it.

The features we’d carefully designed for humans became obstacles. Interactive prompts blocked execution. Spinners confused output parsing. Guided workflows that required user input left agents stuck, waiting for responses that would never come.

This post covers what we learned retrofitting our CLI for AI agents, and the broader question of where CLIs fit in the emerging ecosystem of agent tools.

The irony: Human-centric design broke agent workflows

Our original CLI principles optimized for human developers:

“Make it Conversational” led us to build interactive prompts that guide users through configuration. For agents, these prompts are blockers: they wait for input that agents can’t provide through a shell command.

Visual feedback like spinners and progress tables made the CLI feel responsive. For agents, this output is noise that consumes context window tokens without providing useful information.

Guided workflows walked users through multi-step processes. For agents, each step that requires interaction is a potential failure point.

The features we were most proud of became the biggest obstacles to agent adoption.

Before diving into our changes, it’s worth understanding the landscape. There are three main ways to give AI agents access to your tools:

MCP servers provide structured tool definitions with typed inputs and outputs, purpose-built for agent consumption. They’re ideal for APIs, stateless operations, and controlled environments. We built Gram specifically for this use case. It’s the right approach when you’re designing for agents from the start.

Skills are markdown files that teach agents how to use existing capabilities (tools, functions, CLIs, etc.). They’re lightweight, require no infrastructure, and work across agent platforms. Skills are ideal for CLI tools, complex workflows, and local development scenarios where agents have shell access.

CLIs are the original developer interface. They’re already installed on developer machines, battle-tested, and familiar. The challenge is that they weren’t designed for agent consumption, but agents are using them anyway, through shell access in tools like Claude Code, Cursor, and GitHub Copilot.

For Speakeasy, the answer wasn’t to replace our CLI with an MCP server. The CLI is deeply integrated into developer workflows: local development, CI pipelines, GitHub Actions. Instead, we needed to make the CLI work well when agents use it, while preserving the human experience we’d built.

Our approach combines two pieces:

  1. Skills that teach agents how to use our CLI effectively
  2. CLI improvements that remove friction for automated execution

Building skills for CLI guidance

We started with a Claude-specific plugin, but quickly realized this tied us to a single platform. When we discovered Vercel’s skills specification, we migrated to a tool-agnostic approach.

Skills are markdown files with frontmatter that defines trigger conditions. When an agent encounters a task that matches a skill’s description, it reads the skill content for guidance. The approach is lightweight: no servers, no APIs, just documentation that agents can consume.
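
For illustration, a focused skill might look something like the sketch below. The frontmatter fields and the body are our approximation for this example rather than a copy of a published skill; the real field names come from the skills specification.

---
name: generate-python-sdk
description: Use when the task involves generating a Python SDK from an OpenAPI document with the Speakeasy CLI.
---

# Generate SDK for Python

1. Run speakeasy quickstart --skip-interactive with the Python target.
2. Check the exit code: 0 means success, anything else means the run failed.
3. On failure, read the error output before retrying instead of guessing at flags.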

npx skills add speakeasy-api/skills

This single command installs our skills for Claude Code, Cursor, Windsurf, GitHub Copilot, Gemini CLI, and 15+ other platforms that support the specification.

We also built the speakeasy agent subcommand, which bundles our documentation into the CLI itself. Think of it as our doc site wrapped in progressive disclosure CLI commands. Agents can traverse the documentation tree offline, discovering information incrementally without needing web access.

speakeasy agent --help
speakeasy agent workflows
speakeasy agent workflows sdk-generation

Each command returns focused information that agents can use to understand capabilities and construct the right commands.

Making commands non-interactive

Testing our skills with real agents revealed the gaps. Agents would start a workflow, hit an interactive prompt, and get stuck. We needed escape hatches.

The changes fell into a few categories:

Non-interactive flags bypass prompts entirely. When --skip-interactive is set, commands use sensible defaults or fail fast with clear error messages instead of waiting for input.

# Before: prompts for confirmation
speakeasy quickstart

# After: skips all prompts
speakeasy quickstart --skip-interactive -s "user-management-api" -t typescript -n "MySDK" -p "my-sdk" --output console

Structured output gives agents machine-readable responses. The --non-interactive flag provides console output that agents can parse without navigating interactive prompts.

speakeasy validate openapi --non-interactive --schema openapi.yaml
INFO  validation hint: [line 19786] missing-examples - Missing example for component. Consider adding an example
WARN  validation warn: [line 2337] duplicate-path-params - parameter "subject" has been specified multiple times in "DELETE /api/{serviceId}/client/authorization/delete/{clientId}"
WARN  validation warn: [line 14727] general - openapi validation warn: [14727:21] response.description is missing
WARN  validation warn: [line 15797] operation-operationId - the GET operation does not contain an operationId

OpenAPI document linting complete. 0 errors, 4 warnings, 322 hints
OpenAPI document valid with warnings ⚠

Machine-readable exit codes let agents determine success or failure without parsing output. Exit code 0 means success, non-zero codes indicate specific failure categories.
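
In a shell session, that means an agent (or a script it writes) can branch on the exit status without parsing any log lines. A minimal sketch, reusing the validate command from above:

speakeasy validate openapi --non-interactive --schema openapi.yaml
if [ $? -ne 0 ]; then
  # non-zero exit: report the failure instead of scraping output for error text
  echo "openapi validation failed" >&2
  exit 1
fi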

We didn’t remove the interactive experience for humans. We added flags that let agents opt out. The default behavior remains conversational and guided. The agent-friendly mode is explicitly requested. Skills tell agents upfront which flags exist and when to use them, reducing the back-and-forth turns an agent needs to get things right.

Output format considerations

Verbose output helps humans understand what’s happening. For agents, it burns context window tokens on information they don’t need.

We added output modes that reduce noise:

# Human mode: full output with formatting
speakeasy run

# Agent mode: minimal output, just results
speakeasy run --quiet --output json

The --quiet flag suppresses progress indicators, spinners, and informational messages. Combined with --output json, agents get exactly the information they need: success/failure status and any relevant output paths or error details.

We also optimized the default output when agent flags are detected. If --non-interactive is set, we assume an automated context and reduce chattiness automatically.
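
Put together, an automated run can capture just the machine-readable result. A minimal sketch, assuming the structured result is what remains on stdout once --quiet strips the noise:

# capture the JSON result; --quiet suppresses spinners and informational messages
if output=$(speakeasy run --quiet --output json); then
  printf '%s\n' "$output"   # hand the result to the agent or the next pipeline step
else
  echo "speakeasy run failed with exit code $?" >&2
fi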

Skill design lessons

Vercel’s research on agent skills validated what we discovered through testing: focused skills outperform comprehensive ones.

Our first attempt was a meta-skill covering all SDK generation scenarios. It was comprehensive but vague about when to activate. Agents either triggered it too often (wasting context) or not at all (missing relevant guidance).

Breaking it into focused skills solved both problems:

  • “Generate SDK for Python” activates when the task mentions Python SDK generation
  • “Diagnose generation failure” activates when generation fails
  • “Customize SDK hooks” activates when adding custom behavior

Each skill has a specific trigger condition. Agents know exactly when to use it.

This approach also keeps individual skills small. A 500-line comprehensive guide competes poorly for context window space. A 50-line focused skill that solves one problem well is more likely to be read and applied correctly.

Measuring success with evals

We built evals to measure skill efficacy. The basic structure:

  1. Define a task: “Generate a Python SDK from this OpenAPI spec”
  2. Run the task with skills enabled
  3. Run the task with skills disabled
  4. Compare: completion rate, token usage, error recovery
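
In practice, the harness is just a loop over the two modes and a fixed set of trials. A hypothetical sketch, where run_agent_task is a placeholder for however you drive the agent and record its result:

# hypothetical harness: run_agent_task is a placeholder, not a real command
for mode in with-skills without-skills; do
  for trial in $(seq 1 10); do
    run_agent_task "Generate a Python SDK from this OpenAPI spec" "$mode" >> "results-$mode.jsonl"
  done
done
# then compare completion rate, token usage, and error recovery across the two files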

Early results showed that skills significantly improved task completion, but also revealed which skills needed refinement. A skill that activated correctly but didn’t help complete the task was worse than no skill. It consumed context without providing value.

We now use eval results as input to skill design. If a skill has low completion rates, we examine why: Is the trigger condition wrong? Is the content unclear? Is the task actually multiple tasks that need separate skills?

This feedback loop drives continuous improvement. Skills aren’t static documentation; they’re living artifacts refined through measurement.

What’s next

Making a CLI agent-friendly is an ongoing process. Agents evolve, new platforms emerge, and usage patterns shift. We’re continuing to:

  • Add non-interactive modes to remaining commands
  • Refine skills based on eval results and user feedback
  • Expand the speakeasy agent subcommand with more progressive disclosure

The broader lesson: the tools we build for humans will increasingly be used by agents. Designing for both audiences isn’t optional anymore; it’s table stakes for developer tools.

If you’re building a CLI and want to make it agent-friendly, start with three changes:

  1. Add --non-interactive flags that bypass prompts
  2. Add --output json for structured responses
  3. Write skills that teach agents your tool’s patterns

The CLI you built for humans can work for agents too. It just needs a few escape hatches.


Try our agent skills: npx skills add speakeasy-api/skills

Read more about the skills release: Agent skills for OpenAPI and SDK generation
