
Engineering

Under the Hood: Building a High-Performance OpenAPI Parser in Go

Tristan Cartledge

December 12, 2025 - 12 min read

At Speakeasy, we process thousands of OpenAPI specifications every day across a wide range of teams and industries. These specs drive our production-ready SDKs, Terraform providers, and a growing set of internal tools.

That volume exposes every sharp edge of OpenAPI. From small hand-written specs to multi-megabyte machine-generated schemas with deeply nested references and vendor extensions, we’ve seen almost every way an OpenAPI document can go wrong.

As our platform grew, we hit the limits of existing Go libraries. Some were fast but modeled the spec loosely, making it hard to build correct tooling on top. Others were closer to the spec but used untyped maps everywhere, which made large refactors and static analysis painful.

We needed something different: a library that could be both a precise model of the specification and a high-performance engine for reading, validating, mutating, and transforming specs at scale.

So we built our own. Today, we’re introducing github.com/speakeasy-api/openapi, a comprehensive set of packages and tools for working with OpenAPI, Swagger, Arazzo, and Overlay Specification documents.

While our release post covers the high-level features, this article focuses on the engineering: the core abstractions, performance decisions, and tradeoffs that make this library a strong foundation for any OpenAPI tooling written in Go. If you’re choosing an OpenAPI library for Go today, our goal is that this post gives you enough signal to make this your default choice.

The Challenge: OpenAPI is Hard

OpenAPI is a deceptively complex specification. It has evolved significantly over time (2.0 to 3.2), supports dynamic types (fields that can be a string or an array), and relies heavily on JSON Schema, which allows for recursive and circular references.

Existing Go libraries often struggle with these edge cases, either by simplifying the model (losing accuracy) or by exposing a raw, untyped map structure (losing type safety). We needed a solution that offered both correctness and developer experience.

In practice, these tradeoffs show up as:

  • Tools that silently drop parts of the spec when they encounter constructs they don’t model.
  • Validators that can’t follow complex or circular $ref graphs and either blow up or behave inconsistently.
  • Libraries that are easy to start with, but brittle to extend when a new OpenAPI minor version or companion spec (like Arazzo or Overlays) appears.

Our goal for this library was to make those classes of bugs structurally harder to introduce by encoding more of the spec’s rules and invariants directly into the type system and core architecture.

Architecture: The Reflection-Based Marshaller

One of the core design decisions we made was to build the library on top of a custom, reflection-based marshaller.

In many libraries, the logic for parsing JSON/YAML is tightly coupled with the struct definitions. This makes it hard to support multiple specification versions or new specs like Arazzo without duplicating a lot of boilerplate code.

Our approach separates the model definition from the deserialization logic. We define our Go structs to match the specification as closely as possible, and our marshaller handles the complexity of mapping the input data to these structs. This allows us to:

  1. Iterate fast: We can add support for new specs (like we recently did with Swagger 2.0) by simply defining the structs, without writing bespoke parsing logic.
  2. Optimize centrally: Performance improvements in the marshaller benefit all supported specifications immediately.

Under the hood, the marshaller walks a graph of Node values, a lightweight intermediate representation produced from the original YAML/JSON document. Instead of binding directly to concrete structs at parse time, we keep this intermediate form, which preserves:

  • The original shape of dynamic fields (single value vs array, inline vs $ref).
  • Location information that we can use in validation errors.
  • Enough metadata to support additional specifications without rewriting the core.

When we finally bind into Go structs, we do so using a set of small, reusable reflection helpers that know how to:

  • Map OpenAPI/JSON Schema primitives and unions into strongly typed fields.
  • Apply defaulting and normalization rules in one place.
  • Reuse the same code paths across OpenAPI 3.x, Swagger 2.0, Arazzo, and Overlays.

This architecture means that adding support for a new spec or version is mostly a matter of:

  • Defining new Go structs that closely mirror the specification.
  • Wiring them into the existing marshaller.

The heavy lifting—parsing, node traversal, defaulting, and error reporting—remains centralized and battle-tested.

Performance: Porcelain vs. Plumbing

To handle “thousands of specs” efficiently, we adopted a “Porcelain vs. Plumbing” API design.

  • Plumbing: The internal representation is optimized for efficient storage and iteration. We use well-defined conventions to standardize how data is stored, allowing us to minimize memory allocations during parsing.
  • Porcelain: The public API provides a clean, high-level interface for developers. You don’t need to worry about the internal storage details; you just interact with idiomatic Go structs.

This separation allows us to optimize the “hot paths” of serialization and deserialization without breaking the user-facing API.

On the plumbing side, we optimized for the workloads we see most often at Speakeasy: repeatedly parsing and transforming large specs as part of CI pipelines, code generation, and analysis tools.

Some of the concrete decisions we made here include:

  • Preferring stable internal representations that can be reused across passes (validation, traversal, mutation) rather than re-parsing.
  • Minimizing allocations in hot paths inside the marshaller and walker.
  • Designing APIs that compose naturally with Go’s concurrency primitives so you can fan out work across operations, paths, or components when it makes sense for your use case.

Because the public API is intentionally “porcelain,” these optimizations are mostly invisible to library consumers—but they matter when you’re processing thousands of specs or very large documents.

Taming Dynamic Types with Type Safety

One of the hardest parts of modeling OpenAPI in Go is the dynamic nature of the spec. For example, in OpenAPI 3.1, the type field of a schema can be a single string (e.g., "string") or an array of strings (e.g., ["string", "null"]).

In a statically typed language like Go, this is usually handled by using interface{} (which loses type safety) or complex pointer logic.

We introduced generic abstractions like EitherValue to handle these cases elegantly. For example, the Type field in our Schema struct is defined as:

```go
// Type represents the type of a schema: either an array of types or a single type.
Type *values.EitherValue[[]SchemaType, []marshaller.Node[string], SchemaType, string]
```

This abstraction allows us to capture the exact state of the document—whether it was defined as a single value or an array—while still providing type-safe accessors to the underlying data.

Similarly, we use JSONSchema[Referenceable] to handle the complexity of JSON Schema references, ensuring that we can model both inline definitions and $ref pointers consistently.

The key benefit isn’t just that we can represent more of the spec faithfully—it’s that the representation is consistent. The same patterns we use for the type field also show up with other dynamic fields and referenceable structures.

That consistency makes the library predictable:

  • Once you know how to work with a value that may be a single item or a list, you can apply the same approach everywhere.
  • Tooling like IDEs and linters can understand your data flow because everything is strongly typed.
  • Refactors are safer because more invariants are enforced at compile time instead of being left to runtime checks or comments.

Reference Resolution and Validation at Scale

Correctly handling $ref pointers is one of the hardest parts of working with OpenAPI and JSON Schema in practice. Real-world specs frequently contain:

  • Deeply nested internal references.
  • References that cross between files.
  • Circular graphs that are valid but tricky to traverse safely.

The library’s reference resolution engine is built on a few principles:

  • Single source of truth for documents: We maintain a document graph in memory that tracks where each node came from (file, path, and location).
  • Stable identifiers: Every referenceable element can be addressed via a stable pointer, making it easy to traverse and manipulate the graph.
  • Separation of loading and validation: We can first build the document graph, then apply multiple passes of validation without reloading or reparsing.

This design lets us:

  • Resolve complex reference graphs without blowing the stack.
  • Emit useful error messages that point back to the exact location in the original document.
  • Compose operations like bundling, inlining, or overlay application on top of the same core engine.

For example, if a circular reference chain is invalid because a required property is missing deep in the graph, the error message still points back to the exact $ref and location in the original file where the problem originates.

Unified Ecosystem: Arazzo and Overlays

Because we built a flexible core, we were able to extend the library to support Arazzo (for workflows) and Overlays (for modifications) natively.

Crucially, these packages share the same underlying models for common elements like JSON Schema and references. This means you can parse an OpenAPI spec, apply an Overlay to it, and then validate it against an Arazzo workflow, all within the same memory space and using the same tooling.

We deliberately avoided creating separate, siloed models for each specification. Instead, Arazzo, Overlays, and OpenAPI all share a small set of core building blocks—JSON Schema, references, and common metadata structures.

That means investments in those shared pieces (better validation, richer error messages, performance improvements) automatically benefit the entire ecosystem. If a new spec builds on the same foundations, we can usually support it without re-architecting the library.

Key Benefits

If you’re working with OpenAPI in Go, here’s why you should consider using this library:

  • Full Version Support: It supports OpenAPI 3.0.x, 3.1.x, and 3.2.x, along with Swagger 2.0, Arazzo, and Overlays—all in one place.
  • Robust Reference Resolution: Handling $ref pointers correctly is notoriously difficult. Our library provides a robust reference resolution engine that handles circular references and external files with ease.
  • Idiomatic & Safe Go API: The object models match the structure of the specifications as closely as possible. We prioritized nil safety and high-level APIs to reduce the need for diving into low-level details.
  • Battle-Tested: This library powers the Speakeasy platform, meaning it’s tested against a vast array of real-world specifications.

Powerful CLI Tooling

Beyond the library, we provide a comprehensive CLI tool that exposes many of the library’s capabilities directly to your terminal. It’s packed with utilities to help you manage your API lifecycle:

  • bundle: Bundle external references into a single file.
  • inline: Inline all references to create a self-contained document.
  • overlay: Apply, compare, and validate OpenAPI Overlays to modify your specs without changing the source.
  • optimize: Deduplicate schemas and optimize your document structure.
  • sanitize: Remove unused components and clean up your spec.
  • snip: Extract specific operations or paths into a new document.
  • explore: Interactively explore your OpenAPI document in the terminal.

Importantly, the CLI is a thin layer over the same Go packages you use in code. Every subcommand is built from the same primitives: parsing, walking, reference resolution, and mutation of the in-memory document graph.

That means if you start by using the CLI for quick experiments—bundling, inlining, sanitizing—you can later pull the exact same operations into your own Go programs or CI pipelines with very little glue code.

Getting Started

Here are a few complete examples of how you can use the library to read, validate, mutate, and upgrade OpenAPI documents.

Reading and Validating

Reading a document is simple, and validation happens automatically by default.

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/speakeasy-api/openapi/openapi"
)

func main() {
	ctx := context.Background()

	f, err := os.Open("openapi.yaml")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Unmarshal and validate
	doc, validationErrs, err := openapi.Unmarshal(ctx, f)
	if err != nil {
		panic(err)
	}

	// Check for validation errors
	if len(validationErrs) > 0 {
		for _, err := range validationErrs {
			fmt.Println(err.Error())
		}
	}

	fmt.Printf("API Title: %s\n", doc.Info.Title)
}
```

Traversing with the Walker

The library provides a powerful iterator pattern to traverse the document, allowing you to inspect specific elements without writing complex recursive loops. This is useful for auditing all operations or programmatically curating a spec—the same walker that powers many of the CLI commands.

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/speakeasy-api/openapi/jsonschema/oas3"
	"github.com/speakeasy-api/openapi/openapi"
)

func main() {
	ctx := context.Background()

	f, err := os.Open("openapi.yaml")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	doc, _, err := openapi.Unmarshal(ctx, f)
	if err != nil {
		panic(err)
	}

	// Walk through the document
	for item := range openapi.Walk(ctx, doc) {
		err := item.Match(openapi.Matcher{
			Operation: func(op *openapi.Operation) error {
				if op.OperationID != nil {
					fmt.Printf("Found Operation: %s\n", *op.OperationID)
				}
				return nil
			},
			Schema: func(schema *oas3.JSONSchema[oas3.Referenceable]) error {
				if schema.IsSchema() {
					fmt.Printf("Found Schema\n")
				}
				return nil
			},
		})
		if err != nil {
			panic(err)
		}
	}
}
```

Mutating a Document

You can easily modify the document programmatically and marshal it back to YAML or JSON. This pattern is useful when you want to enforce organization-wide conventions—like injecting standard servers, headers, or tags—across many specs.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"

	"github.com/speakeasy-api/openapi/openapi"
	"github.com/speakeasy-api/openapi/pointer"
)

func main() {
	ctx := context.Background()

	f, err := os.Open("openapi.yaml")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	doc, _, err := openapi.Unmarshal(ctx, f)
	if err != nil {
		panic(err)
	}

	// Modify the title
	doc.Info.Title = "Updated API Title"

	// Add a new server
	doc.Servers = append(doc.Servers, &openapi.Server{
		URL:         "https://api.example.com/v2",
		Description: pointer.From("New Production Server"),
	})

	// Write back to YAML
	buf := bytes.NewBuffer([]byte{})
	if err := openapi.Marshal(ctx, doc, buf); err != nil {
		panic(err)
	}

	fmt.Println(buf.String())
}
```

Upgrading to OpenAPI 3.2.0

One of the most powerful features is the ability to automatically upgrade older specs to the latest version. This pattern works well in CI pipelines where you want to accept older specs (3.0.x or 3.1.x) at the edge, but standardize everything internally on 3.2.0 before running validation, code generation, or analysis.

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/speakeasy-api/openapi/openapi"
)

func main() {
	ctx := context.Background()

	f, err := os.Open("openapi.yaml")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	doc, _, err := openapi.Unmarshal(ctx, f)
	if err != nil {
		panic(err)
	}

	// Upgrade from 3.0.x or 3.1.x to 3.2.0
	upgraded, err := openapi.Upgrade(ctx, doc)
	if err != nil {
		panic(err)
	}

	if upgraded {
		fmt.Printf("Upgraded to version: %s\n", doc.OpenAPI)
	}
}
```

Conclusion

If you’re building serious OpenAPI tooling in Go—linters, documentation generators, gateways, test harnesses, or CI checks—our goal is for this to be the library you reach for first. We’ve invested heavily in correctness, type safety, and performance because we rely on it in production every day, and we’re committed to evolving it alongside the ecosystem.

Check out the code on GitHub and let us know what you think!
