SDKs
Building APIs (and SDKs) that never break
Daniel Kovacs
December 13, 2025 - 19 min read
We’ve all seen websites that suddenly stop working for seemingly no reason. Mobile apps that load infinitely. Smart fridges that stop… well, fridging. In early 2025 I shipped a new screen that highlighted the amazing features available in Monzo’s business subscription offering. Except - for some people using iOS - it didn’t. Turns out the server-driven view framework didn’t support a certain type of image view on a specific range of iOS app versions. This meant the entire feature table was just empty.
Shipping changes constantly, with high confidence, is a quintessential part of developer velocity and business reputation.
Annoyingly, APIs don’t just break when you change the contract. They break when you change behaviour that somebody depended on.
If you publish an SDK or an OpenAPI spec (which you definitely should), you’ve made the problem harder, not easier. Because now you don’t just have “the API”. You have:
- the API behaviour
- the OpenAPI spec people generate clients from
- the SDK behaviour and its runtime validation
- customers’ code written against all of the above
Those layers drift. Constantly. Simply put: SDK users can get a 200 OK from the server and still see an SDK error, often due to API evolution or spec drift combined with strict client-side validation.
So let’s talk about how to build APIs (and SDKs) that don’t turn every change into an incident.
Why API versioning matters
APIs are contracts. The moment you expose an endpoint to consumers — whether those consumers are third-party developers, mobile apps, or internal services — you’ve made a promise about how that endpoint behaves. Breaking that promise has consequences.
That matters even more when you have clients you don’t control:
- Mobile app versions lag behind. Especially if automatic updates are disabled.
- Customers have release cycles, change control, procurement, security reviews.
- Some integrations are “set and forget” until they fail. When was the last time you reviewed your Slack bot implementations?
If you maintain an SDK or a public OpenAPI spec, your “real API” is bigger than your HTTP surface area.
The goal isn’t to avoid change—that’s impossible. The goal is to evolve your API while giving consumers a clear, predictable path forward.
So the core problem is drift:
- API evolves, spec lags
- spec changes, SDK lags
- SDK changes, customers lag
- customers do weird things you never anticipated
Which brings us to the fun part: how exactly things break.
Types of breakages
I’ll use Go for backend examples and TypeScript for client examples, but these concepts are language-agnostic.
Removing a property
This is the classic “we’re cleaning up the response shape”.
type Payee struct {
ID string `json:"id"`
Name string `json:"name"`
- ViewDetails *ViewDetails `json:"view_details,omitempty"`
}
- type ViewDetails struct {
- URL string `json:"url"`
- }
Client break (TypeScript)
const payee = await client.payees.get("p_123"); // worked yesterday
window.location.href = payee.viewDetails.url;
Typical runtime result:
TypeError: Cannot read properties of undefined (reading 'url')
The field may have already been optional, but if you taught your users to expect it, they will.
Adding a new enum variant
This one catches teams off guard because it feels additive.
GitHub’s GraphQL docs make an important distinction: adding an enum value is often a dangerous change — it might not break queries, but it can break client runtime behaviour.
{
- "status": "created"
+ "status": "paused"
}
A lot of SDKs (handwritten or generated) validate enums on deserialisation:
import * as z from "zod";
const PayeeSchema = z.object({
// ...
status: z.enum(["created", "active"]),
});
const payees = {
async get(id: string) {
const result = await fetch(`/payees/${id}`);
const json = await result.json();
// new "paused" status will cause a failure here
return PayeeSchema.parse(json);
},
};
Now "paused" bricks the whole response.
Even if our SDK didn’t validate the response, or used a forward-compatible enum strategy, our users could still have perfectly compiling code that breaks at runtime.
const badgesByPayeeStatus: Record<Payee["status"], string> = {
created: "badge-neutral",
active: "badge-active",
};
const payee = await sdk.payees.get(payeeId);
const badge = badgesByPayeeStatus[payee.status]; // type is 'string', but could be undefined
Versioning on its own may not be enough to fully mitigate this case. If you anticipate your API evolving in this dimension, I recommend starting with a fallback value such as “unknown”, defaulting to it in SDKs, or transforming unexpected variants into the fallback based on client version.
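Here’s a minimal sketch of that fallback with zod (assuming zod 3.20+ for .catch); an unexpected variant degrades to "unknown" instead of throwing:

```typescript
import { z } from "zod";

// .catch() substitutes a fallback whenever parsing the field fails, so a
// brand-new server-side variant degrades to "unknown" instead of throwing.
const PayeeStatus = z.enum(["created", "active", "unknown"]).catch("unknown");

const PayeeSchema = z.object({
  id: z.string(),
  name: z.string(),
  status: PayeeStatus,
});

// "paused" no longer bricks the whole response:
const payee = PayeeSchema.parse({ id: "p_123", name: "Daniel", status: "paused" });
// payee.status === "unknown"
```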
Renaming a property
This is “let’s improve consistency” — and it’s unavoidably breaking unless you support both names.
{
- "date_of_birth": "1990-01-01"
+ "dob": "1990-01-01"
}
const dob = new Date(payee.dateOfBirth); // now undefined
Or worse:
const year = payee.dateOfBirth.slice(0, 4);
// TypeError: Cannot read properties of undefined (reading 'slice')
Who would blindly rename a field like that, you might ask. But what about less obvious parts of the response, like headers or file extensions? What about simply renaming a file extension from yml to yaml? Nobody depends on those, right? Wrong. Introducing: Hyrum’s Law…
Hyrum’s Law: unexpected failures
Put succinctly, Hyrum’s Law is:
With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviours of your system will be depended on by somebody.
This is the bit that makes “safe changes” unsafe.
Innocent change: adding a new property
You add nickname to a response. You didn’t remove anything. What could go wrong?
Client-side strict schema validation:
import { z } from "zod";
const PayeeSchema = z
.object({
id: z.string(),
name: z.string(),
})
.strict(); // rejects unknown keys
const payee = PayeeSchema.parse(apiResponse);
Now nickname causes the parse to throw.
Stripe’s versioning scheme explicitly lists “adding new properties” and even “changing the order of properties” as backward-compatible changes — but that compatibility assumes clients don’t do brittle validation.
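If you own the client schema, the straightforward fix is to tolerate unknown keys instead of rejecting them; with zod, for example:

```typescript
import { z } from "zod";

// By default zod strips unknown keys rather than throwing on them, so simply
// dropping .strict() is enough to survive additive changes.
const PayeeSchema = z.object({
  id: z.string(),
  name: z.string(),
});

// Or keep unknown keys around so brand-new fields stay reachable:
const TolerantPayeeSchema = PayeeSchema.passthrough();

const payee = TolerantPayeeSchema.parse({
  id: "p_123",
  name: "Daniel",
  nickname: "Dani", // new server-side field, no longer a parse error
});
```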
A technique to combat uncontractual expectations is response scrambling. This is when you intentionally introduce variance to parts of your API you consider unstable or not guaranteed.
For example: stop users from relying on sort order by shuffling arrays in the response.
transactions := transactionService.Read(ctx)
if req.Sort == "" {
// NOTE: order of transactions is not guaranteed, unless req.Sort is specified
transactions = sliceutil.Shuffle(transactions)
}
Innocent change: field order changes (tuples / compact formats)
Most JSON objects are unordered, but plenty of APIs return compact array-based payloads for performance:
{
- "columns": ["id", "name"],
+ "columns": ["name", "id"],
- "rows": [["p_123", "Daniel"]]
+ "rows": [["Daniel", "p_123"]]
}
Client code that maps by index:
const [id, name] = rows[0]; // now swapped
Not a crash — just corrupted data. Arguably, this silent failure is worse than a crash, because it likely won’t show up in logs or monitoring.
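One defensive pattern on the consuming side (a sketch, not from any particular SDK) is to resolve indices from the columns array instead of hard-coding positions:

```typescript
type CompactPayload = {
  columns: string[];
  rows: (string | number)[][];
};

// Build objects keyed by column name, so a reordered "columns" array can't
// silently swap fields.
function rowsToObjects({ columns, rows }: CompactPayload) {
  return rows.map((row) =>
    Object.fromEntries(columns.map((name, i) => [name, row[i]])),
  );
}

const [first] = rowsToObjects({
  columns: ["name", "id"],
  rows: [["Daniel", "p_123"]],
});
// first.id === "p_123", regardless of column order
```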
Innocent change: new field exists, but the SDK strips it
This is a subtle one and it’s infuriating as a customer.
- API ships a new field today
- you want to use it today
- SDK won’t let you, because it strips unknown properties (or validates strictly)
- SDK release comes later
- your feature is blocked by someone else’s release cadence
A simplified SDK deserialiser:
type Payee = {
id: string;
name: string;
};
function decodePayee(raw: any): Payee {
// drops everything unknown
return {
id: raw.id,
name: raw.name,
};
}
So even though the wire response includes viewDetails, the SDK won’t expose it.
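One way SDKs can avoid blocking customers on a release is to keep unrecognised properties reachable instead of silently dropping them; a sketch:

```typescript
type Payee = {
  id: string;
  name: string;
  // Anything the SDK doesn't model yet stays accessible here.
  additionalProperties: Record<string, unknown>;
};

function decodePayee(raw: Record<string, unknown>): Payee {
  const { id, name, ...rest } = raw;
  return {
    id: String(id),
    name: String(name),
    additionalProperties: rest,
  };
}

// The wire response's view_details is usable today, even before the SDK
// models it as a first-class field:
// decodePayee(json).additionalProperties.view_details
```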
My colleague David wrote an entire article on this class of problem: strict client validation + API evolution = confusing breakages where the server “succeeds” but the SDK still errors. Read more about it here.
Versioning approaches
There’s no single correct strategy. There are serious trade-offs to each approach. I recommend reading through the pros and cons and devising your own strategy based on your needs. Not sure what’s best for your product, or need help setting up automations? Reach out! I love chatting about all things API design.
How Stripe does versioning
Stripe’s 2017 write-up explains the model in detail. Conceptually:
- There is a “current” schema
- Older versions are produced by applying a set of transformations
- A request specifies which version it wants (often defaulting to an account-pinned version)
Stripe also exposes version overrides via the Stripe-Version header.
And more recently, Stripe introduced a release cadence with named major releases (with breaking changes) and monthly backward-compatible releases.
What it looks like in practice
Stripe uses Ruby for most of their backend services. Since all backend examples in this post are in Go, and to show that this pattern is reproducible in the programming language of your choice, I re-wrote the examples from Stripe’s blog post in Go.
type APIVersion string
const (
V2024_01_01 APIVersion = "2024-01-01"
V2024_06_01 APIVersion = "2024-06-01"
)
type PayeeCanonical struct {
ID string `json:"id"`
Name string `json:"name"`
DateOfBirth string `json:"date_of_birth"` // canonical
ViewDetails *ViewDetails `json:"view_details"`
}
type Transformer func(map[string]any) map[string]any
func renameField(from, to string) Transformer {
return func(m map[string]any) map[string]any {
if v, ok := m[from]; ok {
m[to] = v
delete(m, from)
}
return m
}
}
func dropField(field string) Transformer {
return func(m map[string]any) map[string]any {
delete(m, field)
return m
}
}
var versionTransforms = map[APIVersion][]Transformer{
V2024_06_01: {
// current: no transforms
},
V2024_01_01: {
// older clients expect dob, not date_of_birth
renameField("date_of_birth", "dob"),
// older clients do not expect view_details
dropField("view_details"),
},
}
func render(version APIVersion, canonical PayeeCanonical) ([]byte, error) {
// marshal canonical -> map so we can transform
raw, _ := json.Marshal(canonical)
var m map[string]any
if err := json.Unmarshal(raw, &m); err != nil {
return nil, err
}
for _, t := range versionTransforms[version] {
m = t(m)
}
return json.Marshal(m)
}
The point is: backward compatibility costs ongoing effort. Brandur explicitly frames versioning as a compromise between DX improvements and the burden of maintaining old versions.
Pros
- It’s lightweight. Upgrade friction is minimal. Each version contains an incremental set of changes rather than a massive rewrite.
- Versioning is integrated deeply into tooling and documentation.
- Old versions are tightly encapsulated. The happy-path (current version) is the default, backwards compatibility is the bolt-on feature.
Cons
- Current-backward model requires engineers to implement all changes twice: forward and back.
- Limited to changes that can be expressed in transformations.
- Side-effects don’t have first class support.
TypeScript SDK: letting users pin a version
In “Stripe-like” ecosystems, you usually see something like:
const sdk = new Client({
apiVersion: "2024-06-01",
});
or version supplied per request / per client instance, often mapped to a header.
That’s important because it makes version choice explicit and testable, instead of implicit magic.
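Under the hood, that option usually just maps to a header on every request. A minimal sketch (the client shape and header name are illustrative, not any specific SDK):

```typescript
class Client {
  constructor(private readonly options: { apiVersion: string; baseUrl?: string }) {}

  async getPayee(id: string): Promise<unknown> {
    const res = await fetch(`${this.options.baseUrl ?? ""}/payees/${id}`, {
      headers: {
        // The pinned version travels with every call, so behaviour is
        // reproducible and easy to assert on in tests.
        "X-Api-Version": this.options.apiVersion,
      },
    });
    return res.json();
  }
}

const sdk = new Client({ apiVersion: "2024-06-01" });
```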
Monzo’s versioning
Before Speakeasy, I worked for Monzo, the UK’s most popular neobank. Building APIs for mobile apps meant versioning was non-negotiable.
Apps are hard to update reliably. iOS in particular can lag for months in the real world: old devices, disabled updates, people on holiday, App Store review delays, the lot. A sizeable percentage of our users had auto-updates disabled.
Monzo uses explicit server-side app version checks to shape responses based on client version, because you cannot assume everyone upgraded.
A simplified illustration of that approach:
// DISCLAIMER: not actual Monzo code - for illustration purposes only
type Payee struct {
Name string `json:"name"`
DateOfBirth time.Time `json:"date_of_birth"`
Status PayeeStatus `json:"status"`
// ViewDetails is only handled by iOS 13.3.0+ and Android 12.3.5+
ViewDetails *ViewDetails `json:"view_details,omitempty"`
}
func PayeeHandler(ctx context.Context) (*Payee, error) {
payee := Payee{
Name: "Daniel",
DateOfBirth: time.Date(1990, 1, 1, 0, 0, 0, 0, time.UTC),
Status: PayeeStatusCreated,
}
// Version check determines feature availability
if version.FromCtx(ctx).GreaterOrEqual(
cond.Or(version.IOS(13, 3, 0), version.Android(12, 3, 5)),
) {
var err error
payee.ViewDetails, err = buildExpensiveViewDetailsForNewApps(ctx)
if err != nil {
return nil, err
}
}
return &payee, nil
}
The idea is straightforward: you treat “client version” as an input to response rendering.
But there are caveats. In practice, you hit questions like:
- Feature flags and experiments
  - Do you let your feature flag platform handle all targeting (including version checks)?
  - Or do you keep version checks in code and only call the flag system when the client is capable?
  - Feature flag products (Statsig is one example) exist specifically to toggle behaviour without redeploying code. The trade-off is operational simplicity vs performance/clarity vs “how many moving parts are involved in a rollout”.
- Version → feature mapping drift
  - If your code has scattered “if version >= X” checks, nobody can answer “which versions support which features” without grep.
  - Eventually you get a matrix of doom: version checks, platform checks, experiments, entitlement checks.
- Async contexts
  - Code triggered by an API request has client context.
  - Background jobs often don’t.
  - If your fallback is “last known version”, you’re depending on a state that can be stale or missing.
This strategy works well when you control most clients (mobile, first-party apps). It’s harder to scale cleanly to public APIs, where you need explicit contracts and explicit lifecycles.
Pros
- Client-driven: uses a well-known property of the client (app build version). No additional work is required on the client-side (other than forwarding the version).
- Versioned by default. Every breaking change will have an “if” statement and a version check associated with it.
- Easy to support with test tooling; versioning issues are often caught by unit tests.
Cons
- Not easy to extend to more consumers. New clients (e.g. web) won’t support the latest version by default, unless explicitly updated.
- Relationship between versions and feature support becomes tribal knowledge. There’s no way to associate a version with a feature set at a glance.
- Makes it non-trivial to re-use logic across request handlers and other logic, e.g.: in stream consumers, the context wouldn’t hold an app version.
Other strategies
Version prefix
The classic:
GET /api/v1/users/123
GET /api/v2/users/123
Pros
- Easy to explain.
- Tooling-friendly (OpenAPI, gateways, routing).
- Clear separation.
Cons
- It turns versioning into a heavyweight decision: “is this a v2 moment?”
- Teams delay necessary changes because they fear “the v2 project”.
- Or they do the opposite: they ship /v12/ and customers lose confidence.
Minimal Go routing example:
http.HandleFunc("/api/v1/users/", v1UserHandler)
http.HandleFunc("/api/v2/users/", v2UserHandler)TypeScript migration path often ends up like:
import { Client as V1 } from "@acme/sdk-v1";
import { Client as V2 } from "@acme/sdk-v2";
Which is clean… until your customer wants to use both because they’re migrating gradually.
Variant: resource-based versioning
/api/users/v2/... can be a pragmatic compromise when only one resource is being redesigned.
But it can also create a patchwork API where every resource has its own version story.
Opt-in features (client-side feature flagging)
Instead of “pick a version”, clients say “I want feature X”.
For example:
X-Features: view_details, paused_status
Pros
- Maximum flexibility.
- Clients can pick and mix.
- Makes rollouts very transparent.
Cons
- You push cognitive load onto customers: they must know which knobs to set.
- You end up maintaining feature negotiation logic forever.
Go example:
func hasFeature(r *http.Request, f string) bool {
raw := r.Header.Get("X-Features")
for _, part := range strings.Split(raw, ",") {
if strings.TrimSpace(part) == f {
return true
}
}
return false
}
if hasFeature(r, "view_details") {
// include new field
}
TypeScript fetch example:
await fetch("/payees/p_123", {
headers: { "X-Features": "view_details" },
});
And if you’re shipping an SDK, you can make this nicer:
const sdk = new Client({ features: ["view_details"] });
Hybrid approach: version → feature resolution layer → feature-driven backend logic
Combine explicit versions with internal feature flags. This is the “separation of concerns” approach:
- Resolve a client property (e.g. version) into a feature set
- Backend code asks “is feature X enabled?” not “is version ≥ 12.3.0?”
// Version resolution layer
func resolveFeatures(version string) []string {
switch {
case version >= "2024-11-01":
return []string{"enhanced-profiles", "new-permissions", "v2-pagination"}
case version >= "2024-06-01":
return []string{"enhanced-profiles", "new-permissions"}
case version >= "2024-01-01":
return []string{"enhanced-profiles"}
default:
return []string{}
}
}
// Feature table at a glance
// Version | enhanced-profiles | new-permissions | v2-pagination
// 2024-01-01 | ✓ | |
// 2024-06-01 | ✓ | ✓ |
// 2024-11-01 | ✓ | ✓ | ✓
Pros
- Backend stays readable. It only deals with features, not versions
- The resolution layer documents version evolution at a glance
- Easier to test. Feature names are easier to reason about, so tests become more readable.
Cons
- More infrastructure to maintain
- Changes may require deploying multiple services
- The mapping layer becomes critical path
This is the strategy I usually prefer for public APIs because it makes compatibility an explicit subsystem, not an ad-hoc habit.
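A sketch of what the backend side looks like once the resolution layer exists: handlers ask about features, never versions (featureEnabled is a hypothetical helper; resolveFeatures is the function above):

```go
func featureEnabled(features []string, name string) bool {
	for _, f := range features {
		if f == name {
			return true
		}
	}
	return false
}

func payeeHandler(w http.ResponseWriter, r *http.Request) {
	features := resolveFeatures(r.Header.Get("X-Api-Version"))

	payee := map[string]any{"id": "p_123", "name": "Daniel"}
	// The handler only reasons about capabilities, never raw versions.
	if featureEnabled(features, "enhanced-profiles") {
		payee["nickname"] = "Dani"
	}

	_ = json.NewEncoder(w).Encode(payee)
}
```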
Immutable deployments + dynamic routing
This is the “every version lives forever” strategy.
The simplest mental model is: a version is a deployment, not a branch of code.
Vercel is a good reference point here: they explicitly describe each push producing a new unique URL and a new immutable deployment.
They also document that generated deployment URLs remain accessible based on your retention policy. Use immutable deployments as a foundation for safe rollout strategies like blue/green.
# Each version maps to a distinct deployment
api-v1.example.com -> deployment-abc123
api-v2.example.com -> deployment-def456
api-v3.example.com -> deployment-ghi789
If you apply that idea to APIs, you get:
- deployment_id (or commit_sha) becomes the version key
- routing layer maps that key to the correct deployment
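A hedged sketch of that routing layer in Go, keyed on a version header (the deployment hostnames and header name are made up):

```go
var deployments = map[string]string{
	"v1": "http://deployment-abc123.internal",
	"v2": "http://deployment-def456.internal",
	"v3": "http://deployment-ghi789.internal",
}

func routeByVersion(w http.ResponseWriter, r *http.Request) {
	target, ok := deployments[r.Header.Get("X-Api-Version")]
	if !ok {
		http.Error(w, "unknown API version", http.StatusBadRequest)
		return
	}
	upstream, err := url.Parse(target)
	if err != nil {
		http.Error(w, "bad deployment target", http.StatusInternalServerError)
		return
	}
	// Every version forwards to its own immutable deployment.
	httputil.NewSingleHostReverseProxy(upstream).ServeHTTP(w, r)
}
```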
Pros
- No “compatibility code” inside a deployment — each one behaves consistently.
- SDK versions can align 1:1 with API versions.
- Rollbacks are easy (just route traffic differently).
Cons
- Security patching becomes painful. If old deployments still exist, you need a policy for patching or killing them.
- You risk giving customers no incentive to upgrade.
- Database schema changes become complex (which schema version does each deployment use?)
- Shared resources (queues, caches) need to handle multiple API versions simultaneously
If you go down this route, borrow from immutable infrastructure thinking: AWS describes immutable infrastructure
The “shove it under the rug” approach
A translation layer sits in front:
- requests come in
- layer transforms them to what the backend understands
- responses get transformed back to what the client expects
This can be code-driven, config-driven, or (increasingly) “AI-driven”.
A simple architecture sketch: client → translation layer → backend, with the translation happening in both directions.
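As one concrete (code-driven) flavour of this, the translation layer can be a thin HTTP middleware that rewrites responses before they reach the client. A sketch reusing the Transformer type from the Stripe-style example above; this is illustrative, not production-ready:

```go
// translate wraps a handler and rewrites its JSON response into the shape an
// older client expects (e.g. date_of_birth -> dob).
func translate(next http.Handler, transforms []Transformer) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Capture the backend's response so we can rewrite it.
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)

		var body map[string]any
		if err := json.Unmarshal(rec.Body.Bytes(), &body); err != nil {
			http.Error(w, "unexpected upstream response", http.StatusBadGateway)
			return
		}
		for _, t := range transforms {
			body = t(body)
		}

		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(body)
	})
}

// Usage: serve the old contract without touching the backend handler.
// mux.Handle("/api/v1/payees/", translate(payeesHandler, []Transformer{
//	renameField("date_of_birth", "dob"),
//	dropField("view_details"),
// }))
```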
Pros
- Mostly plug and play – and it’s just getting easier with AI-driven translation layers.
- You can use this to build a “compatibility layer” for legacy APIs or third-party APIs.
Cons
- If you’re hosting your own compatibility layer, or building your own translation layer, you’re adding a lot of complexity to your architecture.
- If you’re using a third-party compatibility layer, you’re adding cost and a new critical point of failure.
- Some transformations cannot be expressed in a static configuration.
- Can’t express side-effects or conditional logic.
My take: this is a reasonable last resort when you have low internal discipline or a messy legacy API. But if you have the chance to build a principled versioning model, do that instead.
How to send version info
You’ve basically got four knobs. None are perfect.
| Mechanism | Example | Pros | Cons |
|---|---|---|---|
| URL path | /v1/users | Visible, cache-friendly, easy routing | Version becomes “part of the URL”, hard to evolve gradually |
| Header (dedicated) | X-Api-Version: 2024-06-01 | Explicit, easy to test, plays well with one URL | Some tooling hides headers; needs docs discipline |
| Header (Accept) | Accept: application/vnd.acme.v2+json | Uses content negotiation semantics | Verbose, annoying in browsers, harder SDK ergonomics |
| Query param | ?api_version=2024-06-01 | Easy to try manually | Often abused, sometimes cached badly, feels less “contractual” |
If I’m building a serious public API today, I default to a dedicated header. It’s explicit without turning versioning into URL sprawl.
GitHub’s REST API uses this exact pattern with X-GitHub-Api-Version and date-based versions. Stripe does the same conceptually with Stripe-Version. That’s a pretty good signal that the ergonomics work at scale.
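Server-side, resolving a dedicated header is a few lines; a sketch with an account-pinned default (the Account type and field are hypothetical):

```go
const currentAPIVersion = "2024-06-01"

type Account struct {
	PinnedAPIVersion string
}

func resolveAPIVersion(r *http.Request, account Account) string {
	// 1. An explicit header wins.
	if v := r.Header.Get("X-Api-Version"); v != "" {
		return v
	}
	// 2. Fall back to whatever version the account was pinned to.
	if account.PinnedAPIVersion != "" {
		return account.PinnedAPIVersion
	}
	// 3. Otherwise, the current version.
	return currentAPIVersion
}
```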
How to sunset and deprecate
Versioning without a lifecycle is just hoarding.
You need:
- monitoring (who is still using old versions?)
- deprecation signals (tell them early)
- a sunset plan (turn it off eventually, or you’ll maintain it forever)
- communication (docs, changelogs, email, dashboard banners)
Use standard headers where possible
There are now RFCs for this.
- Deprecation header (RFC 9745) communicates that a resource is (or will be) deprecated, and it carries a deprecation date. It’s a structured header; the RFC example uses a Unix timestamp format like @1688169599.
- Sunset header (RFC 8594) communicates when a resource is expected to become unavailable (HTTP-date format).
Example response headers:
Deprecation: @1767139200
Sunset: Wed, 31 Dec 2025 00:00:00 GMT
Link: <https://developer.example.com/deprecation>; rel="deprecation"; type="text/html"
The RFC also calls out an important constraint: sunset shouldn’t be earlier than deprecation.
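Emitting those signals from a deprecated Go handler is cheap; the dates below just mirror the example headers above:

```go
func deprecatedPayeesHandler(w http.ResponseWriter, r *http.Request) {
	// RFC 9745: structured field carrying the deprecation date as a Unix timestamp.
	w.Header().Set("Deprecation", "@1767139200")
	// RFC 8594: HTTP-date after which the resource may stop working.
	w.Header().Set("Sunset", "Wed, 31 Dec 2025 00:00:00 GMT")
	// Point clients at the human-readable migration guide.
	w.Header().Set("Link", `<https://developer.example.com/deprecation>; rel="deprecation"; type="text/html"`)

	// ...then serve the old behaviour as usual.
}
```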
Have a process, not a hope
A process sketch: announce the deprecation early, start emitting Deprecation headers, monitor who is still calling the old version, chase the stragglers with targeted comms, then enforce the Sunset date.
And you need a support window.
GitHub commits to supporting the previous REST API version for at least 24 months after a new one is released. That’s a clear promise customers can plan around — and it forces internal discipline.
How many breaking changes do you ship in a year? How about 2 years? Do you have the bandwidth to support all of those versions?
How we help solve these issues at Speakeasy
At Speakeasy, we help some of the most popular software platforms generate world-class SDKs that work seamlessly. Versioning, forward and backward compatible changes are always top-of-mind. You can read more about how we implement forward compatibility here
SDK behaviour drift
David’s TypeScript forward-compatibility article digs into exactly this failure mode: the server returns 200 OK, but the SDK throws anyway. Causes include API evolution and inaccurate OpenAPI specs, and strict enum/union/required-field validation is a common trigger.
The response is to build SDKs that degrade gracefully:
- forward-compatible enums (accept unknown values in a type-safe way)
- forward-compatible unions
- “lax mode” for missing/mistyped fields
- smarter union deserialisation strategies
There’s a difference between:
- “SDK is a strict contract enforcer”
- “SDK is a resilient integration tool”
Most customers want the second. We support both.
Explicit versioning and metadata propagation
We use a hash of your full spec to track changes across versions, so you’re not required to explicitly update your version to communicate changes. However, I do recommend versioning your OpenAPI spec.
This allows you to make that version visible at runtime.
In Speakeasy SDKs, you can use hooks to inject cross-cutting behaviour.
A practical pattern is:
- send OpenAPI doc version on every request
- send SDK version on every request
- use those values for feature negotiation and observability
Illustrative TypeScript hook:
import { SDK_METADATA } from "../lib/config.js";
import { Hooks } from "./types.js";
export function initHooks(hooks: Hooks) {
hooks.registerBeforeRequestHook({
beforeRequest(_, request) {
request.headers.set(
"x-openapi-doc-version",
SDK_METADATA.openapiDocVersion,
);
request.headers.set("x-sdk-version", SDK_METADATA.sdkVersion);
return request;
},
});
}
Why this helps:
- OpenAPI doc version can map to feature sets (server-side) without guessing.
- SDK version distribution tells you who’s stuck, who upgrades, and which customers will be hurt by a breaking change.
func trackVersionMetrics(r *http.Request) {
sdkVersion := r.Header.Get("x-sdk-version")
openapiVersion := r.Header.Get("x-openapi-doc-version")
// NOTE: you should validate the versions, so malicious or malfunctioning clients
// don't bork your metrics system with high-cardinality columns
metrics.Increment("api.requests", map[string]string{
"sdk_version": sdkVersion,
"openapi_version": openapiVersion,
})
}
You can’t manage what you can’t see.
Maintaining backwards compatibility with continuous testing
If you’re not already doing this, don’t worry. This is where most teams are weakest. They “try not to break stuff”, but they don’t continuously prove it.
One solid approach is workflow-based end-to-end tests.
If you haven’t already, I recommend reading Brian’s article on Arazzo-based E2E testing
What makes this powerful is that it tests the things customers actually do.
A practical model:
- Keep historical versions of your OpenAPI spec (git tags are fine).
- Keep Arazzo workflows for your critical customer journeys.
- On every deploy (or nightly):
- check out older spec versions
- run the workflows against your current API
- fail fast if you broke an older contract
This turns backward compatibility from a painful chore into an executable guarantee.
Core principles I’d actually follow
If you only remember a handful of things, make it these:
- Define the stable core of your API. Be explicit about what’s truly contract, and what’s best-effort.
- Treat “additive” changes as dangerous unless you’ve designed for resilience. Adding an enum value is dangerous (GitHub says so). Adding new fields is “backward-compatible” only if clients don’t validate strictly (Stripe’s docs assume this).
- Make changes opt-in where you can. New fields, new behaviours, new defaults — opt-in beats surprise.
- Versioning strategy is part of your interface forever. Pick something you’re willing to support for years, not months.
- Have an explicit deprecation and sunset policy. Use runtime signals like Deprecation/Sunset headers, and back them with real comms.
- Harden observability around versions and features. You need to know who is on what, or you’re flying blind.
- Automate backwards compatibility testing. Prefer workflow-level tests (Arazzo-style) over “unit tests of handlers”, because customers don’t call endpoints in isolation.
And the meta-rule, courtesy of Hyrum’s Law: assume customers depend on everything you didn’t mean to expose.
Once you accept it, you stop being surprised by breakages — and you start designing so they don’t happen in the first place.