Type Safety in Python: Pydantic vs. Data Classes vs. Annotations vs. TypedDicts
Tristan Cartledge
August 29, 2024
Tip
A massive thank you to Sydney Runkle from the Pydantic team for her invaluable feedback and suggestions on this post!
Python's dynamic typing is one of its greatest strengths. It is the language developers use to get things done without getting bogged down by type definitions and boilerplate code. When prototyping, you don't have time to think about unions, generics, or polymorphism - close your eyes, trust the interpreter to guess your variable's type, and then start working on the next feature.
That is, until your prototype takes off and your logs are littered with `TypeError: 'NoneType' object is not iterable` or `TypeError: unsupported operand type(s) for /: 'str' and 'int'`. You might blame the users for adding units in the amount field, or the frontend devs for posting `null` instead of `[]`. So you fix the bug with another `if` statement, a `try` block, or the tenth validation function you've written this week. No time for reflection, just keep shipping, right? The ball of twine must grow.
We all know there is a better way. Python has had type annotations for years, and data classes and typed dictionaries allow us to document the shapes of the objects we expect.
Pydantic is the most comprehensive solution available to enforce type safety and data validation in Python, which is why we chose it for our SDKs at Speakeasy.
In this post, we'll run through how we reached this conclusion. We'll detail the history of type safety in Python and explain the differences between type annotations, data classes, TypedDicts, and finally, Pydantic.
If It Walks Like a Duck and It Quacks Like a Duck, Then It Must Be a Duck
Python is a duck-typed language. In a duck-typed language, an object's type is determined by its behavior at runtime, based on the parts of the object that are actually used. Duck-typing makes it easier to write generic code that works with different types of objects.
If your code expects a `Duck` object to make it quack, Python doesn't care if the object is a `Mallard` or a `RubberDuck`. From Python's perspective, anything with a `quack` method is a `Duck`:
```python
class Duck:
    def quack(self):
        print("Quack!")


class Mallard:
    def quack(self):
        print("Quack!")


def make_duck_quack(duck):
    duck.quack()


make_duck_quack(Duck())  # prints "Quack!"
make_duck_quack(Mallard())  # prints "Quack!"
```
This code runs without errors, even though `make_duck_quack` expects a `Duck` object in our mental model, and we pass it a `Mallard` object. The `Mallard` object has a `quack` method, so it behaves like a `Duck` object.
One of the reasons for Python's popularity is its flexibility. You can write generic and reusable code without worrying about the specific object types.
But this flexibility comes at a cost. If you pass the wrong type of object to a function, you'll only find out at runtime, leading to bugs that are difficult to track down.
This was the motivation behind developing type annotations.
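To make that cost concrete, here is a minimal sketch. The `Robot` class and the `try_quack` helper are hypothetical additions for illustration, and `quack` returns a string rather than printing. Nothing complains about the `Robot` until the call actually executes:

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:  # hypothetical class that has no quack method
    def beep(self):
        return "Beep!"


def make_duck_quack(duck):
    return duck.quack()


def try_quack(obj):
    """Return the quack, or the runtime error text if the object can't quack."""
    try:
        return make_duck_quack(obj)
    except AttributeError as exc:
        return f"AttributeError: {exc}"


print(try_quack(Duck()))   # Quack!
print(try_quack(Robot()))  # the missing method only surfaces here, at call time
```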
Type Annotations
Type annotations were introduced in Python 3.5 to add optional type hints to your code (PEP 484). Type hints can help you catch bugs while you are still writing your code by telling you when you pass the wrong type of object to a function.
TIP
To make the most of these type hints, many developers use type checkers. Type checkers are tools that analyze your Python code without running it, looking for potential type-related errors. One popular type checker is Pylance, a Visual Studio Code extension that checks your Python code for type mismatches and shows you hints in your IDE.
If you're not using VS Code, Pyright has similar functionality and can be run from the command line or as an extension to many text editors.
Here's how you can add type hints to the `make_duck_quack` function:
```python
class Duck:
    def quack(self):
        print("Quack!")


class RubberDuck:
    def quack(self):
        print("Quack!")


def make_duck_quack(duck: Duck):
    duck.quack()


make_duck_quack(Duck())  # prints "Quack!"
make_duck_quack(RubberDuck())
# A type checker will flag this call, for example:
# Argument 1 to "make_duck_quack" has incompatible type "RubberDuck"; expected "Duck".
```
Now, when you pass a `RubberDuck` object to the `make_duck_quack` function, your IDE hints that there's a type mismatch. Using annotations won't prevent you from running the code if there is a type mismatch, but it can help you catch bugs during development.
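The sketch below illustrates the point that annotations are not enforced when the code runs. It adapts the example so that `quack` returns a string instead of printing, and the `"Squeak!"` value is an assumption made only to tell the two classes apart:

```python
class Duck:
    def quack(self) -> str:
        return "Quack!"


class RubberDuck:
    def quack(self) -> str:
        return "Squeak!"


def make_duck_quack(duck: Duck) -> str:
    return duck.quack()


# The annotation is stored as metadata on the function object...
print(make_duck_quack.__annotations__["duck"] is Duck)  # True

# ...but the interpreter never checks arguments against it.
result = make_duck_quack(RubberDuck())
print(result)  # Squeak!
```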
This covers type annotations for functions, but what about classes? We can use data classes to define a class with specific types for its fields.
Data Classes
Data classes were introduced in Python 3.7 (PEP 557) as a convenient way to create classes that are primarily used to store data. Data classes automatically generate special methods like `__init__()`, `__repr__()`, and `__eq__()`, reducing boilerplate code. This feature aligns perfectly with our goal of making type-safe code easier to write.
By using data classes, we can define a class with specific types for its fields while writing less code than we would with a traditional class definition. Here's an example:
```python
from dataclasses import dataclass


@dataclass
class Duck:
    name: str
    age: int

    def quack(self):
        print(f"{self.name} says: Quack!")


donald = Duck("Donald", 5)
print(donald)  # Duck(name='Donald', age=5)
donald.quack()  # Donald says: Quack!

daffy = Duck("Daffy", "3")
# Pylance will show the hint: Argument of type "Literal['3']" cannot be assigned to parameter "age" of type "int" in function "__init__".
```
We define a `Duck` data class with two fields: `name` and `age`. When we create a new `Duck` object and pass in values, the data class automatically generates an `__init__()` method that initializes the object with these values.
In the data class definition, the type hints specify that the `name` field should be a string and that `age` should be an integer. If we create a `Duck` object with the wrong data types, the IDE hints that there's a type mismatch in the `__init__` method.
We get a level of type safety that wasn't there before, but at runtime, the data class still accepts any value for the fields, even if they don't match the type hints. Data classes make it convenient to define classes that store data, but they don't enforce type safety.
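A short sketch of what "accepts any value" means in practice. The `next_birthday` helper is hypothetical and exists only to show where the bad value eventually fails:

```python
from dataclasses import dataclass


@dataclass
class Duck:
    name: str
    age: int


# Flagged by a type checker, but accepted without complaint at runtime.
daffy = Duck("Daffy", "3")
print(type(daffy.age).__name__)  # str


def next_birthday(duck: Duck):
    """Hypothetical helper: the wrong type only fails once the value is used."""
    try:
        return duck.age + 1
    except TypeError as exc:
        return f"TypeError: {exc}"


print(next_birthday(Duck("Donald", 5)))  # 6
print(next_birthday(daffy))  # a TypeError message, raised far from the real mistake
```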
What if we're building an SDK and want to help users pass the right types of objects to functions? Using `TypedDict` types can help with that.
TypedDict Types
Introduced in Python 3.8 (PEP 589), `TypedDict` lets you define specific key and value types for dictionaries, making it particularly useful when working with JSON-like data structures:
```python
from typing import TypedDict


class DuckStats(TypedDict):
    name: str
    age: int
    feather_count: int


def describe_duck(stats: DuckStats) -> str:
    return f"{stats['name']} is {stats['age']} years old and has {stats['feather_count']} feathers."


print(
    describe_duck(
        {
            "name": "Donald",
            "age": 5,
            "feather_count": 3000,
        }
    )
)
# Output: Donald is 5 years old and has 3000 feathers.

print(
    describe_duck(
        {
            "name": "Daffy",
            # Pylance will show the hint: Argument of type "Literal['3']" cannot be
            # assigned to parameter "age" of type "int" in function "describe_duck"
            "age": "3",
            "feather_count": 5000,
        }
    )
)
```
In this example, we define a `DuckStats` `TypedDict` with three keys: `name`, `age`, and `feather_count`. The type hints in the `TypedDict` definition specify that the `name` key should have a string value, while the `age` and `feather_count` keys should have integer values.
When we pass a dictionary to the `describe_duck` function, the IDE will show us a hint if there is a type mismatch in the dictionary values. This can help us catch bugs early and ensure that the data we are working with has the correct types.
While we now have type hints for dictionaries, data passed to our functions from the outside world are still unvalidated. Users can pass in the wrong types of values and we won't find out until runtime. This brings us to Pydantic.
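As a quick illustration, a `TypedDict` is an ordinary `dict` at runtime, so a value of the wrong type passes straight through:

```python
from typing import TypedDict


class DuckStats(TypedDict):
    name: str
    age: int
    feather_count: int


# Calling a TypedDict class just builds a plain dict; no values are checked.
stats = DuckStats(name="Daffy", age="3", feather_count=5000)

print(type(stats) is dict)  # True
print(repr(stats["age"]))   # '3' (still a string)
```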
Pydantic
Pydantic is a data validation library for Python that enforces type hints at runtime. It helps developers with the following:
- Data Validation: Pydantic ensures that data conforms to the defined types and constraints.
- Data Parsing: Pydantic can convert input data into the appropriate Python types.
- Serialization: Pydantic makes it easy to convert Python objects into JSON-compatible formats.
- Deserialization: It can transform JSON-like data into Python objects.
These Pydantic functionalities are particularly useful when working with APIs that send and receive JSON data, or when processing user inputs.
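A rough sketch of the parsing and serialization round trip, assuming Pydantic v2. The `Duck` model here is a simplified stand-in for illustration:

```python
from pydantic import BaseModel


class Duck(BaseModel):
    name: str
    age: int


# Deserialization and parsing: the JSON string "5" is converted to the int 5.
duck = Duck.model_validate_json('{"name": "Donald", "age": "5"}')
print(duck.age)  # 5

# Serialization: back to a plain dict or a JSON string.
as_dict = duck.model_dump()
as_json = duck.model_dump_json()
print(as_dict)  # {'name': 'Donald', 'age': 5}

# The JSON output can be parsed back into an equal model.
round_tripped = Duck.model_validate_json(as_json)
print(round_tripped == duck)  # True
```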
Here's how you can use Pydantic to define a data model for a duck:
```python
from pydantic import BaseModel, Field, ValidationError


class Duck(BaseModel):
    name: str
    age: int = Field(gt=0)
    feather_count: int | None = Field(default=None, ge=0)


# Correct initialization
try:
    duck = Duck(name="Donald", age=5, feather_count=3000)
    print(duck)  # Duck(name='Donald', age=5, feather_count=3000)
except ValidationError as e:
    print(f"Validation Error:\n{e}")

# Faulty initialization
try:
    invalid_duck = Duck(name="Daffy", age=0, feather_count=-1)
    print(invalid_duck)
except ValidationError as e:
    print(f"Validation Error:\n{e}")
```
In this example, we define a `Duck` data model with three fields: `name`, `age`, and `feather_count`. The `name` and `age` fields are required and should hold a string and an integer respectively, while the `feather_count` field is optional and should hold an integer or `None`.
We use the `Field` function from Pydantic to define additional constraints for the fields. For example, we specify that the `age` field should be greater than zero, and the `feather_count` field should be greater than or equal to zero, or `None`.
In Python 3.10 and later, we can use the `|` operator for union types (PEP 604), allowing us to write `int | None` instead of `Union[int, None]`.
When we try to create an invalid `Duck` instance, Pydantic raises a `ValidationError`. The error message is detailed and helpful:
```
Validation Error:
2 validation errors for Duck
age
  Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
    For further information visit https://errors.pydantic.dev/2.8/v/greater_than
feather_count
  Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-1, input_type=int]
    For further information visit https://errors.pydantic.dev/2.8/v/greater_than_equal
```
This error message clearly indicates which fields failed validation and why. It specifies that:
- The `age` should be greater than 0, but we provided `0`.
- The `feather_count` should be greater than or equal to 0, but we provided `-1`.
Detailed error messages make it much easier to identify and fix data validation issues, especially when working with complex data structures or processing user inputs.
Simplifying Function Validation with Pydantic
While we've seen how Pydantic can validate data in models, it can also be used to validate function arguments directly. This can simplify our code while making it safer to run. Let's revisit our `describe_duck` function using Pydantic's `validate_call` decorator:
```python
from pydantic import BaseModel, Field, validate_call


class DuckDescription(BaseModel):
    name: str
    age: int = Field(gt=0)
    feather_count: int = Field(gt=0)


@validate_call
def describe_duck(duck: DuckDescription) -> str:
    return f"{duck.name} is {duck.age} years old and has {duck.feather_count} feathers."


# Valid input
print(describe_duck(DuckDescription(name="Donald", age=5, feather_count=3000)))
# Output: Donald is 5 years old and has 3000 feathers.

# Invalid input
# Note: the DuckDescription constructor raises here, before describe_duck runs.
# Pydantic's ValidationError subclasses ValueError, so this except clause catches it.
try:
    print(describe_duck(DuckDescription(name="Daffy", age=0, feather_count=-1)))
except ValueError as e:
    print(f"Validation Error: {e}")
# Validation Error: 2 validation errors for DuckDescription
# age
#   Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
#     For further information visit https://errors.pydantic.dev/2.8/v/greater_than
# feather_count
#   Input should be greater than 0 [type=greater_than, input_value=-1, input_type=int]
#     For further information visit https://errors.pydantic.dev/2.8/v/greater_than
```
In this example, we made the following changes:
- We defined a `DuckDescription` Pydantic model to represent the expected structure and types of our duck data.
- We used the `@validate_call` decorator on our `describe_duck` function. This decorator automatically validates the function's arguments based on the type annotations.
- The function now expects a `DuckDescription` object instead of separate parameters. This ensures that all the data is validated as a unit before the function is called.
- We simplified the function body since we can now be confident that the data is valid and of the correct type.
By using Pydantic's `@validate_call` decorator, we made our function safer and easier to read.
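In the example above, the invalid data is rejected when `DuckDescription` is constructed. As a complementary sketch (assuming Pydantic v2), `@validate_call` can also check plain arguments, with constraints attached through `Annotated`. The two-parameter signature below is a hypothetical variant, not the function from the post:

```python
from typing import Annotated

from pydantic import Field, ValidationError, validate_call


@validate_call
def describe_duck(name: str, age: Annotated[int, Field(gt=0)]) -> str:
    return f"{name} is {age} years old."


# Arguments are validated and coerced on every call: "5" becomes the int 5.
print(describe_duck("Donald", "5"))  # Donald is 5 years old.

try:
    describe_duck("Daffy", 0)  # violates gt=0
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"{exc.error_count()} validation error")  # 1 validation error
```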
Comparing Python Typing Methods
The table below summarizes the key differences between the Python typing methods we discussed. Keep in mind that some points may have exceptions or nuances depending on your specific use case. The table is meant to provide a general overview only.
Feature | Type Annotations | Data Classes | TypedDict | Pydantic |
---|---|---|---|---|
Static type checking | ✅ | ✅ | ✅ | ✅ |
Runtime type checking | ❌ | ❌ | ❌ | ✅ |
Automatic data validation | ❌ | ❌ | ❌ | ✅ |
JSON serialization | ❌ | ❌ | ❌ | ✅ |
Nested object support | ✅ | ✅ | ✅ | ✅ |
Custom validation rules | ❌ | ❌ | ❌ | ✅ |
IDE autocomplete support | ✅ | ✅ | ✅ | ✅ |
Performance overhead | None | Minimal | None | Minimal |
Compatibility with dicts | ❌ | ❌ | ✅ | ✅ |
Standard Library | ✅ | ✅ | ✅ | ❌ |
Why Speakeasy Chose Pydantic
At Speakeasy, we chose Pydantic as the primary tool for data validation and serialization in the Python SDKs we create.
After our initial Python release, support for Pydantic was one of the most requested features from our users. Pydantic provides a great balance between flexibility and type safety. And because Pydantic's core validation logic is written in Rust, it has a negligible performance overhead compared to other third-party data validation libraries.
SDKs are an ideal use case for Pydantic, providing automatic data validation and serialization for the data structures that API users interact with.
By working with the Pydantic team, we've contributed to the development of features that make Pydantic even better suited for SDK development.
The Value of Runtime Type Safety
To illustrate the value of runtime type safety, consider a scenario where we are building an API that receives JSON data from a client to represent an order from a shop. Let's use a `TypedDict` to define the shape of the order data:
```python
from typing import TypedDict


class Order(TypedDict):
    customer_name: str
    quantity: int
    unit_price: float


def calculate_order_total(order: Order) -> float:
    return order["quantity"] * order["unit_price"]


print(
    calculate_order_total(
        {
            "customer_name": "Alex",
            "quantity": 10,
            "unit_price": 5,
        }
    )
)  # Output: 50
```
In this example, we define an `Order` `TypedDict` with three keys: `customer_name`, `quantity`, and `unit_price`. We then pass a dictionary with values for these keys to the `calculate_order_total` function.
The `calculate_order_total` function multiplies the `quantity` and `unit_price` values from the `order` dictionary to calculate the total order amount. It works fine when the dictionary has the correct types of values, but what if the client sends us invalid data?
```python
print(
    calculate_order_total(
        {
            "customer_name": "Sam",
            "quantity": 10,
            "unit_price": "5",
        }
    )
)  # Output: 5555555555
```
In this case, the client sends us a string value for the `unit_price` key instead of a float. Since Python is a duck-typed language, the code still runs without errors, but the result is incorrect: multiplying a string by an integer repeats the string, so we get `"5"` ten times over instead of `50`. This is a common source of bugs in Python code, especially when working with JSON data from external sources.
Now, let's see how we can use Pydantic to define a data model for the order data and enforce type safety at runtime:
```python
from pydantic import BaseModel, computed_field


class Order(BaseModel):
    customer_name: str
    quantity: int
    unit_price: float

    @computed_field
    def calculate_total(self) -> float:
        return self.quantity * self.unit_price


order = Order(
    customer_name="Sam",
    quantity=10,
    unit_price="5",
)

print(order.calculate_total)  # Output: 50.0
```
In this case, Pydantic converts the string `"5"` to a float value of `5.0` for the `unit_price` field. The automatic type coercion prevents errors and ensures the data is in the correct format.
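Coercion is the default behavior. If you would rather reject the string outright, a minimal sketch (assuming Pydantic v2) is to validate in strict mode through the `strict` argument of `model_validate`. The `payload` name is illustrative:

```python
from pydantic import BaseModel, ValidationError


class Order(BaseModel):
    customer_name: str
    quantity: int
    unit_price: float


payload = {"customer_name": "Sam", "quantity": 10, "unit_price": "5"}

# Default (lax) mode: the string "5" is coerced to 5.0.
lax_order = Order.model_validate(payload)
print(lax_order.unit_price)  # 5.0

# Strict mode: the same payload fails because "5" is not a float.
try:
    Order.model_validate(payload, strict=True)
    strict_rejected = False
except ValidationError:
    strict_rejected = True

print(strict_rejected)  # True
```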
Pydantic enforces type safety at runtime, but don't we lose the simplicity of passing dictionaries around?
Fortunately, we don't have to give up on dictionaries.
Using Typed Dictionaries With Pydantic Models
In some cases, you may want to accept both `TypedDict` and Pydantic models as input to your functions. You can achieve this by using a union type in your function signature:
```python
from typing import TypedDict

from pydantic import BaseModel


class OrderTypedDict(TypedDict):
    customer_name: str
    quantity: int
    unit_price: float


class Order(BaseModel):
    customer_name: str
    quantity: int
    unit_price: float


def calculate_order_total(order: Order | OrderTypedDict) -> float:
    if not isinstance(order, BaseModel):
        order = Order(**order)
    return order.quantity * order.unit_price


print(
    calculate_order_total(
        {
            "customer_name": "Sam",
            "quantity": 10,
            "unit_price": "5",
        }
    )
)  # Output: 50.0
```
In this example, we define an `OrderTypedDict` `TypedDict` and an `Order` Pydantic model for the order data. We then define a `calculate_order_total` function to accept a union type of `Order` and `OrderTypedDict`.
If the input is a `TypedDict`, it'll be converted to a Pydantic model before performing the calculation. Now our function can accept both `TypedDict` and Pydantic models as input, providing us flexibility while still enforcing type safety at runtime.
Speakeasy SDKs employ this pattern so users can pass in either dictionaries or Pydantic models to the SDK functions, reducing the friction of using the SDK while maintaining type safety.
Conclusion
To learn more about how we use Pydantic in our SDKs, see our post about Python Generation with Async & Pydantic Support.