Rust Patterns That Matter #18: Typestate
Post 18 of 22 in Rust Patterns That Matter. Companion series: Building a Chat Server in Rust.
Previous: #17: Builder Pattern | Next: #19: Arc<Mutex<T>>
You have a connection that goes through states: disconnected, connecting, connected. You can only send data when connected. The obvious implementation checks the state at runtime and panics or returns an error on invalid transitions. The typestate pattern makes invalid transitions a compile error instead.
The motivation
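// Placeholder error type so the snippet is self-contained.
struct Error(String);
impl Error {
    fn new(msg: &str) -> Self { Error(msg.to_string()) }
}
enum State { Disconnected, Connecting, Connected }
struct Connection {
    state: State,
}
impl Connection {
    fn send(&self, data: &[u8]) -> Result<(), Error> {
        match self.state {
            State::Connected => { /* actually send */ Ok(()) }
            _ => Err(Error::new("not connected")),
        }
    }
}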
The problem: every method that depends on state needs a runtime check. Forget a
check and you have a bug. The type system isn't helping - from the compiler's
perspective, a Connection is always the same type regardless of its
state.
The pattern: states as types
Encode each state as a separate zero-sized type. Make Connection
generic over the state. Transition methods consume the current state and return a
new one:
// State markers: zero-sized, exist only in the type system
struct Disconnected;
struct Connecting;
struct Connected;
struct Connection<S> {
    addr: String,
    _state: std::marker::PhantomData<S>,
}
impl Connection<Disconnected> {
    fn new(addr: String) -> Self {
        Connection { addr, _state: std::marker::PhantomData }
    }
    fn connect(self) -> Connection<Connecting> {
        println!("connecting to {}...", self.addr);
        Connection { addr: self.addr, _state: std::marker::PhantomData }
    }
}
impl Connection<Connecting> {
    fn on_connected(self) -> Connection<Connected> {
        println!("connected!");
        Connection { addr: self.addr, _state: std::marker::PhantomData }
    }
}
impl Connection<Connected> {
    fn send(&self, data: &[u8]) {
        println!("sending {} bytes", data.len());
    }
    fn disconnect(self) -> Connection<Disconnected> {
        println!("disconnected");
        Connection { addr: self.addr, _state: std::marker::PhantomData }
    }
}
The send method only exists on Connection<Connected>.
Calling it on a disconnected connection doesn't produce a runtime error - it
doesn't compile:
let conn = Connection::new("127.0.0.1".into());
// conn.send(b"hello");        // ERROR: no method named `send` found
//                             // for `Connection<Disconnected>`
let conn = conn.connect();      // Connection<Connecting>
let conn = conn.on_connected(); // Connection<Connected>
conn.send(b"hello");            // OK: we're connected
let conn = conn.disconnect();   // Connection<Disconnected>
// conn.send(b"hello");        // ERROR again: disconnected
Why transition methods consume self
Each transition takes self by value, not by reference. This moves the
old value into the method - you can't use a Connection<Disconnected>
after calling .connect() on it. Any later use of the old variable is a
"use of moved value" compile error, which prevents you from accidentally
using a stale state.
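A minimal sketch of the move in action, using a trimmed two-state version of the connection above. The `transition_demo` function is a hypothetical helper added for illustration only; it is not part of the original example.

```rust
use std::marker::PhantomData;

struct Disconnected;
struct Connecting;

struct Connection<S> {
    addr: String,
    _state: PhantomData<S>,
}

impl Connection<Disconnected> {
    fn new(addr: String) -> Self {
        Connection { addr, _state: PhantomData }
    }

    // Takes `self` by value: the caller's binding is moved into this call.
    fn connect(self) -> Connection<Connecting> {
        Connection { addr: self.addr, _state: PhantomData }
    }
}

// Hypothetical helper for this sketch: returns the address held by the
// new-state value, showing that the data travelled with the transition.
fn transition_demo() -> String {
    let idle = Connection::new("127.0.0.1".to_string());
    let pending = idle.connect(); // `idle` is moved here

    // Uncommenting the next line fails to compile with
    // error[E0382]: use of moved value: `idle`
    // let again = idle.connect();

    pending.addr
}
```

If `connect` took `&self` instead, `idle` would stay usable after the call and the stale-state guarantee would be lost.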
Zero-sized types
struct Disconnected; has no fields and occupies zero bytes. The
PhantomData<S> in the connection also occupies zero bytes. The
state marker exists purely in the type system - it's erased completely at
compile time. Connection<Disconnected> and
Connection<Connected> have the same runtime representation
(just the addr field).
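One way to check this is with std::mem::size_of. The sketch below assumes the same Connection<S> layout as above; `layout_sizes` is a hypothetical helper name used only here.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

struct Disconnected;
struct Connected;

struct Connection<S> {
    addr: String,
    _state: PhantomData<S>,
}

// Returns, in bytes: marker size, PhantomData size,
// Connection<Disconnected> size, Connection<Connected> size, String size.
fn layout_sizes() -> (usize, usize, usize, usize, usize) {
    (
        size_of::<Disconnected>(),
        size_of::<PhantomData<Connected>>(),
        size_of::<Connection<Disconnected>>(),
        size_of::<Connection<Connected>>(),
        size_of::<String>(),
    )
}
```

The first two values are 0, and with the default struct layout the two Connection sizes match the size of a bare String, so the state parameter costs nothing at runtime.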
Tradeoffs
- Type proliferation. Each state is a type. Three states = three impls. If states share methods, you need a separate impl<S> Connection<S> block for shared behaviour.
- Can't store heterogeneous states. Connection<Disconnected> and Connection<Connected> are different types. You can't put them in the same Vec or HashMap without erasing the state (which defeats the purpose).
- Dynamic state machines don't fit. If the state is determined at runtime (user input, config file, external events), you need an enum - the typestate pattern requires statically known transitions.
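The first two tradeoffs can be sketched together. This is a simplified two-state version, where connect goes straight to Connected. `AnyConnection` and `mixed_pool` are hypothetical names introduced for illustration. The generic impl block holds the shared method, and the enum wrapper is one possible way to erase the state when mixed connections have to live in one collection.

```rust
use std::marker::PhantomData;

struct Disconnected;
struct Connected;

struct Connection<S> {
    addr: String,
    _state: PhantomData<S>,
}

// Shared behaviour: available in every state.
impl<S> Connection<S> {
    fn addr(&self) -> &str {
        &self.addr
    }
}

impl Connection<Disconnected> {
    fn new(addr: &str) -> Self {
        Connection { addr: addr.to_string(), _state: PhantomData }
    }
    fn connect(self) -> Connection<Connected> {
        Connection { addr: self.addr, _state: PhantomData }
    }
}

// Hypothetical wrapper: erases the static state so mixed connections can
// share one collection. Operations on it need runtime matches again.
enum AnyConnection {
    Disconnected(Connection<Disconnected>),
    Connected(Connection<Connected>),
}

impl AnyConnection {
    fn addr(&self) -> &str {
        match self {
            AnyConnection::Disconnected(c) => c.addr(),
            AnyConnection::Connected(c) => c.addr(),
        }
    }
    fn is_connected(&self) -> bool {
        matches!(self, AnyConnection::Connected(_))
    }
}

// Builds a Vec holding connections in two different states.
fn mixed_pool() -> Vec<AnyConnection> {
    vec![
        AnyConnection::Disconnected(Connection::new("10.0.0.1")),
        AnyConnection::Connected(Connection::new("10.0.0.2").connect()),
    ]
}
```

Once a connection is inside AnyConnection, the compiler no longer knows its state, so every state-dependent operation is back to a runtime match. That is the cost the second bullet describes.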
When to use it
- Protocol state machines (TCP connection, HTTP request/response lifecycle)
- Builder patterns with required steps (#17)
- "Must do X before Y" constraints (open a file before reading, authenticate before querying)
- Any state machine where the valid operations depend on the current state and the transitions are known at compile time
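As a sketch of the "authenticate before querying" case: the transition itself can fail at runtime while the set of valid operations stays checked at compile time. Everything here (`Session`, `login`, `query`, `run_query`, and the hard-coded password check) is a hypothetical stand-in, not an API from the series.

```rust
use std::marker::PhantomData;

struct Unauthenticated;
struct Authenticated;

struct Session<S> {
    user: String,
    _state: PhantomData<S>,
}

impl Session<Unauthenticated> {
    fn new(user: &str) -> Self {
        Session { user: user.to_string(), _state: PhantomData }
    }

    // Fallible transition: on failure the caller gets the unauthenticated
    // session back and can retry. The password check is a stand-in.
    fn login(self, password: &str) -> Result<Session<Authenticated>, Self> {
        if password == "hunter2" {
            Ok(Session { user: self.user, _state: PhantomData })
        } else {
            Err(self)
        }
    }
}

impl Session<Authenticated> {
    // Only reachable through a successful `login`.
    fn query(&self, sql: &str) -> String {
        format!("{} ran: {}", self.user, sql)
    }
}

// Runs one query if the login succeeds, otherwise returns None.
fn run_query(user: &str, password: &str) -> Option<String> {
    let session = Session::new(user);
    // session.query("SELECT 1"); // does not compile: no `query` on Session<Unauthenticated>
    match session.login(password) {
        Ok(authed) => Some(authed.query("SELECT 1")),
        Err(_still_unauthenticated) => None,
    }
}
```

Returning Result from the transition keeps the runtime outcome (did the login succeed?) separate from the compile-time rule (query needs an authenticated session).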
When not to: dynamic state machines, state determined by runtime data, or when the number of states is large and the combinatorics of state-specific methods become unwieldy. Use an enum with runtime checks in those cases.
See it in practice: Building a Chat Server #4: Commands and Plugins uses this pattern for connection lifecycle safety.