Rust's Smart Pointer Patterns — Part 2: The Specialists
Part 2 of 2 in Rust's Smart Pointer Patterns. Companion series: Rust Patterns That Matter.
Previous: Part 1: The Ownership and Sharing System
Part 1 covered the
core system: Box for heap allocation, Rc and
Arc for sharing, Cell and RefCell
for interior mutability, Mutex and RwLock for
thread-safe mutation, and the combos like Arc<Mutex<T>>
that wire them together. Those types form a complete ownership toolkit.
If your problem is "who owns this, who can read it, and who can write
it," Part 1 has the answer.
This post covers the specialists. These are wrapper types that solve specific, narrower problems -- deferring allocations, initializing values lazily, preventing moves, and marking type-level relationships. You don't reach for them every day, but when you need them, nothing else will do.
1. Cow<'a, T> — Clone on Write
Cow holds either a borrowed reference or an owned value.
That's it. It's an enum:
enum Cow<'a, B: ?Sized + ToOwned + 'a> {
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}
The name stands for "clone on write," but the real insight is simpler:
Cow lets you defer cloning until you actually need to
mutate. Most of the time, you don't need to mutate. Most of the time,
the data passes through your function unchanged. Cow
makes the common case free.
The problem it solves
You're writing a function that normalizes user input. Maybe it trims whitespace, lowercases certain fields, replaces tabs with spaces. The catch: 90% of inputs are already valid. They don't need any modification at all.
If your function returns String, you're allocating on
every call -- even when the input passes through unchanged. If it
returns &str, you can't return a modified string
because it doesn't live long enough. You're stuck.
use std::borrow::Cow;

fn normalize_username<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') || input.chars().any(|c| c.is_uppercase()) {
        // Needs modification — allocate a new String
        Cow::Owned(input.replace(' ', "_").to_lowercase())
    } else {
        // Already valid — just return the reference, zero allocation
        Cow::Borrowed(input)
    }
}
The happy path -- where the username is already clean -- costs nothing.
No heap allocation, no memcpy, no new String. The caller
gets back the original &str wrapped in a
Cow.
The unhappy path -- where the username needs work -- allocates a new
String. But that's fine. You were going to allocate
anyway if the data needed changing.
Using the result
Cow<str> implements Deref<Target = str>,
so it behaves like a &str in most contexts:
let name = normalize_username("alice");
println!("length: {}", name.len()); // works like &str
// If you need an owned String:
let owned: String = name.into_owned(); // clones only if Borrowed
Beyond strings
Cow works with any type that implements
ToOwned. The most common pairings:
- Cow<str> — &str or String
- Cow<[u8]> — &[u8] or Vec<u8>
- Cow<Path> — &Path or PathBuf
- Cow<[T]> — &[T] or Vec<T>
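The same pattern carries over to byte slices. A small sketch using a hypothetical normalize_newlines helper built on the Cow<[u8]> pairing:

```rust
use std::borrow::Cow;

// Hypothetical helper: normalize CRLF line endings to LF.
// Inputs that are already LF-only pass through without allocating.
fn normalize_newlines(bytes: &[u8]) -> Cow<'_, [u8]> {
    if bytes.contains(&b'\r') {
        // Needs work -- build an owned Vec<u8> with the '\r' bytes dropped
        Cow::Owned(bytes.iter().copied().filter(|&b| b != b'\r').collect())
    } else {
        // Already clean -- borrow the input as-is
        Cow::Borrowed(bytes)
    }
}

fn main() {
    assert!(matches!(normalize_newlines(b"a\nb\n"), Cow::Borrowed(_)));
    assert_eq!(normalize_newlines(b"a\r\nb\r\n").into_owned(), b"a\nb\n".to_vec());
}
```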
When you reach for it
The decision is straightforward. If your function always
modifies the input, just return String. If it
never modifies the input, just return &str.
Cow is for the in-between: functions that usually
pass data through but sometimes need to modify it. Parsers,
config processors, validators, normalization layers. If you're
processing thousands of strings and the majority are already clean,
Cow saves you thousands of allocations.
For C programmers: it's like returning a const char* that
points to either the original input or a freshly malloc'd
buffer, except the type system tracks which one it is and handles the
free for you.
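The literal "clone on write" behavior lives in the to_mut method: it hands back mutable access to the owned form, cloning the borrowed data only at that moment. A sketch with a hypothetical redact helper:

```rust
use std::borrow::Cow;

// Hypothetical helper: blank out a secret if it appears in the text.
fn redact<'a>(mut text: Cow<'a, str>, secret: &str) -> Cow<'a, str> {
    if text.contains(secret) {
        // to_mut clones the borrowed str into a String -- but only now,
        // at the first actual write
        let owned = text.to_mut();
        *owned = owned.replace(secret, "****");
    }
    text // if the secret never appeared, the original borrow survives
}

fn main() {
    assert!(matches!(redact(Cow::Borrowed("hello"), "xyz"), Cow::Borrowed(_)));
    assert_eq!(redact(Cow::Borrowed("my xyz key"), "xyz"), "my **** key");
}
```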
2. OnceCell<T> and OnceLock<T> — Initialize Exactly Once
Some values are expensive to create. You don't know at startup whether you'll need them. But once you create them, they never change. You want lazy initialization with a guarantee: this runs exactly once.
The problem without them
Without OnceCell or OnceLock, you end up
writing the manual version:
use std::sync::Mutex;

static CONFIG: Mutex<Option<AppConfig>> = Mutex::new(None);

fn get_config() -> AppConfig {
    let mut guard = CONFIG.lock().unwrap();
    if guard.is_none() {
        *guard = Some(load_config_from_disk());
    }
    guard.clone().unwrap()
}
This works, but it's noisy. You're managing a Mutex
around an Option, manually checking whether the value
exists, and cloning on every access because you can't hold the lock
guard forever. And every subsequent caller still pays for the lock,
even though the value was set on the first call and never changes.
OnceLock: the thread-safe one
OnceLock is the one you'll use most. It's safe to share
across threads and its initialization is guaranteed to run exactly once,
even if multiple threads race to initialize it.
use std::sync::OnceLock;

static CONFIG: OnceLock<AppConfig> = OnceLock::new();

fn get_config() -> &'static AppConfig {
    CONFIG.get_or_init(|| load_config_from_disk())
}
That's it. The first call runs the closure, stores the result, and
returns a reference to it. Every subsequent call returns the same
reference immediately -- no lock contention, no cloning, no
Option unwrapping.
get_or_init is the key method. It takes a closure that
produces the value. If the cell is empty, it runs the closure and
stores the result. If the cell is already full, it ignores the closure
entirely and returns what's there.
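The exactly-once guarantee holds even under contention. A minimal sketch (the names get_value, VALUE, and INIT_RUNS are hypothetical) that uses a counter to observe how many times the closure actually runs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

static VALUE: OnceLock<u64> = OnceLock::new();
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

// Eight threads race to initialize, but the closure still runs once.
fn get_value() -> u64 {
    *VALUE.get_or_init(|| {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        42 // stand-in for an expensive computation
    })
}

fn main() {
    let handles: Vec<_> = (0..8).map(|_| thread::spawn(get_value)).collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 42);
    }
    // Losing racers blocked until the winner finished, then saw its value.
    assert_eq!(INIT_RUNS.load(Ordering::SeqCst), 1);
}
```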
OnceCell: the single-threaded one
OnceCell is the same concept but without the
thread-safety machinery. It lives in std::cell instead
of std::sync:
use std::cell::OnceCell;
use regex::Regex; // the external regex crate

struct Parser {
    regex_cache: OnceCell<Regex>,
}

impl Parser {
    fn new() -> Self {
        Parser { regex_cache: OnceCell::new() }
    }

    fn regex(&self) -> &Regex {
        self.regex_cache.get_or_init(|| {
            Regex::new(r"\d{4}-\d{2}-\d{2}").unwrap()
        })
    }
}
Use OnceCell when the value lives inside a struct that
isn't shared across threads. Use OnceLock when you need
static globals or cross-thread sharing.
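A std-only sketch of the same struct-field pattern, with a hypothetical Fib type that fills a lookup table on first use through a shared reference:

```rust
use std::cell::OnceCell;

// A lazily filled cache field, usable through &self.
struct Fib {
    table: OnceCell<Vec<u64>>,
}

impl Fib {
    fn new() -> Self {
        Fib { table: OnceCell::new() }
    }

    // &self, not &mut self -- OnceCell supplies the interior mutability.
    fn get(&self, n: usize) -> u64 {
        let table = self.table.get_or_init(|| {
            let mut t = vec![0u64, 1];
            for i in 2..64 {
                let next = t[i - 1] + t[i - 2];
                t.push(next);
            }
            t
        });
        table[n]
    }
}

fn main() {
    let fib = Fib::new();
    assert_eq!(fib.get(10), 55);   // table built here, on first access
    assert_eq!(fib.get(20), 6765); // reuses the cached table
}
```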
What they replaced
Before these types were in the standard library, the
lazy_static crate was the go-to. If you see
lazy_static! in older codebases, this is what it was
doing. OnceLock and OnceCell (stabilized in
Rust 1.70) replaced lazy_static for most uses, with the
advantage of being in std, requiring no macros, and giving
you explicit control over when initialization happens.
When you reach for them
Compiled regexes. Database connection pools. Lookup tables parsed from
files. Configuration loaded from environment variables. Any value
that's expensive to create, possibly never needed at all, and immutable once
created. The pattern is always the same: declare the cell, call
get_or_init at the point of first use.
3. LazyLock<T> — OnceLock with a Built-in Initializer
OnceLock separates declaration from initialization. You
declare the cell in one place and provide the initializer at the point
of first use. LazyLock bundles them together: you provide
the initialization closure at declaration time.
use std::collections::HashMap;
use std::sync::LazyLock;
use regex::Regex;

static DATE_REGEX: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap()
});

static LOOKUP_TABLE: LazyLock<HashMap<String, i32>> = LazyLock::new(|| {
    let mut m = HashMap::new();
    m.insert("critical".into(), 1);
    m.insert("warning".into(), 2);
    m.insert("info".into(), 3);
    m
});

fn is_valid_date(s: &str) -> bool {
    DATE_REGEX.is_match(s) // initialized on first call
}
That LazyLock::new(|| ...) is essentially what the
lazy_static! macro used to generate. The closure runs on
first access, the result is stored, and subsequent accesses return the
stored value. No lock contention after initialization.
LazyLock vs OnceLock
The difference is about where the initializer lives:
- OnceLock: You provide the initialization closure at the call site via get_or_init. This means initialization can depend on runtime data -- a port number from a config file, a connection string from an environment variable, anything you can't express as a static closure.
- LazyLock: The initialization closure is baked in at declaration time. Every access gets the same initialization logic. You can't change your mind later.
For static globals with fixed initialization -- which is
the most common case -- LazyLock is the cleaner choice.
The initialization is right there with the declaration, and you don't
have to worry about passing the right closure everywhere. For struct
fields or situations where the initialization depends on runtime data
that isn't available at declaration time, OnceLock is
the right tool.
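A sketch of the runtime-data case: the closure passed to get_or_init can capture a function argument, which a static LazyLock initializer cannot. The greeting function here is hypothetical:

```rust
use std::sync::OnceLock;

static GREETING: OnceLock<String> = OnceLock::new();

// The initializer captures `name`, a runtime value -- something a
// LazyLock's declaration-time closure could not do.
fn greeting(name: &str) -> &'static str {
    GREETING.get_or_init(|| format!("hello, {name}"))
}

fn main() {
    assert_eq!(greeting("alice"), "hello, alice");
    // The cell is already full, so later arguments are ignored.
    assert_eq!(greeting("bob"), "hello, alice");
}
```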
LazyCell: the single-threaded equivalent
Just as OnceLock has a single-threaded counterpart in
OnceCell, LazyLock has
LazyCell (in std::cell). It works the
same way -- closure baked in at construction, value computed on first
access -- but without the thread-safety overhead. Use it for lazy
fields inside structs that aren't shared across threads.
Both LazyLock and LazyCell were stabilized
in Rust 1.80.
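A minimal LazyCell sketch (the nth_square function is hypothetical): the closure is supplied at construction and runs at most once, on first dereference:

```rust
use std::cell::LazyCell;

fn nth_square(n: usize) -> u64 {
    // Closure fixed at construction time (Rust 1.80+)
    let table: LazyCell<Vec<u64>> =
        LazyCell::new(|| (0u64..10).map(|i| i * i).collect());
    // The first use of `table` here triggers the closure.
    table[n]
}

fn main() {
    assert_eq!(nth_square(3), 9);
    assert_eq!(nth_square(0), 0);
}
```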
When you reach for it
LazyLock is for global/static values where you know the
initialization logic upfront. Compiled regexes, pre-built lookup
tables, parsed configuration, constant data loaded from embedded
files. If you're typing static FOO: ... = ... and the
right-hand side can't be a const, LazyLock
is probably what you want. For lazy fields inside single-threaded
structs, use LazyCell.
4. Pin<T> — Promise This Won't Move
This is the hardest type in the entire smart pointer toolkit. That's
not an apology -- it's a warning. Pin exists to solve a
problem that most languages pretend doesn't exist, and it earns its
complexity honestly. Let's take it step by step.
The problem: self-referential structs
By default, all values in Rust can be moved to a new memory location. When you push a value
into a Vec or return it from a function, the bytes get
copied to a new location. This is fine for most types -- an
i32 is an i32 no matter where it lives, and
a String is a pointer to the heap plus a length and capacity, so
moving the String struct doesn't affect the heap data it
points to.
But what if a struct contains a pointer to itself? If the struct moves, that pointer now points to where the struct used to be, not where it is now. The pointer dangles.
// Conceptual illustration — you can define this struct, but safe Rust
// gives you no way to keep ptr_to_data valid across moves
struct SelfRef {
    data: String,
    ptr_to_data: *const String, // points to self.data
}
// If SelfRef moves, ptr_to_data still points to the old location.
// Use-after-move. Bad.
In C terms: you have a struct that stores a pointer to one of its own
fields. If you memcpy the struct to a new address, that
pointer is now stale. Rust's Pin is like telling the
compiler "this value has the same address for its entire lifetime --
don't move it."
Why async makes this real
You might think self-referential structs are exotic. They're not.
When the compiler transforms an async fn into a state
machine, the resulting struct holds local variables across
.await points. If one of those locals is a reference to
another local, you've got a self-referential struct. Not every
async fn produces one -- only those that hold borrows
across suspension points -- but it's common enough that the language
needed a general solution:
async fn example() {
    let data = vec![1, 2, 3];
    let slice = &data[..];   // borrows data
    some_async_op().await;   // state machine suspends here
    println!("{:?}", slice); // uses slice after resuming
}
The compiler-generated state machine struct holds both data
and slice. slice points into
data's memory. If this struct moved, slice
would dangle. Pin prevents the move.
What Pin actually is
Pin<P> wraps a pointer type P (like
&mut T or Box<T>) and restricts
what you can do with it. Specifically, you can't get a
&mut T from a Pin<&mut T>
unless T implements Unpin. Without
&mut T, you can't use mem::swap or
mem::replace to move the value out.
Most types implement Unpin automatically. An
i32, a String, a Vec<T>
-- they don't contain self-references, so moving them is harmless.
For these types, Pin imposes no restriction at all. It
only matters for !Unpin types -- and the main
!Unpin types you'll encounter are futures generated by
async blocks.
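A sketch of what Unpin buys you: for an Unpin type like i32, Pin::new is safe and get_mut hands the mutable reference straight back. (For a !Unpin future, Pin::new wouldn't even compile -- that's the whole point.) The bump function is hypothetical:

```rust
use std::pin::Pin;

// Allowed only because i32: Unpin -- get_mut requires it.
fn bump(p: Pin<&mut i32>) {
    *p.get_mut() += 1;
}

fn main() {
    let mut x = 5;
    bump(Pin::new(&mut x)); // Pin::new is only available for Unpin types
    assert_eq!(x, 6);
}
```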
The practical surface
Most Rust programmers encounter Pin through async and
don't need to deeply understand the internals to use it. Here's
what you'll actually type:
Pin<Box<T>> — Heap-allocate the value
and pin it there. The Box pointer may move, but the data
on the heap stays put. This is the most common pinning strategy:
use std::pin::Pin;
use std::future::Future;

// Returning a future from a trait method using dynamic dispatch
// (as it would appear inside an impl block)
fn fetch(&self) -> Pin<Box<dyn Future<Output = Vec<u8>> + Send + '_>> {
    Box::pin(async move {
        // your async implementation
        vec![]
    })
}
pin!() — Pin a value to the stack without heap
allocation. Available as std::pin::pin! since Rust
1.68. Unlike Box::pin, this avoids a heap allocation
entirely, which matters in hot async paths:
use std::pin::pin;
let fut = pin!(some_async_operation());
// fut is Pin<&mut impl Future> — pinned to the stack, no heap allocation
The Future::poll signature: If you ever implement
Future manually, the poll method takes
self: Pin<&mut Self>. This is the guarantee
the runtime relies on: by the time poll is called, the
future won't move.
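For completeness, here is a minimal hand-written future, polled by hand with a do-nothing waker. All the names (Ready, block_on_ready) are hypothetical, and real code would hand the future to a runtime instead:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A hand-written future: poll's receiver is Pin<&mut Self>, the
// runtime's promise that the future won't move between polls.
struct Ready(Option<u32>);

impl Future for Ready {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // Ready holds no self-references, so it is Unpin and get_mut works.
        Poll::Ready(self.get_mut().0.take().expect("polled after completion"))
    }
}

// A do-nothing waker, just enough to call poll outside a runtime.
const RAW: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);
const VTABLE: RawWakerVTable = RawWakerVTable::new(|_| RAW, |_| {}, |_| {}, |_| {});

fn block_on_ready(mut fut: Ready) -> u32 {
    let waker = unsafe { Waker::from_raw(RAW) };
    let mut cx = Context::from_waker(&waker);
    // Pin::new suffices here because Ready: Unpin.
    match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => unreachable!("Ready completes on the first poll"),
    }
}

fn main() {
    assert_eq!(block_on_ready(Ready(Some(7))), 7);
}
```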
The honest assessment
Pin is genuinely complex. The interaction between
Pin, Unpin, and structural pinning is one
of the most subtle corners of the language. But here's the good news:
for application code, you almost never need to think about it beyond
"Box::pin(async move { ... }) when the compiler asks for
it." Library authors who implement custom futures or self-referential
data structures need the deeper understanding. Application authors
mostly need to know it exists and reach for Box::pin.
When you reach for it
You don't "reach for" Pin the way you reach for
Cow or OnceLock. The compiler tells you
when you need it. That happens in three situations: returning futures
as trait objects, storing futures in struct fields, and implementing
the Future trait by hand. In all three cases,
Box::pin() is the answer that gets you moving.
5. PhantomData<T> — Zero-Size Type-Level Marker
PhantomData doesn't hold any data. It has zero size and
compiles to nothing. Its entire purpose is telling the compiler about
a type relationship that the struct's fields don't express on their
own.
The problem it solves
Sometimes a struct has a type or lifetime parameter that none of
its fields directly express. The simplest case: the compiler says
"parameter T is never used."
// This doesn't compile: T is unused
struct Id<T> {
    value: u64,
    // T doesn't appear in any field
}
The more subtle case involves raw pointers. Raw pointers use
T syntactically, but the compiler can't infer
ownership, lifetime, or variance from them:
// This doesn't compile either: 'a is declared but never used
struct Slice<'a, T> {
    start: *const T,
    end: *const T,
    // Even if 'a were wired in, the compiler couldn't tell whether
    // this struct "owns" or "borrows" T through these raw pointers.
}
Raw pointers don't carry ownership or lifetime information. The
compiler doesn't know whether this struct logically owns the
T values or just points to them. This matters for
drop checking, variance (whether Slice<&'long T>
can be used where Slice<&'short T> is
expected), and auto-trait inference.
PhantomData fills the gap:
use std::marker::PhantomData;

struct Slice<'a, T> {
    start: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>, // "pretend" we borrow a T with lifetime 'a
}
The PhantomData<&'a T> tells the compiler: "treat
this struct as if it borrows a &'a T." The compiler
can now correctly reason about lifetimes and variance. At runtime, the
field takes zero bytes.
What you put inside the PhantomData matters. Two common
forms serve different purposes:
- PhantomData<&'a T> — "this struct borrows a T." Used for iterators and views over borrowed data. Tells the compiler about lifetime and variance.
- PhantomData<T> — "this struct owns a T." Used for owning raw-pointer types like a custom Vec. Tells the compiler that dropping this struct might drop a T, which is critical for drop check correctness.
Getting this wrong doesn't just produce incorrect variance -- it can
cause unsound drop behavior. If you're building an owning abstraction
over raw pointers, you need PhantomData<T>, not
PhantomData<&'a T>.
Common uses
Associating a lifetime with a raw pointer (borrows T):
struct RawIter<'a, T> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>, // borrows T — drop check knows T outlives this
}
Owning raw-pointer abstraction (owns T):
struct MyVec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
    _marker: PhantomData<T>, // owns T — drop check knows we might drop T
}
Marking a type parameter as "used" for type-level logic:
// A typed ID that prevents mixing user IDs with order IDs
struct Id<T> {
    value: u64,
    _marker: PhantomData<T>,
}

struct User;
struct Order;

let user_id: Id<User> = Id { value: 42, _marker: PhantomData };
let order_id: Id<Order> = Id { value: 42, _marker: PhantomData };
// user_id and order_id are different types despite both holding u64
That last example is really a newtype pattern with a generic twist.
The PhantomData makes the type parameter T
"used" so the compiler accepts it, even though no actual
T value is stored. The result: Id<User>
and Id<Order> are distinct types with zero runtime
overhead.
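A self-contained sketch of the typed-ID pattern in action, with a hypothetical describe_user function that only accepts Id<User>:

```rust
use std::marker::PhantomData;

struct Id<T> {
    value: u64,
    _marker: PhantomData<T>,
}

impl<T> Id<T> {
    fn new(value: u64) -> Self {
        Id { value, _marker: PhantomData }
    }
}

struct User;
struct Order;

// This function refuses any ID kind other than Id<User>.
fn describe_user(id: Id<User>) -> String {
    format!("user #{}", id.value)
}

fn main() {
    let user_id: Id<User> = Id::new(42);
    assert_eq!(describe_user(user_id), "user #42");
    // let order_id: Id<Order> = Id::new(42);
    // describe_user(order_id); // ERROR: expected `Id<User>`, found `Id<Order>`

    // The marker really is free: Id<T> is exactly a u64 at runtime.
    assert_eq!(std::mem::size_of::<Id<Order>>(), std::mem::size_of::<u64>());
}
```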
When you reach for it
PhantomData is more advanced than the other types in this
post. You'll encounter it when writing unsafe abstractions over raw
pointers, implementing custom iterators over borrowed data, or building
type-level state machines where a type parameter carries meaning but no
data. If the compiler says "unused type parameter," and you know the
parameter is conceptually part of the type, PhantomData
is the answer.
When to Reach for What: The Full Decision Tree
Here's the combined map across both parts. Start with what you need, and follow the path.
I need to put data on the heap. Use
Box<T>. Single owner, fixed size on the stack,
data on the heap. This is your malloc.
I need multiple owners (single-threaded). Use
Rc<T>. Reference-counted, clones bump the count,
drops decrement it, last drop frees the data.
I need multiple owners (multi-threaded). Use
Arc<T>. Same as Rc but with atomic
reference counting, safe to send across threads.
I need to mutate through a shared reference
(single-threaded). Use RefCell<T> for
full borrow checking at runtime, or Cell<T> for
small values where you only need get/set/replace (note:
get requires T: Copy; for non-Copy types
use set, take, or replace).
I need to mutate through a shared reference
(multi-threaded). Use Mutex<T>. Or
RwLock<T> if reads heavily outnumber writes.
I need shared + mutable (single-threaded). Combine:
Rc<RefCell<T>>.
I need shared + mutable (multi-threaded). Combine:
Arc<Mutex<T>> or
Arc<RwLock<T>>.
I have a function that usually passes data through but
sometimes modifies it. Use Cow<'a, T>.
Borrow on the happy path, allocate only when you must.
I have an expensive value that needs lazy, one-time
initialization. Use OnceLock<T> for
thread-safe or OnceCell<T> for single-threaded.
I have a static value with a known initializer. Use
LazyLock<T>. It's OnceLock with the
initialization baked in.
The compiler says my future needs to be pinned. Use
Box::pin(async move { ... }) or the pin!()
macro. This comes up with async trait objects and stored futures.
I have a generic struct with an unused type parameter, or
raw pointers where the compiler can't infer lifetime/variance.
Add a PhantomData field to express the relationship:
PhantomData<T> for ownership,
PhantomData<&'a T> for borrowing.
Putting It All Together
Between Part 1 and this post, you've seen the full set of wrapper types that Rust provides for managing ownership, sharing, mutation, lazy initialization, conditional allocation, and memory pinning. That's a lot of types. But the decision tree above should make it manageable -- you're never choosing from all of them at once. You're answering a specific question about your code, and the answer narrows to one or two types.
The core system from Part 1 handles the structural question: who owns this, who can see it, and who can change it. The specialists from this post handle the tactical questions: can I avoid allocating, can I defer work, can I promise something about memory layout.
If you've also read
40 Rust Patterns That
Matter and the companion
Rust Patterns That Matter
series, you now have a solid intermediate toolkit: the patterns
for working with Option, Result, iterators,
and collections, plus the wrapper types for managing ownership and
memory at every scale. These complement each other -- the patterns
tell you what to do with values, and the smart pointer
series tells you how to hold them.