
Rust's Smart Pointer Patterns — Part 2: The Specialists

Part 2 of 2 in Rust's Smart Pointer Patterns. Companion series: Rust Patterns That Matter.

Previous: Part 1: The Ownership and Sharing System

Part 1 covered the core system: Box for heap allocation, Rc and Arc for sharing, Cell and RefCell for interior mutability, Mutex and RwLock for thread-safe mutation, and the combos like Arc<Mutex<T>> that wire them together. Those types form a complete ownership toolkit. If your problem is "who owns this, who can read it, and who can write it," Part 1 has the answer.

This post covers the specialists. These are wrapper types that solve specific, narrower problems -- deferring allocations, initializing values lazily, preventing moves, and marking type-level relationships. You don't reach for them every day, but when you need them, nothing else will do.

1. Cow<'a, T> — Clone on Write

Cow holds either a borrowed reference or an owned value. That's it. It's an enum:

enum Cow<'a, B: ?Sized + ToOwned> {
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}

The name stands for "clone on write," but the real insight is simpler: Cow lets you defer cloning until you actually need to mutate. Most of the time, you don't need to mutate. Most of the time, the data passes through your function unchanged. Cow makes the common case free.

The problem it solves

You're writing a function that normalizes user input. Maybe it trims whitespace, lowercases certain fields, replaces tabs with spaces. The catch: 90% of inputs are already valid. They don't need any modification at all.

If your function returns String, you're allocating on every call -- even when the input passes through unchanged. If it returns &str, you can't return a modified string because it doesn't live long enough. You're stuck.

use std::borrow::Cow;

fn normalize_username<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') || input.chars().any(|c| c.is_uppercase()) {
        // Needs modification — allocate a new String
        Cow::Owned(input.replace(' ', "_").to_lowercase())
    } else {
        // Already valid — just return the reference, zero allocation
        Cow::Borrowed(input)
    }
}

The happy path -- where the username is already clean -- costs nothing. No heap allocation, no memcpy, no new String. The caller gets back the original &str wrapped in a Cow.

The unhappy path -- where the username needs work -- allocates a new String. But that's fine. You were going to allocate anyway if the data needed changing.

Using the result

Cow<str> implements Deref<Target = str>, so it behaves like a &str in most contexts:

let name = normalize_username("alice");
println!("length: {}", name.len());       // works like &str

// If you need an owned String:
let owned: String = name.into_owned();   // clones only if Borrowed

Beyond strings

Cow works with any type that implements ToOwned. The most common pairings: Cow<'a, str> (borrows str, owns String), Cow<'a, [T]> (borrows [T], owns Vec<T>), and Cow<'a, Path> (borrows Path, owns PathBuf).
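The same pattern carries over to slices. A minimal sketch, where clamp_non_negative is a hypothetical helper that allocates only when a value actually needs fixing:

```rust
use std::borrow::Cow;

// Replace negative values with zero — but only allocate a Vec
// when at least one value is actually negative.
fn clamp_non_negative(values: &[i32]) -> Cow<'_, [i32]> {
    if values.iter().any(|&v| v < 0) {
        Cow::Owned(values.iter().map(|&v| v.max(0)).collect())
    } else {
        Cow::Borrowed(values)
    }
}
```

On an already-clean slice this returns Cow::Borrowed and never touches the heap.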

When you reach for it

The decision is straightforward. If your function always modifies the input, just return String. If it never modifies the input, just return &str. Cow is for the in-between: functions that usually pass data through but sometimes need to modify it. Parsers, config processors, validators, normalization layers. If you're processing thousands of strings and the majority are already clean, Cow saves you thousands of allocations.

For C programmers: it's like returning a const char* that points to either the original input or a freshly malloc'd buffer, except the type system tracks which one it is and handles the free for you.

2. OnceCell<T> and OnceLock<T> — Initialize Exactly Once

Some values are expensive to create. You don't know at startup whether you'll need them. But once you create them, they never change. You want lazy initialization with a guarantee: this runs exactly once.

The problem without them

Without OnceCell or OnceLock, you end up writing the manual version:

use std::sync::Mutex;

static CONFIG: Mutex<Option<AppConfig>> = Mutex::new(None);

fn get_config() -> AppConfig {
    let mut guard = CONFIG.lock().unwrap();
    if guard.is_none() {
        *guard = Some(load_config_from_disk());
    }
    guard.clone().unwrap()
}

This works, but it's noisy. You're managing a Mutex around an Option, manually checking whether the value exists, and cloning on every access because you can't hold the lock guard forever. And every subsequent caller still pays for the lock, even though the value was set on the first call and never changes.

OnceLock: the thread-safe one

OnceLock is the one you'll use most. It's safe to share across threads and its initialization is guaranteed to run exactly once, even if multiple threads race to initialize it.

use std::sync::OnceLock;

static CONFIG: OnceLock<AppConfig> = OnceLock::new();

fn get_config() -> &'static AppConfig {
    CONFIG.get_or_init(|| load_config_from_disk())
}

That's it. The first call runs the closure, stores the result, and returns a reference to it. Every subsequent call returns the same reference immediately -- no lock contention, no cloning, no Option unwrapping.

get_or_init is the key method. It takes a closure that produces the value. If the cell is empty, it runs the closure and stores the result. If the cell is already full, it ignores the closure entirely and returns what's there.
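That "ignores the closure entirely" behavior can be seen directly. A minimal sketch, where VALUE, first_access, and later_access are hypothetical names:

```rust
use std::sync::OnceLock;

static VALUE: OnceLock<u32> = OnceLock::new();

// The first get_or_init stores its closure's result.
fn first_access() -> u32 {
    *VALUE.get_or_init(|| 1)
}

// Any later closure is ignored — the cell is already full.
fn later_access() -> u32 {
    *VALUE.get_or_init(|| 2) // never runs if first_access ran before
}
```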

OnceCell: the single-threaded one

OnceCell is the same concept but without the thread-safety machinery. It lives in std::cell instead of std::sync:

use std::cell::OnceCell;

struct Parser {
    regex_cache: OnceCell<Regex>,
}

impl Parser {
    fn new() -> Self {
        Parser { regex_cache: OnceCell::new() }
    }

    fn regex(&self) -> &Regex {
        self.regex_cache.get_or_init(|| {
            Regex::new(r"\d{4}-\d{2}-\d{2}").unwrap()
        })
    }
}

Use OnceCell when the value lives inside a struct that isn't shared across threads. Use OnceLock when you need static globals or cross-thread sharing.
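Because get_or_init takes &self, the lazy caching works through a shared reference with no external crates involved. A std-only sketch, where the Fib struct is hypothetical:

```rust
use std::cell::OnceCell;

struct Fib {
    cache: OnceCell<Vec<u64>>,
}

impl Fib {
    // Builds the table on first call; returns the cached one afterwards.
    fn table(&self) -> &[u64] {
        self.cache.get_or_init(|| {
            let mut v: Vec<u64> = vec![0, 1];
            for i in 2..20 {
                let next = v[i - 1] + v[i - 2];
                v.push(next);
            }
            v
        })
    }
}
```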

What they replaced

Before these types were in the standard library, the lazy_static crate was the go-to. If you see lazy_static! in older codebases, this is what it was doing. OnceLock and OnceCell (stabilized in Rust 1.70) replaced lazy_static for most uses, with the advantage of being in std, requiring no macros, and giving you explicit control over when initialization happens.

When you reach for them

Compiled regexes. Database connection pools. Lookup tables parsed from files. Configuration loaded from environment variables. Any value that's expensive to create, needed zero or one time, and immutable once created. The pattern is always the same: declare the cell, call get_or_init at the point of first use.

3. LazyLock<T> — OnceLock with a Built-in Initializer

OnceLock separates declaration from initialization. You declare the cell in one place and provide the initializer at the point of first use. LazyLock bundles them together: you provide the initialization closure at declaration time.

use std::collections::HashMap;
use std::sync::LazyLock;
use regex::Regex;

static DATE_REGEX: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap()
});

static LOOKUP_TABLE: LazyLock<HashMap<String, i32>> = LazyLock::new(|| {
    let mut m = HashMap::new();
    m.insert("critical".into(), 1);
    m.insert("warning".into(), 2);
    m.insert("info".into(), 3);
    m
});

fn is_valid_date(s: &str) -> bool {
    DATE_REGEX.is_match(s)  // initialized on first call
}

That LazyLock::new(|| ...) is essentially what the lazy_static! macro used to generate. The closure runs on first access, the result is stored, and subsequent accesses return the stored value. No lock contention after initialization.

LazyLock vs OnceLock

The difference is about where the initializer lives. OnceLock takes the closure at each get_or_init call site; LazyLock takes it once, at the declaration, and you reach the value through plain dereference.

For static globals with fixed initialization -- which is the most common case -- LazyLock is the cleaner choice. The initialization is right there with the declaration, and you don't have to worry about passing the right closure everywhere. For struct fields or situations where the initialization depends on runtime data that isn't available at declaration time, OnceLock is the right tool.
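The contrast in a minimal sketch, where GREETING, NAME, and name are hypothetical:

```rust
use std::sync::{LazyLock, OnceLock};

// LazyLock: the initializer is fixed at the declaration.
static GREETING: LazyLock<String> = LazyLock::new(|| "hello".to_uppercase());

// OnceLock: the initializer is supplied at the call site.
static NAME: OnceLock<String> = OnceLock::new();

fn name() -> &'static str {
    NAME.get_or_init(|| "world".to_string())
}
```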

LazyCell: the single-threaded equivalent

Just as OnceLock has a single-threaded counterpart in OnceCell, LazyLock has LazyCell (in std::cell). It works the same way -- closure baked in at construction, value computed on first access -- but without the thread-safety overhead. Use it for lazy fields inside structs that aren't shared across threads. Both LazyLock and LazyCell were stabilized in Rust 1.80.

When you reach for it

LazyLock is for global/static values where you know the initialization logic upfront. Compiled regexes, pre-built lookup tables, parsed configuration, constant data loaded from embedded files. If you're typing static FOO: ... = ... and the right-hand side can't be a const, LazyLock is probably what you want. For lazy fields inside single-threaded structs, use LazyCell.

4. Pin<T> — Promise This Won't Move

This is the hardest type in the entire smart pointer toolkit. That's not an apology -- it's a warning. Pin exists to solve a problem that most languages pretend doesn't exist, and it earns its complexity honestly. Let's take it step by step.

The problem: self-referential structs

By default, all values in Rust can be moved to a new memory location. When you push a value into a Vec or return it from a function, the bytes get copied to a new location. This is fine for most types -- an i32 is an i32 no matter where it lives, and a String is a pointer to the heap plus a length, so moving the String struct doesn't affect the heap data it points to.

But what if a struct contains a pointer to itself? If the struct moves, that pointer now points to where the struct used to be, not where it is now. The pointer dangles.

// Conceptual illustration — you can't actually write this in safe Rust
struct SelfRef {
    data: String,
    ptr_to_data: *const String,  // points to self.data
}
// If SelfRef moves, ptr_to_data still points to the old location.
// Use-after-move. Bad.

In C terms: you have a struct that stores a pointer to one of its own fields. If you memcpy the struct to a new address, that pointer is now stale. Rust's Pin is like telling the compiler "this value has the same address for its entire lifetime -- don't move it."

Why async makes this real

You might think self-referential structs are exotic. They're not. When the compiler transforms an async fn into a state machine, the resulting struct holds local variables across .await points. If one of those locals is a reference to another local, you've got a self-referential struct. Not every async fn produces one -- only those that hold borrows across suspension points -- but it's common enough that the language needed a general solution:

async fn example() {
    let data = vec![1, 2, 3];
    let slice = &data[..];       // borrows data
    some_async_op().await;       // state machine suspends here
    println!("{:?}", slice);     // uses slice after resuming
}

The compiler-generated state machine struct holds both data and slice. slice points into data's memory. If this struct moved, slice would dangle. Pin prevents the move.

What Pin actually is

Pin<P> wraps a pointer type P (like &mut T or Box<T>) and restricts what you can do with it. Specifically, you can't get a &mut T from a Pin<&mut T> unless T implements Unpin. Without &mut T, you can't use mem::swap or mem::replace to move the value out.

Most types implement Unpin automatically. An i32, a String, a Vec<T> -- they don't contain self-references, so moving them is harmless. For these types, Pin imposes no restriction at all. It only matters for !Unpin types -- and the main !Unpin types you'll encounter are futures generated by async blocks.
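You can see the "no restriction for Unpin types" rule directly: Pin<&mut i32> still hands out mutable access. A sketch, where bump_through_pin is a hypothetical helper:

```rust
use std::pin::Pin;

// i32 is Unpin, so Pin::new is allowed and DerefMut hands back
// mutable access. For a !Unpin type, neither line would compile
// without unsafe code.
fn bump_through_pin(x: &mut i32) {
    let mut pinned: Pin<&mut i32> = Pin::new(x);
    *pinned += 1;
}
```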

The practical surface

Most Rust programmers encounter Pin through async and don't need to deeply understand the internals to use it. Here's what you'll actually type:

Pin<Box<T>> — Heap-allocate the value and pin it there. The Box pointer may move, but the data on the heap stays put. This is the most common pinning strategy:

use std::pin::Pin;
use std::future::Future;

struct Client;

impl Client {
    // Returning a future as a boxed trait object (dynamic dispatch)
    fn fetch(&self) -> Pin<Box<dyn Future<Output = Vec<u8>> + Send + '_>> {
        Box::pin(async move {
            // your async implementation
            vec![]
        })
    }
}

pin!() — Pin a value to the stack without heap allocation. Available as std::pin::pin! since Rust 1.68. Unlike Box::pin, this avoids a heap allocation entirely, which matters in hot async paths:

use std::pin::pin;

let fut = pin!(some_async_operation());
// fut is Pin<&mut impl Future> — pinned to the stack, no heap allocation

The Future::poll signature: If you ever implement Future manually, the poll method takes self: Pin<&mut Self>. This is the guarantee the runtime relies on: by the time poll is called, the future won't move.

The honest assessment

Pin is genuinely complex. The interaction between Pin, Unpin, and structural pinning is one of the most subtle corners of the language. But here's the good news: for application code, you almost never need to think about it beyond "Box::pin(async move { ... }) when the compiler asks for it." Library authors who implement custom futures or self-referential data structures need the deeper understanding. Application authors mostly need to know it exists and reach for Box::pin.

When you reach for it

You don't "reach for" Pin the way you reach for Cow or OnceLock. The compiler tells you when you need it. That happens in three situations: returning futures as trait objects, storing futures in struct fields, and implementing the Future trait by hand. In all three cases, Box::pin() is the answer that gets you moving.

5. PhantomData<T> — Zero-Size Type-Level Marker

PhantomData doesn't hold any data. It has zero size and compiles to nothing. Its entire purpose is telling the compiler about a type relationship that the struct's fields don't express on their own.

The problem it solves

Sometimes a struct has a type or lifetime parameter that none of its fields directly express. The simplest case: the compiler says "parameter T is never used."

// This doesn't compile: T is unused
struct Id<T> {
    value: u64,
    // T doesn't appear in any field
}

The more subtle case involves raw pointers. Raw pointers use T syntactically, but the compiler can't infer ownership, lifetime, or variance from them:

// This doesn't compile either: 'a is declared but never used (E0392)
struct Slice<'a, T> {
    start: *const T,
    end: *const T,
    // And even if 'a were used, the compiler couldn't tell whether
    // this struct "owns" or "borrows" T through these raw pointers.
}

Raw pointers don't carry ownership or lifetime information. The compiler doesn't know whether this struct logically owns the T values or just points to them. This matters for drop checking, variance (whether Slice<&'long T> can be used where Slice<&'short T> is expected), and auto-trait inference.

PhantomData fills the gap:

use std::marker::PhantomData;

struct Slice<'a, T> {
    start: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,  // "pretend" we borrow a T with lifetime 'a
}

The PhantomData<&'a T> tells the compiler: "treat this struct as if it borrows a &'a T." The compiler can now correctly reason about lifetimes and variance. At runtime, the field takes zero bytes.

What you put inside the PhantomData matters. Two common forms serve different purposes: PhantomData<T> marks the struct as logically owning a T, so the drop checker assumes dropping the struct may drop T values; PhantomData<&'a T> marks it as borrowing a T for lifetime 'a, so T must outlive the struct and variance follows a shared reference.

Getting this wrong doesn't just produce incorrect variance -- it can cause unsound drop behavior. If you're building an owning abstraction over raw pointers, you need PhantomData<T>, not PhantomData<&'a T>.

Common uses

Associating a lifetime with a raw pointer (borrows T):

struct RawIter<'a, T> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,  // borrows T — drop check knows T outlives this
}

Owning raw-pointer abstraction (owns T):

struct MyVec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
    _marker: PhantomData<T>,  // owns T — drop check knows we might drop T
}

Marking a type parameter as "used" for type-level logic:

// A typed ID that prevents mixing user IDs with order IDs
struct Id<T> {
    value: u64,
    _marker: PhantomData<T>,
}

struct User;
struct Order;

let user_id: Id<User> = Id { value: 42, _marker: PhantomData };
let order_id: Id<Order> = Id { value: 42, _marker: PhantomData };
// user_id and order_id are different types despite both holding u64

That last example is really a newtype pattern with a generic twist. The PhantomData makes the type parameter T "used" so the compiler accepts it, even though no actual T value is stored. The result: Id<User> and Id<Order> are distinct types with zero runtime overhead.
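The payoff shows up at function boundaries. A sketch where Id::new and delete_user are hypothetical:

```rust
use std::marker::PhantomData;

struct Id<T> {
    value: u64,
    _marker: PhantomData<T>,
}

impl<T> Id<T> {
    fn new(value: u64) -> Self {
        Id { value, _marker: PhantomData }
    }
}

struct User;
struct Order;

// Accepts only user IDs — an Id<Order> is rejected at compile time.
fn delete_user(id: Id<User>) -> u64 {
    id.value
}
```

Calling delete_user(Id::<Order>::new(7)) fails to compile: expected Id<User>, found Id<Order>.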

When you reach for it

PhantomData is more advanced than the other types in this post. You'll encounter it when writing unsafe abstractions over raw pointers, implementing custom iterators over borrowed data, or building type-level state machines where a type parameter carries meaning but no data. If the compiler says "unused type parameter," and you know the parameter is conceptually part of the type, PhantomData is the answer.

When to Reach for What: The Full Decision Tree

Here's the combined map across both parts. Start with what you need, and follow the path.

I need to put data on the heap. Use Box<T>. Single owner, fixed size on the stack, data on the heap. This is your malloc.

I need multiple owners (single-threaded). Use Rc<T>. Reference-counted, clones bump the count, drops decrement it, last drop frees the data.

I need multiple owners (multi-threaded). Use Arc<T>. Same as Rc but with atomic reference counting, safe to send across threads.

I need to mutate through a shared reference (single-threaded). Use RefCell<T> for full borrow checking at runtime, or Cell<T> for small values where you only need get/set/replace (note: get requires T: Copy; for non-Copy types use set, take, or replace).

I need to mutate through a shared reference (multi-threaded). Use Mutex<T>. Or RwLock<T> if reads heavily outnumber writes.

I need shared + mutable (single-threaded). Combine: Rc<RefCell<T>>.

I need shared + mutable (multi-threaded). Combine: Arc<Mutex<T>> or Arc<RwLock<T>>.

I have a function that usually passes data through but sometimes modifies it. Use Cow<'a, T>. Borrow on the happy path, allocate only when you must.

I have an expensive value that needs lazy, one-time initialization. Use OnceLock<T> for thread-safe or OnceCell<T> for single-threaded.

I have a static value with a known initializer. Use LazyLock<T>. It's OnceLock with the initialization baked in.

The compiler says my future needs to be pinned. Use Box::pin(async move { ... }) or the pin!() macro. This comes up with async trait objects and stored futures.

I have a generic struct with an unused type parameter, or raw pointers where the compiler can't infer lifetime/variance. Add a PhantomData field to express the relationship: PhantomData<T> for ownership, PhantomData<&'a T> for borrowing.

Putting It All Together

Between Part 1 and this post, you've seen the full set of wrapper types that Rust provides for managing ownership, sharing, mutation, lazy initialization, conditional allocation, and memory pinning. That's a lot of types. But the decision tree above should make it manageable -- you're never choosing from all of them at once. You're answering a specific question about your code, and the answer narrows to one or two types.

The core system from Part 1 handles the structural question: who owns this, who can see it, and who can change it. The specialists from this post handle the tactical questions: can I avoid allocating, can I defer work, can I promise something about memory layout.

If you've also read 40 Rust Patterns That Matter and the companion Rust Patterns That Matter series, you now have a solid intermediate toolkit: the patterns for working with Option, Result, iterators, and collections, plus the wrapper types for managing ownership and memory at every scale. These complement each other -- the patterns tell you what to do with values, and the smart pointer series tells you how to hold them.