Rust Deep Dives #4: Why Rust Has Six String Types (And When You'll Hit Each One)

Post 4 of 8 in Rust Deep Dives. Companion series: Rust Patterns That Matter.

Most languages give you one string type and call it a day. Python has str. Go has string. JavaScript has String. Rust gives you &str, String, OsStr, OsString, CStr, CString, Path, and PathBuf. That's eight types across four pairs: the six string types in the title, plus Path and PathBuf, which are thin wrappers around OsStr and OsString. The first time you see them all, it feels like overengineering. It isn't. Each pair exists because the world has different ideas about what a "string" is, and Rust chose honesty over pretending.

Here's when each one shows up and why.

The core pair: &str and String

These are the two you use 90% of the time. &str is a borrowed, immutable slice of UTF-8 bytes. Under the hood, it's just a pointer and a length -- the same two words a C programmer would carry around. It doesn't own the data. It can point into a String, into a string literal baked into your binary, or into any contiguous UTF-8 buffer.

String is the owned, heap-allocated, growable version. It's backed by a Vec<u8> that's guaranteed to contain valid UTF-8. You use it when you need to build strings at runtime, store them in structs, or pass ownership across function boundaries.

// &str: borrowed, immutable, no allocation
let greeting: &str = "hello";  // points into the binary's read-only data

// String: owned, growable, heap-allocated
let mut name = String::from("world");
name.push_str("!");

// String derefs to &str, so you can pass it anywhere &str is expected
fn greet(who: &str) {
    println!("hello, {who}");
}

greet(&name);      // &String coerces to &str
greet(greeting);   // &str passed directly

The relationship between them is simple: String implements Deref<Target = str>, so anywhere you accept &str, you can pass a &String and it just works. This is why the standard advice is to take &str in function signatures -- it accepts both types without forcing the caller to allocate.
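Here is a minimal sketch of that advice in action. `count_words` is a made-up helper, not a standard-library function. Its single `&str` signature accepts a string literal, a borrowed String, and a sub-slice of a String, and none of the callers has to allocate.

```rust
// Hypothetical helper: counts whitespace-separated words.
// Taking &str means literals, &String, and slices all work.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn word_count_demo() -> (usize, usize, usize) {
    let owned = String::from("the quick brown fox");
    let literal = count_words("hello world"); // &'static str
    let borrowed = count_words(&owned);       // &String coerces to &str via Deref
    let sliced = count_words(&owned[4..]);    // &str slice into the String
    (literal, borrowed, sliced)
}
```

Had `count_words` taken `String` instead, the literal and slice callers would each need a `.to_string()`, which means an allocation just to satisfy the signature.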

If you're coming from C, think of &str as a fat pointer to a char buffer (with a length instead of a null terminator), and String as a malloc'd buffer that knows its own length and capacity. The key difference from C: both are guaranteed to be valid UTF-8. There is no "maybe it's Latin-1, maybe it's UTF-8" ambiguity.
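You can check both halves of that comparison directly. This sketch assumes a typical target. The two-word size of `&str` follows from it being a (pointer, length) pair. The three-word size of `String` (pointer, capacity, length) reflects the current standard-library layout and is not a documented guarantee. The UTF-8 guarantee appears as a fallible constructor: raw bytes become a String only after validation.

```rust
use std::mem::size_of;
use std::string::FromUtf8Error;

// &str is a fat pointer: data pointer + length.
fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

// String is (pointer, capacity, length) in the current std implementation.
fn string_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

// Bytes must pass UTF-8 validation before they can become a String.
fn bytes_to_text(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}
```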

Why one pair isn't enough

If every string in the world were UTF-8, we'd be done. But they aren't.

File paths on Windows can contain arbitrary UTF-16 code units, including unpaired surrogates that have no valid UTF-8 representation. Environment variables on Unix are arbitrary byte sequences that don't have to be UTF-8 at all. C libraries use null-terminated byte arrays that may or may not be valid text in any encoding.

Most languages paper over the problem. Python 3 smuggles undecodable filename bytes into str as lone surrogates (the "surrogateescape" scheme from PEP 383), so the failure only surfaces later, as a UnicodeEncodeError when you print or re-encode the name. Go documents its string as a byte sequence that is "conventionally but not necessarily" UTF-8, which means invalid data flows through silently until some code assumes otherwise. Rust chose a different approach: separate types with separate guarantees, so the compiler catches encoding mismatches at compile time instead of at 3 AM.

The OS boundary: OsStr and OsString

OsStr and OsString are Rust's representation of platform-native strings. On Unix, they're arbitrary byte sequences (the OS doesn't care about encoding). On Windows, they're potentially ill-formed UTF-16 -- meaning they can contain unpaired surrogates that no valid Unicode string could represent.

You hit these every time you interact with the operating system: reading environment variables, listing directory entries, handling command-line arguments. The API functions that return these types are the honest versions. The convenience wrappers that hand you a String instead are less forgiving: std::env::args() panics if an argument isn't valid Unicode, and std::env::var() returns an error if the value isn't valid UTF-8.

use std::env;
use std::ffi::OsString;

// env::var() returns Result<String, VarError> -- Err if unset OR not valid UTF-8
// env::var_os() returns Option<OsString> -- None only if unset
fn get_home() -> Option<OsString> {
    env::var_os("HOME")
}
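For comparison, env::var() does not panic. It returns Result<String, VarError>, and the VarError::NotUnicode variant carries the original OsString, so nothing is lost unless you discard it. The sketch below shows all three outcomes. `describe_var` is a hypothetical helper name.

```rust
use std::env::{self, VarError};

// Hypothetical helper: reports which of the three outcomes env::var() produced.
fn describe_var(key: &str) -> String {
    match env::var(key) {
        Ok(value) => format!("{key}={value}"),
        Err(VarError::NotPresent) => format!("{key} is not set"),
        // NotUnicode hands back the raw OsString instead of discarding it.
        Err(VarError::NotUnicode(raw)) => format!("{key} is set but not UTF-8: {raw:?}"),
    }
}
```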

The same borrowed/owned pattern applies: OsStr is the borrowed slice, OsString is the owned, heap-allocated version. OsString derefs to OsStr, just like String derefs to str.
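Because of that Deref impl, a function written against &OsStr also accepts a &OsString. The sketch below uses OsStr::as_encoded_bytes (stable since Rust 1.74) to inspect the first byte without requiring UTF-8. `is_hidden` and its leading-dot rule are illustrative assumptions, not a standard API.

```rust
use std::ffi::{OsStr, OsString};

// Illustrative Unix-style rule: names starting with '.' are "hidden".
// Works on the raw encoded bytes, so non-UTF-8 names are handled too.
fn is_hidden(name: &OsStr) -> bool {
    name.as_encoded_bytes().first() == Some(&b'.')
}

fn hidden_demo() -> (bool, bool) {
    let owned: OsString = OsString::from(".bashrc");
    // &OsString coerces to &OsStr, just like &String coerces to &str.
    (is_hidden(&owned), is_hidden(OsStr::new("notes.txt")))
}
```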

Converting from OsStr to &str is fallible, because the OS string might not be valid UTF-8. This is the whole point -- the type system forces you to handle the possibility:

fn print_env(key: &str) {
    match std::env::var_os(key) {
        Some(val) => {
            // .to_str() returns Option<&str> -- None if not valid UTF-8
            match val.to_str() {
                Some(s) => println!("{key}={s}"),
                None    => println!("{key}=<not valid UTF-8>"),
            }
        }
        None => println!("{key} is not set"),
    }
}

If you know the string is valid UTF-8 (which it almost always is in practice), you can call .to_str().unwrap(), which panics if you're wrong, or .to_string_lossy(), which replaces invalid sequences with the Unicode replacement character U+FFFD. Either way, the type system makes this an explicit choice rather than a silent assumption.
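Here is what the two options do with a string that genuinely isn't UTF-8, such as the Latin-1 encoding of "café". This sketch is Unix-only. It uses std::os::unix::ffi::OsStrExt::from_bytes to build an OsStr from raw bytes, and Windows has no direct equivalent of that constructor.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: build an OsStr from raw bytes

// Returns (strict conversion, lossy conversion) for a raw byte sequence.
fn convert(raw: &[u8]) -> (Option<String>, String) {
    let os = OsStr::from_bytes(raw);
    let strict = os.to_str().map(str::to_owned);   // None if not valid UTF-8
    let lossy = os.to_string_lossy().into_owned(); // invalid bytes become U+FFFD
    (strict, lossy)
}
```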

The C boundary: CStr and CString

C strings are a different animal. They're null-terminated byte arrays with no length field and no interior null bytes. They're not necessarily UTF-8 -- they're whatever encoding the C library decided to use (often but not always UTF-8 on modern systems).

CStr is a borrowed reference to a null-terminated C string. CString is an owned, heap-allocated null-terminated string. You use them when calling C functions through FFI.

use std::ffi::{CStr, CString};
use std::os::raw::c_char;

extern "C" {
    fn setenv(name: *const c_char, value: *const c_char, overwrite: i32) -> i32;
    fn getenv(name: *const c_char) -> *const c_char;
}

fn safe_setenv(name: &str, value: &str) -> Result<(), std::ffi::NulError> {
    // CString::new checks for interior null bytes.
    // If your Rust string contains a \0, this returns Err.
    let c_name = CString::new(name)?;
    let c_value = CString::new(value)?;

    unsafe {
        setenv(c_name.as_ptr(), c_value.as_ptr(), 1);
    }
    Ok(())
}

The CString::new constructor is worth understanding. It accepts anything convertible into a Vec<u8> (a &str, a String, a byte slice), checks that the bytes contain no interior null, and then appends the null terminator. Interior nulls are dangerous in C because strlen, and every function built on it, stops at the first one and silently truncates your string. Rust catches this at construction time.

// This works: no interior nulls
let ok = CString::new("hello");
assert!(ok.is_ok());

// This fails: interior null byte at position 5
let bad = CString::new("hello\0world");
assert!(bad.is_err());
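The NulError from a failed CString::new is not a dead end. nul_position() reports where the first interior null is, and into_vec() returns the bytes you passed in. If truncating at that null is acceptable for your use case (an assumption, because rejecting the input is often the better choice), you can make C's behavior explicit instead of accidental. `truncate_at_nul` is a hypothetical helper name.

```rust
use std::ffi::CString;

// Hypothetical helper: builds a CString, explicitly truncating at the
// first interior nul instead of letting C do it silently.
fn truncate_at_nul(s: &str) -> CString {
    match CString::new(s) {
        Ok(c) => c,
        Err(err) => {
            let pos = err.nul_position();
            let mut bytes = err.into_vec(); // recover the original bytes
            bytes.truncate(pos);            // keep everything before the nul
            CString::new(bytes).expect("no nul remains before the first nul")
        }
    }
}
```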

Going the other direction -- reading a C string back into Rust -- you use CStr:

fn safe_getenv(name: &str) -> Option<String> {
    let c_name = CString::new(name).ok()?;

    unsafe {
        let ptr = getenv(c_name.as_ptr());
        if ptr.is_null() {
            return None;
        }
        // CStr::from_ptr reads until the null terminator
        let c_str = CStr::from_ptr(ptr);
        // .to_str() checks for valid UTF-8
        c_str.to_str().ok().map(String::from)
    }
}
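You can exercise the same reading logic without any FFI by starting from a byte buffer. CStr::from_bytes_with_nul succeeds only if the buffer ends in a single null with no null before it, and to_str() then adds the UTF-8 check. `read_c_buffer` is a hypothetical name for this sketch.

```rust
use std::ffi::CStr;

// Hypothetical helper: treat a byte buffer as a C string and decode it as UTF-8.
fn read_c_buffer(buf: &[u8]) -> Option<&str> {
    // Err if the trailing nul is missing or an interior nul is present.
    let c = CStr::from_bytes_with_nul(buf).ok()?;
    // Err if the bytes before the nul are not valid UTF-8.
    c.to_str().ok()
}
```

If the buffer can have bytes after the terminator (a fixed-size char array, for example), CStr::from_bytes_until_nul, stable since Rust 1.69, stops at the first null instead.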

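If you're coming from C, this is the type pair that will feel most familiar. CStr is a borrowed view of a const char* (creating one from a raw pointer is unsafe; using it afterward is not), and CString is an owned, heap-allocated char* that frees itself when dropped. One catch: that buffer comes from Rust's allocator, not malloc, so C code must never free() it. The difference from raw C: Rust guarantees no interior nulls, tracks ownership, and makes encoding conversion explicit.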
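When C code needs to hold on to a string past a single call, you can pass ownership across the boundary with CString::into_raw() and take it back with CString::from_raw(). Rust's allocator then releases the buffer when the CString drops, and C's free() is never involved. The function names below are hypothetical. In real code the pointer would pass through a C API between the two calls.

```rust
use std::ffi::{c_char, CString};

// Give up ownership: the receiver (e.g. C code) now holds the pointer.
fn hand_to_c(s: &str) -> Option<*mut c_char> {
    CString::new(s).ok().map(CString::into_raw)
}

// Take ownership back so the buffer is freed by Rust's allocator on drop.
// Safety: `ptr` must come from CString::into_raw, must not have been freed
// already, and its length must not have been changed in the meantime.
unsafe fn take_back_from_c(ptr: *mut c_char) -> CString {
    unsafe { CString::from_raw(ptr) }
}
```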

The filesystem: Path and PathBuf

Path and PathBuf are OsStr and OsString with path-specific methods bolted on. Internally, a Path is just a newtype wrapper around OsStr, and PathBuf wraps OsString. The reason they exist as separate types is ergonomics: paths have operations that plain OS strings don't -- joining components, extracting extensions, navigating parent directories.

use std::path::{Path, PathBuf};

let config = Path::new("/home/user/.config/app/settings.toml");

// Navigate the path
let dir = config.parent();             // Some("/home/user/.config/app")
let file = config.file_name();         // Some("settings.toml")
let stem = config.file_stem();         // Some("settings")
let ext = config.extension();          // Some("toml")

// Check properties
let is_abs = config.is_absolute();     // true
let exists = config.exists();          // checks the filesystem
let starts = config.starts_with("/home"); // true (component-aware)

Building paths is where PathBuf really earns its keep. PathBuf::push() and Path::join() insert the platform's separator for you, which is something you do not want to do by hand with string concatenation:

use std::path::{Path, PathBuf};

// Building paths: push() appends a component and handles separators for you
let mut log_dir = PathBuf::from("/var/log");
log_dir.push("myapp");
log_dir.push("debug.log");
// log_dir is now "/var/log/myapp/debug.log"

// Or more concisely, join() returns a new PathBuf:
let log_path = Path::new("/var/log").join("myapp").join("debug.log");

// Changing components
let mut p = PathBuf::from("/tmp/report.csv");
p.set_extension("json");
// p is now "/tmp/report.json"
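One caveat: push() and join() replace the whole path when the argument is absolute. Joining untrusted input onto a base directory therefore does not keep you inside that directory. A minimal guard, assuming Unix-style paths and a hypothetical `safe_join` helper, accepts only plain Component::Normal segments.

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical guard: join `user_input` onto `base` only if every component
// is a plain name (no root, no "..", no leading ".", no Windows prefix).
fn safe_join(base: &Path, user_input: &str) -> Option<PathBuf> {
    let rel = Path::new(user_input);
    let mut parts = rel.components().peekable();
    if parts.peek().is_none() {
        return None; // empty input
    }
    if parts.all(|c| matches!(c, Component::Normal(_))) {
        Some(base.join(rel))
    } else {
        None
    }
}
```

This check is purely lexical. It does not stop a symlink inside the base directory from pointing somewhere else.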

Why not just use String for paths? Three reasons. First, file paths aren't always valid UTF-8 (the same reason OsStr exists). Second, path operations are platform-specific -- / vs \, drive letters, UNC paths on Windows -- and Path handles all of it correctly. Third, starts_with on a Path is component-aware: it checks whole path components, not just string prefixes.

use std::path::Path;

// String prefix matching gets this wrong:
let s = "/home/user2/docs";
assert!(s.starts_with("/home/user")); // true! But user != user2

// Path component matching gets it right:
let p = Path::new("/home/user2/docs");
assert!(!p.starts_with("/home/user")); // false -- correctly

That distinction alone is a security-relevant reason to use Path instead of string manipulation for filesystem operations.
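Component matching is still lexical, though. starts_with does not resolve ".." or symlinks, so a path can pass the prefix check and still point outside the directory. Here is a sketch of a stricter check, with `is_lexically_inside` as a hypothetical name. It rejects any ".." component before comparing prefixes. If symlinks matter, you would canonicalize both paths first (for example with std::fs::canonicalize).

```rust
use std::path::{Component, Path};

// Hypothetical check: `path` counts as inside `root` only if it has the right
// prefix AND contains no ".." that could walk back out. Symlinks are not followed.
fn is_lexically_inside(root: &Path, path: &Path) -> bool {
    let has_parent_dir = path.components().any(|c| c == Component::ParentDir);
    !has_parent_dir && path.starts_with(root)
}
```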

The conversion map

Conversions between these types follow a consistent pattern. Going from more-specific to less-specific is cheap or free. Going from less-specific to more-specific is fallible, because you're adding a guarantee (like "this is valid UTF-8") that might not hold.

String / &str to other types

use std::ffi::{OsStr, OsString, CString};
use std::path::{Path, PathBuf};

let s: &str = "hello.txt";
let owned: String = String::from("hello.txt");

// To OsStr/OsString -- always succeeds (every UTF-8 string is a valid OS string)
let os: &OsStr = OsStr::new(s);
let os_owned: OsString = OsString::from(owned.clone());

// To Path/PathBuf -- always succeeds (any string is a valid path)
let path: &Path = Path::new(s);
let path_owned: PathBuf = PathBuf::from(owned.clone());

// To CString -- fallible (interior nulls are not allowed)
let c: CString = CString::new(s).expect("no interior nulls");

OsStr / OsString to other types

use std::ffi::OsStr;
use std::path::Path;

let os: &OsStr = OsStr::new("data.bin");

// To &str -- fallible (might not be valid UTF-8)
let s: Option<&str> = os.to_str();

// To String -- lossy option (replaces invalid bytes with U+FFFD)
let lossy: String = os.to_string_lossy().into_owned();

// To &Path -- always succeeds (OsStr is the inner type of Path)
let path: &Path = Path::new(os);
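The owned side has a matching conversion. OsString::into_string() returns Result<String, OsString>: on success you get a String, and on failure you get your original OsString back unchanged, so you can choose a lossy fallback. `os_to_string` is a hypothetical helper.

```rust
use std::ffi::OsString;

// Hypothetical helper: keep the strict conversion when it works,
// fall back to a lossy copy only when the data is not valid UTF-8.
fn os_to_string(os: OsString) -> String {
    match os.into_string() {
        Ok(s) => s,
        Err(original) => original.to_string_lossy().into_owned(),
    }
}
```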

CStr to other types

use std::ffi::CStr;

// some_c_pointer is a placeholder for a non-null, null-terminated *const c_char received from C
let c: &CStr = unsafe { CStr::from_ptr(some_c_pointer) };

// To &str -- fallible (C string might not be valid UTF-8)
let s: Result<&str, _> = c.to_str();

// To String -- lossy option
let lossy: String = c.to_string_lossy().into_owned();

// To &[u8] -- always succeeds (raw bytes without the null terminator)
let bytes: &[u8] = c.to_bytes();

Path / PathBuf to other types

use std::path::Path;
use std::ffi::OsStr;

let path = Path::new("/tmp/data.csv");

// To &OsStr -- always succeeds (Path is a wrapper around OsStr)
let os: &OsStr = path.as_os_str();

// To &str -- fallible (path might not be valid UTF-8)
let s: Option<&str> = path.to_str();

// To String -- via to_str or lossy
let lossy: String = path.to_string_lossy().into_owned();

// display() for user-facing output (also lossy)
println!("reading from: {}", path.display());

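The pattern is always the same. If the target type promises nothing the source doesn't already guarantee (str to OsStr, OsStr to Path, Path to OsStr), the conversion is cheap and infallible. If the target adds a guarantee, such as "definitely UTF-8" or "no interior nulls," you get a Result or Option. The types encode what the compiler can and can't prove about your data.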

The practical rule

You don't need to memorize all of this. Here's the decision tree that covers real-world code:

Most of the time, you use &str and String. They cover user-facing text, JSON, configuration, log messages, HTTP bodies -- anything that's text in the UTF-8 sense. Accept &str in function parameters. Store String in structs.
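In code, that rule might look like the following sketch. The LogEntry type is made up for illustration. The constructor borrows, the struct owns, and the accessor lends a &str back out.

```rust
// Illustrative type: owns its text so it can outlive the caller's buffers.
struct LogEntry {
    message: String,
}

impl LogEntry {
    // Borrow on the way in; copy once into an owned String.
    fn new(message: &str) -> Self {
        LogEntry { message: message.to_owned() }
    }

    // Lend a &str on the way out; callers never need to clone.
    fn message(&self) -> &str {
        &self.message
    }
}
```

If most callers already hold a String they no longer need, taking String by value instead would avoid that one copy. This trade-off is a judgment call, not a rule.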

When you touch the filesystem, you switch to Path and PathBuf. Don't build file paths with format! or string concatenation. Use Path::new() and .join(). Your code becomes portable and correct on platforms where paths aren't UTF-8.

// Don't do this:
let path = format!("{}/{}/{}", base_dir, subdir, filename);

// Do this:
let path = Path::new(&base_dir).join(&subdir).join(&filename);

When you call C functions through FFI, you use CStr and CString. They handle the null terminator and the interior-null check so you don't have to think about it in unsafe blocks. You'll encounter these when wrapping C libraries or interacting with system APIs that don't have Rust-native bindings yet.

When you need raw OS strings -- the ones that might not be UTF-8 and aren't file paths -- you use OsStr and OsString. In practice, this is mostly environment variables (via var_os), command-line arguments (via args_os), and the raw forms of path components such as the result of Path::file_name(). You rarely create these yourself; they come to you from the standard library's OS interfaces.
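For command-line arguments, std::env::args_os() yields OsString values and never panics on non-UTF-8 input. The sketch below takes any iterator of OsString, so you can test it without a real command line. In a binary you would pass std::env::args_os(). `args_to_strings` is a hypothetical name.

```rust
use std::ffi::OsString;

// Hypothetical helper: decode arguments, replacing invalid sequences with
// U+FFFD instead of panicking on non-UTF-8 input.
fn args_to_strings<I: IntoIterator<Item = OsString>>(args: I) -> Vec<String> {
    args.into_iter()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}
```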

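use std::ffi::CString;
use std::os::raw::c_char;
use std::path::PathBuf;

// Text: &str and String
fn process_message(msg: &str) -> String {
    format!("[INFO] {msg}")
}

// Files: Path and PathBuf
fn config_path() -> PathBuf {
    // var_os keeps a non-UTF-8 HOME intact instead of failing
    let home = std::env::var_os("HOME")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("/tmp"));
    home.join(".config").join("myapp.toml")
}

// FFI: CStr and CString
extern "C" {
    // Hypothetical C function, declared here for illustration only
    fn some_c_function(name: *const c_char);
}

fn call_c_api(name: &str) {
    let c_name = CString::new(name).expect("no nulls in name");
    // c_name lives until the end of this function, so the pointer stays valid during the call
    unsafe { some_c_function(c_name.as_ptr()) }
}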

That's it. Four pairs, four contexts, one consistent borrowed/owned split across all of them. It looks like a lot when you see all eight types listed together, but in any given function you're usually working with just one pair. The type system guides you to the right one, and the compiler tells you when you've picked wrong.