Rust Deep Dives #4: Why Rust Has Eight String Types (And When You'll Hit Each One)
Post 4 of 8 in Rust Deep Dives. Companion series: Rust Patterns That Matter.
Previous: #3: AsRef, Borrow, and ToOwned | Next: #5: Trait Patterns
Most languages give you one string type and call it a day. Python has str.
Go has string. JavaScript has String. Rust gives you
&str, String, OsStr, OsString,
CStr, CString, Path, and PathBuf.
That's eight types across four pairs. The first time you see them all, it feels like
overengineering. It isn't. Each pair exists because the world has different ideas about
what a "string" is, and Rust chose honesty over pretending.
Here's when each one shows up and why.
The core pair: &str and String
These are the two you use 90% of the time. &str is a borrowed, immutable
slice of UTF-8 bytes. Under the hood, it's just a pointer and a length -- the same two
words a C programmer would carry around. It doesn't own the data. It can point into a
String, into a string literal baked into your binary, or into any contiguous
UTF-8 buffer.
String is the owned, heap-allocated, growable version. It's backed by a
Vec<u8> that's guaranteed to contain valid UTF-8. You use it when you
need to build strings at runtime, store them in structs, or pass ownership across
function boundaries.
// &str: borrowed, immutable, no allocation
let greeting: &str = "hello"; // points into the binary's read-only data
// String: owned, growable, heap-allocated
let mut name = String::from("world");
name.push_str("!");
// String derefs to &str, so you can pass it anywhere &str is expected
fn greet(who: &str) {
println!("hello, {who}");
}
greet(&name); // &String coerces to &str
greet(greeting); // &str passed directly
The relationship between them is simple: String implements
Deref<Target = str>, so anywhere you accept &str,
you can pass a &String and it just works. This is why the standard
advice is to take &str in function signatures -- it accepts both types
without forcing the caller to allocate.
If you're coming from C, think of &str as a fat pointer to a
char buffer (with a length instead of a null terminator), and
String as a malloc'd buffer that knows its own length
and capacity. The key difference from C: both are guaranteed to be valid UTF-8. There
is no "maybe it's Latin-1, maybe it's UTF-8" ambiguity.
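To make the UTF-8 guarantee and the "pointer plus length" description concrete, here is a minimal sketch using only std. The helper names `decode`, `lengths`, and `str_ref_words` are made up for illustration.

```rust
/// Hypothetical helper: try to turn raw bytes into a String.
/// String::from_utf8 validates the bytes and refuses anything that isn't UTF-8.
fn decode(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Byte length vs. character count for a &str.
fn lengths(s: &str) -> (usize, usize) {
    // len() counts UTF-8 bytes; chars().count() walks the code points.
    (s.len(), s.chars().count())
}

/// A &str is a (pointer, length) pair: two machine words.
fn str_ref_words() -> usize {
    std::mem::size_of::<&str>() / std::mem::size_of::<usize>()
}
```

The byte 0xE9 is "é" in Latin-1, but in UTF-8 it is the lead byte of a three-byte sequence, so `decode(vec![0xE9])` returns `None` instead of guessing. The properly encoded bytes `[0xC3, 0xA9]` decode to "é", and `lengths("héllo")` reports 6 bytes but 5 characters.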
Why one pair isn't enough
If every string in the world were UTF-8, we'd be done. But they aren't.
File paths on Windows can contain arbitrary UTF-16 code units, including unpaired surrogates that have no valid UTF-8 representation. Environment variables on Unix are arbitrary byte sequences that don't have to be UTF-8 at all. C libraries use null-terminated byte arrays that may or may not be valid text in any encoding.
Most languages paper over the problem. Python 3 smuggles undecodable filename bytes
through as lone surrogate characters (PEP 383's surrogateescape), and the failure surfaces
later as a UnicodeEncodeError when you print or re-encode the name. Go's string is just a
byte sequence that is conventionally, but not necessarily, UTF-8, which means bugs hide
until runtime. Rust chose a different approach: separate types with separate guarantees,
so encoding mismatches surface at compile time instead of at 3 AM.
The OS boundary: OsStr and OsString
OsStr and OsString are Rust's representation of
platform-native strings. On Unix, they're arbitrary byte sequences (the OS doesn't care
about encoding). On Windows, they're potentially ill-formed UTF-16 -- meaning they can
contain unpaired surrogates that no valid Unicode string could represent.
You hit these every time you interact with the operating system: reading environment
variables, listing directory entries, handling command-line arguments. The API functions
that return these types are the honest versions. There are convenience wrappers that
return String, but they panic or return errors when the OS hands back
something that isn't valid UTF-8.
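On the error-returning side, `std::env::var` is the usual example: it returns `Err(VarError::NotPresent)` for an unset variable and `Err(VarError::NotUnicode(..))` for one that isn't valid Unicode, and the latter hands back the original `OsString`. (On the panicking side, `std::env::args()` panics during iteration if an argument isn't valid Unicode, which is why `args_os()` exists.) A small sketch; `describe_var` is a hypothetical name:

```rust
use std::env::{self, VarError};

/// Sketch: what the String-returning env::var reports, variant by variant.
fn describe_var(key: &str) -> String {
    match env::var(key) {
        Ok(value) => format!("{key}={value}"),
        Err(VarError::NotPresent) => format!("{key} is not set"),
        // NotUnicode carries the original OsString, so nothing is lost yet;
        // converting it lossily for display is our explicit choice here.
        Err(VarError::NotUnicode(raw)) => format!("{key}={} (lossy)", raw.to_string_lossy()),
    }
}
```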
use std::env;
use std::ffi::OsString;
// env::var() returns Result<String, VarError> -- Err if unset or not valid Unicode
// env::var_os() returns Option<OsString> -- None if unset, but no UTF-8 requirement
fn get_home() -> Option<OsString> {
env::var_os("HOME")
}
The same borrowed/owned pattern applies: OsStr is the borrowed slice,
OsString is the owned, heap-allocated version. OsString
derefs to OsStr, just like String derefs to str.
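The parallel with `String`/`&str` carries over to function signatures: take `&OsStr`, and callers can pass an `&OsString` through the same deref coercion. A minimal sketch with hypothetical names `shout` and `build`:

```rust
use std::ffi::{OsStr, OsString};

/// Takes the borrowed form, like a &str parameter would.
/// Returns None if the OS string isn't valid UTF-8.
fn shout(s: &OsStr) -> Option<String> {
    s.to_str().map(str::to_uppercase)
}

/// Builds an owned OsString at runtime, the way you'd build a String.
fn build() -> OsString {
    let mut s = OsString::from("config");
    s.push(".toml"); // push accepts anything that is AsRef<OsStr>, including &str
    s
}
```

Calling `shout(&build())` works because `&OsString` coerces to `&OsStr`, exactly as `&String` coerces to `&str`.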
Converting from OsStr to &str is fallible, because the OS
string might not be valid UTF-8. This is the whole point -- the type system forces you
to handle the possibility:
fn print_env(key: &str) {
    match std::env::var_os(key) {
        Some(val) => {
            // .to_str() returns Option<&str> -- None if not valid UTF-8
            match val.to_str() {
                Some(s) => println!("{key}={s}"),
                None => println!("{key}=<not valid UTF-8>"),
            }
        }
        None => println!("{key} is not set"),
    }
}
If you're confident the string is valid UTF-8 (which it almost always is in practice), you can
call .to_str().unwrap(). If you'd rather degrade gracefully, .to_string_lossy() replaces
invalid sequences with the Unicode replacement character (U+FFFD). Either way, the type system
makes this an explicit choice rather than a silent assumption.
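On Unix you can build a non-UTF-8 OS string directly from bytes through `std::os::unix::ffi::OsStrExt`, which puts the strict and lossy conversions side by side. This sketch assumes a Unix target; `views` is a hypothetical helper, and the byte 0xFF never appears in valid UTF-8.

```rust
// Unix-only: OsStrExt exposes the raw bytes behind an OsStr.
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

/// Returns (strict, lossy) views of an OS string built from raw bytes.
fn views(raw: &[u8]) -> (Option<String>, String) {
    let os = OsStr::from_bytes(raw);
    let strict = os.to_str().map(String::from); // None if not valid UTF-8
    let lossy = os.to_string_lossy().into_owned(); // invalid bytes become U+FFFD
    (strict, lossy)
}
```

For `b"cafe"` both views agree. For `b"caf\xFF"` the strict view is `None` and the lossy view is `"caf\u{FFFD}"`: the information loss is visible in the types instead of hidden in the data.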
The C boundary: CStr and CString
C strings are a different animal. They're null-terminated byte arrays with no length field and no interior null bytes. They're not necessarily UTF-8 -- they're whatever encoding the C library decided to use (often but not always UTF-8 on modern systems).
CStr is a borrowed reference to a null-terminated C string.
CString is an owned, heap-allocated null-terminated string. You use
them when calling C functions through FFI.
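use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// POSIX functions from the C library (Unix only).
extern "C" {
    fn setenv(name: *const c_char, value: *const c_char, overwrite: i32) -> i32;
    fn getenv(name: *const c_char) -> *const c_char;
}

fn safe_setenv(name: &str, value: &str) -> Result<(), std::ffi::NulError> {
    // CString::new checks for interior null bytes.
    // If your Rust string contains a \0, this returns Err.
    let c_name = CString::new(name)?;
    let c_value = CString::new(value)?;
    // Both CStrings live until the end of this function, so the pointers stay
    // valid for the call. setenv isn't guaranteed to be thread-safe, so only
    // call this when no other thread is reading or writing the environment.
    unsafe {
        // setenv returns 0 on success and -1 on error; ignored here for brevity.
        setenv(c_name.as_ptr(), c_value.as_ptr(), 1);
    }
    Ok(())
}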
The CString::new constructor is worth understanding. It takes anything convertible into a
Vec&lt;u8&gt; (a &str, a String, a byte slice), checks that it contains no interior null
bytes, and appends the null terminator. Interior nulls are dangerous in C: strlen stops at
the first one, so everything after it is silently dropped. Rust catches this at construction time.
// This works: no interior nulls
let ok = CString::new("hello");
assert!(ok.is_ok());
// This fails: interior null byte at position 5
let bad = CString::new("hello\0world");
assert!(bad.is_err());
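Both halves of that contract are observable. A small sketch (the helper `to_c_bytes` is hypothetical) shows the terminator that `as_bytes_with_nul` exposes, and the details `NulError` carries on failure: the position of the offending byte and the original bytes, returned via `into_vec`.

```rust
use std::ffi::CString;

/// Shows the terminator CString::new appends, and what it hands back on failure.
fn to_c_bytes(s: &str) -> Result<Vec<u8>, (usize, Vec<u8>)> {
    match CString::new(s) {
        // as_bytes_with_nul includes the trailing 0 that C expects.
        Ok(c) => Ok(c.as_bytes_with_nul().to_vec()),
        // NulError reports where the bad byte is and returns the original bytes.
        Err(e) => Err((e.nul_position(), e.into_vec())),
    }
}
```

`to_c_bytes("hi")` yields `b"hi\0"`, while `to_c_bytes("hello\0world")` fails at position 5 and gives the input back untouched.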
Going the other direction -- reading a C string back into Rust -- you use
CStr:
fn safe_getenv(name: &str) -> Option<String> {
    let c_name = CString::new(name).ok()?;
    unsafe {
        let ptr = getenv(c_name.as_ptr());
        if ptr.is_null() {
            return None; // variable not set
        }
        // CStr::from_ptr reads until the null terminator. The memory belongs
        // to the C environment, so copy it into a String right away instead
        // of holding on to the borrowed &CStr.
        let c_str = CStr::from_ptr(ptr);
        // .to_str() checks for valid UTF-8
        c_str.to_str().ok().map(String::from)
    }
}
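When the bytes are already in Rust memory (a buffer a C function filled in, say), `CStr::from_bytes_with_nul` produces the same `&CStr` without `unsafe`, because it can check the terminator against the slice length. A sketch with a made-up helper name, `parse_c`:

```rust
use std::ffi::CStr;

/// Safe CStr construction from a byte slice you already hold: no raw pointer,
/// no unsafe. The slice must end in exactly one NUL and contain no others.
fn parse_c(bytes: &[u8]) -> Option<&str> {
    let c = CStr::from_bytes_with_nul(bytes).ok()?; // fails on missing or interior NUL
    c.to_str().ok() // fails on invalid UTF-8
}
```

`parse_c(b"hi\0")` gives `Some("hi")`; a missing terminator (`b"hi"`), an interior NUL (`b"h\0i\0"`), or invalid UTF-8 (`b"\xFF\0"`) all give `None`.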
If you're coming from C, this is the type pair that will feel most familiar.
CStr is a safe wrapper around a borrowed const char*, and
CString is an owned, heap-allocated char* with automatic cleanup.
One catch: CString's buffer comes from Rust's allocator, not malloc, so C code must never
free() it. The difference from raw C: Rust
guarantees no interior nulls, tracks ownership, and makes encoding conversion explicit.
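Ownership can cross the boundary too. `CString::into_raw` gives up the buffer as a raw pointer for C to hold, and `CString::from_raw` takes it back so Rust frees it; the standard library docs say such a pointer must not be released with C's `free()`. A minimal sketch, with hypothetical function names:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hand ownership of a C string to C as a raw pointer. C may read it,
/// but must not free() it: the memory came from Rust's allocator.
fn give_to_c(s: &str) -> *mut c_char {
    CString::new(s).expect("no interior nulls").into_raw()
}

/// Take ownership back so Rust frees the buffer when the CString drops.
/// Safety: `ptr` must come from `give_to_c` and be passed here only once.
unsafe fn take_back(ptr: *mut c_char) -> CString {
    unsafe { CString::from_raw(ptr) }
}
```

In a real binding, the pointer would travel to C between these two calls, and C would hand it back when it's done.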
The filesystem: Path and PathBuf
Path and PathBuf are OsStr and
OsString with path-specific methods bolted on. Internally, a
Path is just a newtype wrapper around OsStr, and
PathBuf wraps OsString. The reason they exist as separate
types is ergonomics: paths have operations that plain OS strings don't -- joining
components, extracting extensions, navigating parent directories.
use std::path::{Path, PathBuf};
let config = Path::new("/home/user/.config/app/settings.toml");
// Navigate the path
let dir = config.parent(); // Some("/home/user/.config/app")
let file = config.file_name(); // Some("settings.toml")
let stem = config.file_stem(); // Some("settings")
let ext = config.extension(); // Some("toml")
// Check properties
let is_abs = config.is_absolute(); // true on Unix; on Windows, absolute also needs a drive or UNC prefix
let exists = config.exists(); // checks the filesystem
let starts = config.starts_with("/home"); // true (component-aware)
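These inspection methods return `Option<&OsStr>` rather than strings, and their edge cases are worth knowing before you rely on them. A small sketch; `ext` and `stem` are hypothetical wrappers around the real `Path::extension` and `Path::file_stem`:

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Only the part after the final '.' counts as the extension.
fn ext(p: &str) -> Option<&OsStr> {
    Path::new(p).extension()
}

/// The stem is everything before that final '.'.
fn stem(p: &str) -> Option<&OsStr> {
    Path::new(p).file_stem()
}
```

So `archive.tar.gz` has extension `gz` and stem `archive.tar`; a dotfile like `.bashrc` has no extension at all; and `Path::new("/").parent()` is `None` because the root has no parent.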
The .join() method (defined on Path, returning a new PathBuf) and its in-place
counterpart PathBuf::push are where the path types really earn their keep. They insert the
platform's separator for you, which is something you do not want to do by hand with string
concatenation:
use std::path::{Path, PathBuf};
// Building paths in place: .push() adds the separator for you
let mut log_dir = PathBuf::from("/var/log");
log_dir.push("myapp");
log_dir.push("debug.log");
// log_dir is now "/var/log/myapp/debug.log"
// Or more concisely:
let log_path = Path::new("/var/log").join("myapp").join("debug.log");
// Changing components
let mut p = PathBuf::from("/tmp/report.csv");
p.set_extension("json");
// p is now "/tmp/report.json"
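One behavior of `join` deserves attention before you feed it user input: if the argument is absolute, it replaces the path instead of extending it, so `Path::new("/var/log").join("/etc/passwd")` is just `/etc/passwd`. A minimal guard, sketched under the assumption that you only want plain relative names (the helper `join_untrusted` is hypothetical), checks the components first. This assumes a Unix target for the example paths.

```rust
use std::path::{Component, Path, PathBuf};

/// Join user input onto `base` only if it consists solely of ordinary names.
/// A root or prefix would make `join` discard `base`, and `..` could climb out of it.
fn join_untrusted(base: &Path, input: &str) -> Option<PathBuf> {
    let rel = Path::new(input);
    if rel.components().all(|c| matches!(c, Component::Normal(_))) {
        Some(base.join(rel))
    } else {
        None
    }
}
```

This is deliberately strict: it also rejects a leading `./`, which you may or may not want to allow.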
Why not just use String for paths? Three reasons. First, file paths
aren't always valid UTF-8 (the same reason OsStr exists). Second,
path operations are platform-specific -- / vs \, drive
letters, UNC paths on Windows -- and Path handles all of it correctly.
Third, starts_with on a Path is component-aware: it checks
whole path components, not just string prefixes.
use std::path::Path;
// String prefix matching gets this wrong:
let s = "/home/user2/docs";
assert!(s.starts_with("/home/user")); // true! But user != user2
// Path component matching gets it right:
let p = Path::new("/home/user2/docs");
assert!(!p.starts_with("/home/user")); // false -- correctly
That distinction alone is a security-relevant reason to use Path
instead of string manipulation for filesystem operations.
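One caveat keeps this from being a complete access check: `Path::starts_with` is lexical. It compares components as written and does not resolve `..` or symlinks, so `/home/user/../user2/secret` still "starts with" `/home/user`. A sketch of the difference, assuming both paths exist on disk for the second check (function names are hypothetical):

```rust
use std::io;
use std::path::Path;

/// Lexical check: compares components as written, `..` included.
fn lexically_inside(path: &Path, root: &Path) -> bool {
    path.starts_with(root)
}

/// Filesystem check: canonicalize resolves `..` and symlinks first.
/// Both paths must exist, or this returns an io::Error.
fn really_inside(path: &Path, root: &Path) -> io::Result<bool> {
    Ok(path.canonicalize()?.starts_with(root.canonicalize()?))
}
```

Canonicalizing touches the filesystem, so it answers a different question (where the path really points right now) than the purely lexical check.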
The conversion map
Conversions between these types follow a consistent pattern. Going from more-specific to less-specific is cheap or free. Going from less-specific to more-specific is fallible, because you're adding a guarantee (like "this is valid UTF-8") that might not hold.
String / &str to other types
use std::ffi::{OsStr, OsString, CString};
use std::path::{Path, PathBuf};
let s: &str = "hello.txt";
let owned: String = String::from("hello.txt");
// To OsStr/OsString -- always succeeds (every UTF-8 string is a valid OS string)
let os: &OsStr = OsStr::new(s);
let os_owned: OsString = OsString::from(owned.clone());
// To Path/PathBuf -- always succeeds (any string is a valid path)
let path: &Path = Path::new(s);
let path_owned: PathBuf = PathBuf::from(owned.clone());
// To CString -- fallible (interior nulls are not allowed)
let c: CString = CString::new(s).expect("no interior nulls");
OsStr / OsString to other types
use std::ffi::OsStr;
use std::path::Path;
let os: &OsStr = OsStr::new("data.bin");
// To &str -- fallible (might not be valid UTF-8)
let s: Option<&str> = os.to_str();
// To String -- lossy option (replaces invalid bytes with U+FFFD)
let lossy: String = os.to_string_lossy().into_owned();
// To &Path -- always succeeds (OsStr is the inner type of Path)
let path: &Path = Path::new(os);
CStr to other types
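use std::ffi::{CStr, CString};
use std::os::raw::c_char;
// Stand-in for a pointer handed to you by C; here we borrow one from a CString.
let owned = CString::new("data.bin").expect("no interior nulls");
let some_c_pointer: *const c_char = owned.as_ptr();
// Safety: the pointer is non-null, NUL-terminated, and `owned` outlives `c`.
let c: &CStr = unsafe { CStr::from_ptr(some_c_pointer) };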
// To &str -- fallible (C string might not be valid UTF-8)
let s: Result<&str, _> = c.to_str();
// To String -- lossy option
let lossy: String = c.to_string_lossy().into_owned();
// To &[u8] -- always succeeds (raw bytes without the null terminator)
let bytes: &[u8] = c.to_bytes();
Path / PathBuf to other types
use std::path::Path;
use std::ffi::OsStr;
let path = Path::new("/tmp/data.csv");
// To &OsStr -- always succeeds (Path is a wrapper around OsStr)
let os: &OsStr = path.as_os_str();
// To &str -- fallible (path might not be valid UTF-8)
let s: Option<&str> = path.to_str();
// To String -- via to_str or lossy
let lossy: String = path.to_string_lossy().into_owned();
// display() for user-facing output (also lossy)
println!("reading from: {}", path.display());
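The pattern is always the same. If the target type promises nothing beyond what the source
already guarantees, the conversion is infallible, and widening a borrowed view (&str to
&OsStr to &Path) is free. If you're adding a guarantee, whether "definitely UTF-8" or
"no interior nulls," you get a Result or Option. The types encode what the compiler can
and can't prove about your data.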
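The same reasoning tells you not to route paths through `&str` on their way to C. A C API usually wants bytes, not UTF-8, so the only guarantee you need to add is "no interior nulls." On Unix, a minimal sketch (the function name `path_to_c` is hypothetical) goes straight from the OS bytes to a `CString`. Windows has no byte-level equivalent of this; its `OsStrExt` offers `encode_wide()` for UTF-16 APIs instead.

```rust
// Unix-only sketch: hand a path to a C API without forcing it through UTF-8.
use std::ffi::{CString, NulError};
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

fn path_to_c(path: &Path) -> Result<CString, NulError> {
    // as_bytes() exposes the raw OS bytes; CString::new adds only the
    // no-interior-NUL guarantee, so this fails only on a 0 byte in the path.
    CString::new(path.as_os_str().as_bytes())
}
```

A path containing the byte 0xFF survives this round trip intact, whereas `path.to_str()` would return `None` for it.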
The practical rule
You don't need to memorize all of this. Here's the decision tree that covers real-world code:
Most of the time, you use &str and String. They cover
user-facing text, JSON, configuration, log messages, HTTP bodies -- anything that's
text in the UTF-8 sense. Accept &str in function parameters. Store
String in structs.
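In code, that rule of thumb looks like this. A minimal sketch with a hypothetical `User` type: borrowed text in and out of the API, owned text inside the struct.

```rust
/// Owned text in the struct; borrowed text at the API boundary.
struct User {
    name: String,
}

impl User {
    /// Takes &str so callers can pass literals, &String, or slices.
    fn new(name: &str) -> Self {
        User { name: name.to_owned() } // the one allocation happens here
    }

    /// Hands out a borrow instead of cloning.
    fn name(&self) -> &str {
        &self.name
    }
}
```

If callers often already hold a `String` they no longer need, taking `impl Into<String>` instead lets them move it in without a copy. That's a trade-off, not a replacement for the rule.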
When you touch the filesystem, you switch to Path and PathBuf.
Don't build file paths with format! or string concatenation. Use
Path::new() and .join(). Your code becomes portable and
correct on platforms where paths aren't UTF-8.
// Don't do this:
let path = format!("{}/{}/{}", base_dir, subdir, filename);
// Do this:
let path = Path::new(&base_dir).join(&subdir).join(&filename);
When you call C functions through FFI, you use CStr and
CString. They handle the null terminator and the interior-null check
so you don't have to think about it in unsafe blocks. You'll encounter these when
wrapping C libraries or interacting with system APIs that don't have Rust-native
bindings yet.
When you need raw OS strings -- the ones that might not be UTF-8 and aren't
file paths -- you use OsStr and OsString. In practice,
this is mostly environment variables (via var_os), command-line arguments (via args_os),
and the raw forms of path components. You rarely create these yourself; they come to you
from the standard library's OS interfaces.
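A directory listing is the typical "comes to you" case: `DirEntry::file_name()` returns an `OsString`, and you decide whether to keep it raw, convert it strictly, or display it lossily. A small sketch; the function name `entry_names` is made up:

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Directory listings hand you OsString names, because the OS does.
fn entry_names(dir: &Path) -> io::Result<Vec<OsString>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        names.push(entry?.file_name()); // DirEntry::file_name() -> OsString
    }
    names.sort(); // read_dir makes no ordering promise
    Ok(names)
}
```

Keeping the names as `OsString` means a file with a non-UTF-8 name can still be reopened by joining its name back onto `dir`.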
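use std::ffi::CString;
use std::os::raw::c_char;
use std::path::PathBuf;

// Text: &str and String
fn process_message(msg: &str) -> String {
    format!("[INFO] {msg}")
}

// Files: Path and PathBuf. var_os keeps a non-UTF-8 $HOME intact
// instead of silently falling back to /tmp.
fn config_path() -> PathBuf {
    let home = std::env::var_os("HOME").map(PathBuf::from).unwrap_or_else(|| PathBuf::from("/tmp"));
    home.join(".config").join("myapp.toml")
}

// FFI: CStr and CString
extern "C" {
    // Hypothetical C function standing in for whatever library you're wrapping.
    fn some_c_function(name: *const c_char);
}

fn call_c_api(name: &str) {
    let c_name = CString::new(name).expect("no nulls in name");
    unsafe { some_c_function(c_name.as_ptr()) }
}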
That's it. Four pairs, four contexts, one consistent borrowed/owned split across all of them. It looks like a lot when you see all eight types listed together, but in any given function you're usually working with just one pair. The type system guides you to the right one, and the compiler tells you when you've picked wrong.
Telex