Rust Deep Dives #1: HashMap's Entry API: Stop Doing Double Lookups
Post 1 of 8 in Rust Deep Dives. Companion series: Rust Patterns That Matter.
Next: #2: Deref Coercion
There's a thing you do with hash maps constantly: check if a key exists,
insert a default if it doesn't, then work with the value. In most
languages you'd write an if / else block or
call contains_key first. In Rust, the entry API gives you a
single-lookup way to do all of that. It's the most useful
HashMap pattern that nobody teaches early enough.
The problem: check-then-insert
Say you're counting word frequencies. The naive approach looks like this:
use std::collections::HashMap;
let mut counts: HashMap<&str, i32> = HashMap::new();
for word in text.split_whitespace() {
if counts.contains_key(word) {
let count = counts.get_mut(word).unwrap();
*count += 1;
} else {
counts.insert(word, 1);
}
}
This works. But it's ugly for two reasons. First, it does two hash lookups
per iteration: one for contains_key and another for
get_mut or insert. The map hashes the key,
finds the bucket, checks for a match -- and then you immediately do all
of that again on the same key. Second, there's an unwrap
hiding in there. You know it won't panic because you just checked
contains_key, but the compiler doesn't. It's a logic
invariant that isn't enforced by the type system, and that's the kind of
thing that rots over time as code evolves.
The entry API
map.entry(key) does a single lookup and returns an
Entry enum. That enum is either Occupied (the
key exists) or Vacant (it doesn't). You then decide what to
do based on which variant you got -- but the lookup already happened. No
double hashing. No unwrap.
use std::collections::hash_map::Entry;
match counts.entry(word) {
Entry::Occupied(mut entry) => {
*entry.get_mut() += 1;
}
Entry::Vacant(entry) => {
entry.insert(1);
}
}
One lookup. No unwrap. The compiler knows exactly which
variant you're in. But for common patterns you don't need to write the
full match. The entry API has convenience methods that
collapse the whole thing into a single expression.
The four methods
or_insert
The most common one. Insert a value if the key is absent, then return a mutable reference to the value (whether it was just inserted or already existed). The word-counting example becomes:
for word in text.split_whitespace() {
let count = counts.entry(word).or_insert(0);
*count += 1;
}
That's it. One line to handle both cases. If the key isn't in the map,
it inserts 0. Either way you get a &mut i32
back, and you increment it. Some people write it even shorter:
for word in text.split_whitespace() {
*counts.entry(word).or_insert(0) += 1;
}
Same thing. The or_insert call returns
&mut i32, and you dereference and add in place.
or_insert_with
Same idea, but the default value comes from a closure that only runs when the key is absent. Use this when creating the default is expensive -- you don't want to pay the cost if the key already exists.
use std::collections::HashMap;
struct Connection {
host: String,
port: u16,
}
impl Connection {
fn new(host: &str, port: u16) -> Self {
println!("Connecting to {host}:{port}...");
Connection { host: host.to_string(), port }
}
}
let mut pool: HashMap<String, Connection> = HashMap::new();
let conn = pool
.entry("db-primary".to_string())
.or_insert_with(|| Connection::new("10.0.0.1", 5432));
If "db-primary" is already in the pool, the closure never
runs. No connection is opened. No side effects. The
or_insert version would create the connection eagerly every
time -- even when the key is already present and the connection gets
thrown away. With or_insert_with, you only pay for what you
use.
There's also or_insert_with_key, which passes the key to the
closure. Handy when the default value depends on the key itself:
let mut cache: HashMap<String, String> = HashMap::new();
let value = cache
.entry("user:42".to_string())
.or_insert_with_key(|key| {
println!("Cache miss for {key}, fetching...");
fetch_from_database(key)
});
or_default
When the type implements Default, you don't even need to
specify the value. or_default() calls
Default::default() for you. For numbers that's 0.
For String it's "". For Vec it's
an empty vector.
let mut char_counts: HashMap<char, i32> = HashMap::new();
for ch in "hello world".chars() {
*char_counts.entry(ch).or_default() += 1;
}
This is the cleanest version for counting patterns. No explicit default value, no closure -- just "give me the existing value or a fresh default." It works nicely for grouping too:
let mut groups: HashMap<&str, Vec<i32>> = HashMap::new();
for (key, value) in data {
groups.entry(key).or_default().push(value);
}
Vec::default() returns an empty Vec, so each
new key starts with an empty list. Then .push(value) appends
to it. No need for or_insert(Vec::new()) or
or_insert_with(Vec::new), though both would work.
and_modify
The previous methods focus on what to do when the key is missing. But
what if you want to do something to the existing value and
provide a default for when it's absent? That's
and_modify chained with one of the or_
methods:
let mut scores: HashMap<&str, i32> = HashMap::new();
for (player, points) in results {
scores.entry(player)
.and_modify(|score| *score += points)
.or_insert(points);
}
If the player already has a score, add the new points to it. If this is
the first time we see the player, start them off with whatever they just
scored. The and_modify closure only runs on occupied entries.
The or_insert value only applies to vacant entries. Together
they cover both cases in one expression.
Here's another realistic one -- tracking the first and last time you see an event:
use std::time::Instant;
struct Timestamps {
first_seen: Instant,
last_seen: Instant,
count: u64,
}
let mut events: HashMap<String, Timestamps> = HashMap::new();
let now = Instant::now();
events.entry(event_name)
.and_modify(|ts| {
ts.last_seen = now;
ts.count += 1;
})
.or_insert(Timestamps {
first_seen: now,
last_seen: now,
count: 1,
});
Working with the Entry enum directly
The convenience methods handle most situations, but sometimes you need
different logic for the occupied and vacant cases that doesn't fit the
"insert a default" pattern. In those cases, match on the
Entry enum directly.
use std::collections::hash_map::Entry;
fn register_user(
users: &mut HashMap<String, UserInfo>,
name: String,
info: UserInfo,
) -> Result<(), String> {
match users.entry(name) {
Entry::Occupied(entry) => {
Err(format!("User '{}' already exists", entry.key()))
}
Entry::Vacant(entry) => {
entry.insert(info);
Ok(())
}
}
}
The Occupied variant gives you an
OccupiedEntry with methods like .get(),
.get_mut(), .into_mut(),
.remove(), and .key(). The
Vacant variant gives you a VacantEntry with
.insert() and .key(). You get full control
without ever doing a second lookup.
Here's a more involved example -- an LRU-like cache that evicts the old entry when a key already exists:
fn update_or_insert(
cache: &mut HashMap<String, CachedValue>,
key: String,
value: CachedValue,
) -> Option<CachedValue> {
match cache.entry(key) {
Entry::Occupied(mut entry) => {
// Replace the old value, return it to the caller
let old = entry.insert(value);
Some(old)
}
Entry::Vacant(entry) => {
entry.insert(value);
None
}
}
}
The OccupiedEntry::insert method replaces the value and
returns the old one -- something you can't do with the convenience
methods alone. When you need to inspect, replace, or remove existing
entries, matching on the enum is the way to go.
Real patterns
Word frequency counting
The classic. You've already seen it above, but here's the complete version with sorted output:
use std::collections::HashMap;
fn word_frequencies(text: &str) -> Vec<(String, usize)> {
let mut counts: HashMap<String, usize> = HashMap::new();
for word in text.split_whitespace() {
let normalized = word.to_lowercase();
*counts.entry(normalized).or_default() += 1;
}
let mut result: Vec<_> = counts.into_iter().collect();
result.sort_by(|a, b| b.1.cmp(&a.1));
result
}
Grouping items by key
You have a flat list and want to group it into buckets. This comes up all the time -- grouping log entries by severity, users by role, events by date:
struct LogEntry {
level: String,
message: String,
}
fn group_by_level(logs: Vec<LogEntry>) -> HashMap<String, Vec<String>> {
let mut grouped: HashMap<String, Vec<String>> = HashMap::new();
for entry in logs {
grouped
.entry(entry.level)
.or_default()
.push(entry.message);
}
grouped
}
The or_default() creates an empty Vec for each
new level, and .push() appends to it. Three lines in the
loop body, and one of those is just a closing brace.
Caching computations
Sometimes you need to compute something expensive and reuse the result. The entry API makes this straightforward:
use std::collections::HashMap;
struct Solver {
cache: HashMap<u64, u64>,
}
impl Solver {
fn fibonacci(&mut self, n: u64) -> u64 {
if n <= 1 {
return n;
}
if let Some(&cached) = self.cache.get(&n) {
return cached;
}
let result = self.fibonacci(n - 1) + self.fibonacci(n - 2);
self.cache.insert(n, result);
result
}
}
Note that we can't use .entry().or_insert_with() here
because the closure would need to call self.fibonacci(),
which requires a mutable borrow on self -- but
.entry() already holds one. The borrow checker catches
this. For recursive memoization you fall back to separate
get and insert calls. But for non-recursive
caching, the entry API works perfectly:
fn get_config(
cache: &mut HashMap<String, Config>,
name: &str,
) -> &Config {
cache
.entry(name.to_string())
.or_insert_with(|| load_config_from_disk(name))
}
Building adjacency lists
Graphs represented as adjacency lists are a natural fit. Each node maps to a list of its neighbors:
fn build_graph(edges: &[(u32, u32)]) -> HashMap<u32, Vec<u32>> {
let mut graph: HashMap<u32, Vec<u32>> = HashMap::new();
for &(from, to) in edges {
graph.entry(from).or_default().push(to);
graph.entry(to).or_default().push(from); // undirected
}
graph
}
Two lines per edge. Each call to .entry().or_default()
ensures the node exists in the map, and .push() adds the
neighbor. If you want a directed graph, drop the second line. The
pattern is the same either way.
Deduplication with first-seen wins
Sometimes you want to keep the first value for each key and ignore duplicates. The entry API makes the intent explicit:
let mut first_seen: HashMap<String, Record> = HashMap::new();
for record in records {
first_seen
.entry(record.id.clone())
.or_insert(record);
}
If the key is already present, or_insert does nothing.
The first record wins. No conditional, no explicit check.
The connection to naming conventions
If you've spent time with Option or Result,
the entry API's method names look familiar.
or_insert follows the same _or pattern as
unwrap_or: provide a ready-made value.
or_insert_with follows the _or_else pattern:
provide a closure that computes the value lazily.
or_default mirrors unwrap_or_default: use the
type's Default implementation.
This isn't a coincidence. Rust's standard library uses these naming
suffixes consistently across types. Once you recognize the
_or / _or_else / _or_default
pattern, you can predict method names on types you haven't even seen
yet. or_insert is the eager version -- the value is always
evaluated. or_insert_with is the lazy version -- the
closure only runs when needed. Same principle as
unwrap_or vs unwrap_or_else, just applied to
map entries instead of option values.
The and_modify method follows a different pattern --
and_ means "do this if the thing exists." It's the
counterpart to or_, which means "do this if it doesn't."
You see the same split in Option's and_then
(operate on the value if Some) vs or_else
(provide a fallback if None). The naming is a system, not
a collection of arbitrary words.
The entry API is one of those things that feels like a minor convenience
at first and then quietly becomes the default way you interact with hash
maps. One lookup instead of two. No unwrap. The compiler
verifies your logic. Once it clicks, you reach for it without thinking.
Telex