Red-Black Tree

What is a Red-Black Tree?

A self-balancing BST where every node is red or black. The coloring rules guarantee O(log n) for search, insert, and delete.

The 5 Rules

  1. Every node is red or black.
  2. The root is always black.
  3. Every leaf (NIL) is black.
  4. Red rule: No two reds in a row (red node can't have a red child).
  5. Black-height rule: Every path from a node to its NIL descendants has the same number of black nodes.

Rule 5 prevents any path from being more than 2x longer than any other. Rule 4 is what triggers rebalancing after insertion.

The 3 Fixup Cases After Insertion

New nodes are inserted as red (inserting black always breaks rule 5; red only might break rule 4). When a double-red occurs, we fix it with 3 cases:

Decision Flowchart

Is parent red? No Done Yes Is uncle red? Yes Case 1: Recolor Repeat from grandparent No Is node inner child? (zig-zag) Yes Case 2: Rotate to straighten Case 3: Rotate + recolor Done No Case 3: Rotate + recolor Done

Case 1: Uncle is Red

Before
=>
After

What: Recolor parent and uncle to black, grandparent to red. Then check grandparent.

Why: Every path through grandparent passes through exactly one of its children (parent or uncle). By making grandparent red (-1 black on path) and both children black (+1 black on path), the total black-height stays the same on every path. Meanwhile, parent is now black, so the double-red with the new node is gone. The tradeoff: grandparent is now red, which may create a new double-red with its parent - so we repeat the check one level up.

Case 2: Uncle is Black, Node is Inner Child (Zig-Zag)

What is a rotation? (Cases 2 and 3 use them)

Recoloring alone can't always fix a violation - sometimes the tree is structurally lopsided. A rotation shifts one node up and another down, restructuring the local subtree while preserving BST order (left-smaller, right-larger).

The name tells you which direction the top node moves down. "Left-rotate X" means X drops down-left, its right child rises up. "Right-rotate X" means X drops down-right, its left child rises up.

Variant A: P is left child, N is right child of P

Before
=>
After (then Case 3)

Left-rotate P: N moves up, path straightens to zig-zig

Variant B: P is right child, N is left child of P

Before
=>
After (then Case 3)

Right-rotate P: N moves up, path straightens to zig-zig

What: If N is right child of left parent P, left-rotate P. If N is left child of right parent P, right-rotate P. Either way, N moves into P's position, straightening the path into Case 3.

Why: We can't fix this with a single rotation because the node is on the "inside" (the path bends). The rotation straightens the path into a zig-zig shape so Case 3's single rotation can finish the job.

Case 3: Uncle is Black, Node is Outer Child (Zig-Zig)

Variant A: P is left child, N is left child of P

Before
=>
After

Right-rotate G, recolor P to black, G to red

Variant B: P is right child, N is right child of P

Before
=>
After

Left-rotate G, recolor P to black, G to red

What: If P is left child of G, right-rotate G. If P is right child of G, left-rotate G. Then recolor P to black and G to red.

Why: The parent becomes the new subtree root (black), balancing the tree. The old grandparent (now red) drops down. This rotation preserves black-height on all paths while eliminating the double-red violation in one step.

Practice

or

Further Reading

A Brief History

1962 - The ancestor. Adelson-Velsky and Landis publish the AVL tree - the first self-balancing BST. It works, but requires many rotations on insert/delete because it demands strict balance. Researchers start looking for "good enough" balance that's cheaper to maintain.

1970 - B-trees arrive. Bayer and McCreight invent B-trees for database storage on disk. The idea of allowing some slack in the balance (nodes can be partially full) proves powerful - but B-trees are designed for disk blocks, not in-memory binary search.

1972 - The invention. Rudolf Bayer asks: what if we apply B-tree ideas to a binary tree in memory? He invents "symmetric binary B-trees" - essentially red-black trees without the colors. The data structure works beautifully, but the rules are complex and hard to teach.

1978 - The colors. At Xerox PARC, Leonidas Guibas and Robert Sedgewick reframe Bayer's rules using a simple coloring scheme: paint some edges red and others black, and the complicated invariants become counting rules anyone can follow. Why red and black? The lab had a brand-new laser printer that could print in those two colors, and their diagrams looked stunning. Sedgewick later joked they simply grabbed the two pens on the desk. The name "red-black tree" stuck.

1993 - Left-leaning variant. Sedgewick continues refining the idea for decades. In 2008, he publishes the left-leaning red-black tree - a simplified variant that restricts red links to left children only, cutting the number of cases roughly in half. Elegant for teaching, though the standard version remains dominant in production.

1999-2007 - World domination. Java 1.2 ships TreeMap backed by a red-black tree (1998). The Linux kernel adopts it for the Completely Fair Scheduler (2007), using it to pick which process runs next in O(log n). C++'s std::map and std::set have used it since the STL's inception. Virtually every operating system's virtual memory manager uses one to track memory regions.

Today. Red-black trees are everywhere you don't notice them. Every time you open a file, look up a route in a network, or insert a row into a database index, a red-black tree probably helped find it. They won not by being the theoretically optimal tree, but by being good enough at everything - fast inserts, fast deletes, fast lookups, bounded worst case - with an implementation simple enough to get right in production.

When to Use a Red-Black Tree

Red-Black Tree AVL Tree Hash Map
Balance Loosely balanced Strictly balanced N/A
Lookup O(log n) O(log n), faster in practice O(1) average
Insert/Delete Faster (fewer rotations) Slower (more rotations) O(1) average
Ordered iteration Yes Yes No
Use when Frequent inserts/deletes + need order (e.g. std::map, TreeMap, Linux CFS scheduler) Lookup-heavy, rarely changing data No ordering needed

What About Deletion?

Deletion follows a similar pattern but is more complex - it has 4 fixup cases instead of 3, and must handle the case where removing a black node breaks rule 5. The core idea is the same: recolor first, rotate if needed, propagate up if the violation persists. Master insertion first - deletion builds on the same intuitions.

The Family Tree of Balanced Trees

Red-black trees belong to a rich family of self-balancing structures. Here are the ones you'll encounter in the wild:

Structure Where you'll see it Key idea
Red-black tree Java TreeMap, C++ std::map/set, Linux CFS scheduler, virtual memory managers Loosely balanced with coloring rules. Fewer rotations than AVL. The default in-memory ordered container.
B-tree Every database (PostgreSQL, MySQL, SQLite), file systems (NTFS, HFS+, ext4) Wide nodes (many keys per node) minimize disk reads. The king of on-disk search.
B+ tree Database indexes, InnoDB, btrfs B-tree variant where all data lives in leaves, linked together. Great for range scans.
AVL tree In-memory databases, lookup-heavy caches Strictly balanced (height diff <= 1). Faster lookups, slower inserts than red-black.
Skip list Redis sorted sets, LevelDB/RocksDB memtables, Lucene Randomized linked list layers. No rotations needed - probabilistically balanced. Simpler concurrent access.
Splay tree Caches, garbage collectors, Windows NT virtual memory Recently accessed nodes bubble to the root. No balance guarantee per-op, but amortized O(log n). Great for skewed access patterns.
Treap Randomized algorithms, competitive programming, some language runtimes BST + heap priorities. Random priorities give expected O(log n) without deterministic balancing.
LSM tree Cassandra, RocksDB, LevelDB, BigTable Write-optimized. Buffers writes in memory (often a skip list), periodically merges to disk. Dominates write-heavy workloads.

The pattern: no single structure wins everywhere. B-trees dominate disk. Red-black trees dominate in-memory ordered containers. Skip lists win when you need concurrency without lock contention. LSM trees win for write-heavy workloads. Knowing which tool fits which constraint is more valuable than memorizing any single algorithm.

Should You Still Learn This in the AI Era?

What's changed: you'll likely never hand-write a red-black tree in production. AI can generate implementations, recall which case does what, and produce the boilerplate.

What still matters: knowing when a red-black tree is the right choice vs. a skip list vs. a B-tree. Understanding why a rotation preserves balance so you can debug when the system misbehaves. Reading a flame graph and recognizing "this is doing O(n) lookups where an ordered container would give O(log n)." AI is getting better at these too - but your ability to ask the right question depends on having the mental model.

The new skill: building intuition for tradeoffs - balance strictness vs. rotation cost, memory layout vs. cache performance, write amplification vs. read latency. That intuition comes from experiencing the mechanics at least once (use the Practice section above), not from memorizing case numbers.