Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Binary Search Tree

A binary tree with a superpower: left < node < right, always. This one rule makes search O(log n).

Think of a dictionary. You do not start at page 1 and flip forward one page at a time to find “umbrella”. You open to the middle — if your word comes alphabetically before the middle word, you go left; if after, you go right. Each step cuts the remaining pages in half. A Binary Search Tree works exactly this way, but built from pointers instead of pages.

The BST property

For every node in the tree:

  • All values in its left subtree are strictly less than the node’s value.
  • All values in its right subtree are strictly greater than the node’s value.

This must hold not just for immediate children but for the entire subtree below each node.

flowchart TD
    Root["8"]
    Root -->|"all < 8"| Left["3"]
    Root -->|"all > 8"| Right["10"]
    Left -->|"all < 3"| LL["1"]
    Left -->|"3 < x < 8"| LR["6"]
    LR --> LRL["4"]
    LR --> LRR["7"]
    Right --> RR["14"]
    RR -->|"left"| RRL["13"]

Check any node and the rule holds: node 6 has left child 4 (which is < 6) and right child 7 (which is > 6). Node 8 has an entire left subtree of values 1, 3, 4, 6, 7 — all less than 8 — and a right subtree of 10, 13, 14 — all greater than 8.

Building a BST manually

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


# Build the BST from the diagram above
root = TreeNode(8)
root.left = TreeNode(3)
root.right = TreeNode(10)
root.left.left = TreeNode(1)
root.left.right = TreeNode(6)
root.left.right.left = TreeNode(4)
root.left.right.right = TreeNode(7)
root.right.right = TreeNode(14)
root.right.right.left = TreeNode(13)

# Verify the property holds for the root
print("Root:", root.value)
print("Left child (must be < 8):", root.left.value)
print("Right child (must be > 8):", root.right.value)
print("Left subtree's right child (must be > 3 and < 8):", root.left.right.value)

Searching: narrowing down at every step

Search in a BST follows the same logic as flipping to the middle of a dictionary. At each node you ask one question: is my target smaller, larger, or equal to this node’s value?

flowchart TD
    Root["8  ← start here: is 4 < 8? Yes → go left"]
    Root --> Left["3  ← is 4 < 3? No. is 4 > 3? Yes → go right"]
    Root --> Right["10  (skipped)"]
    Left --> LL["1  (skipped)"]
    Left --> LR["6  ← is 4 < 6? Yes → go left"]
    LR --> LRL["4  ← found it!"]
    LR --> LRR["7  (skipped)"]

We searched a tree of 9 nodes but only visited 4 of them: 8 → 3 → 6 → 4. At each step we discarded half (roughly) of the remaining tree. That is why search is O(log n) — the same reason binary search on a sorted array is O(log n).

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def search(node, target):
    """
    Returns the node if found, otherwise None.
    At each step we go left or right, never both.
    """
    if node is None:
        return None           # ran off the bottom — not in tree
    if target == node.value:
        return node           # found it
    if target < node.value:
        return search(node.left, target)   # must be in left subtree
    else:
        return search(node.right, target)  # must be in right subtree


# Build a small BST
root = TreeNode(8)
root.left = TreeNode(3)
root.right = TreeNode(10)
root.left.left = TreeNode(1)
root.left.right = TreeNode(6)
root.left.right.left = TreeNode(4)
root.left.right.right = TreeNode(7)

result = search(root, 4)
print("Search for 4:", result.value if result else "not found")

result = search(root, 5)
print("Search for 5:", result.value if result else "not found")

result = search(root, 1)
print("Search for 1:", result.value if result else "not found")

Iterative search (no recursion)

The recursive version is clean, but it is equally natural as a loop — and avoids stack frames for very deep trees:

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def search_iterative(root, target):
    """Walk down the tree until we find the value or fall off the edge."""
    node = root
    steps = 0
    while node is not None:
        steps += 1
        if target == node.value:
            print(f"Found {target} in {steps} step(s)")
            return node
        elif target < node.value:
            node = node.left
        else:
            node = node.right
    print(f"{target} not found after {steps} step(s)")
    return None


root = TreeNode(8)
root.left = TreeNode(3)
root.right = TreeNode(10)
root.left.left = TreeNode(1)
root.left.right = TreeNode(6)
root.left.right.left = TreeNode(4)
root.left.right.right = TreeNode(7)
root.right.right = TreeNode(14)
root.right.right.left = TreeNode(13)

search_iterative(root, 13)  # should take 4 steps: 8 → 10 → 14 → 13
search_iterative(root, 5)   # not found

Why it can go wrong: unbalanced trees

The O(log n) guarantee depends on the tree being roughly balanced — each step halving what remains. If you insert values in sorted order (1, 2, 3, 4, 5, ...) the BST degrades into a linked list and search becomes O(n):

flowchart TD
    N1["1"] --> N2["2"]
    N2 --> N3["3"]
    N3 --> N4["4"]
    N4 --> N5["5  (leaf)"]

This is why self-balancing variants like AVL trees and Red-Black trees exist — they automatically restructure the tree after insertions to keep it balanced. Python’s sortedcontainers.SortedList uses a similar technique internally.

Real-world uses

Database indexes — When you run SELECT * FROM users WHERE age > 30, the database engine does not scan every row. If there is a BST-based index on age, it walks directly to the first node where age = 31 and reads forward from there.

Auto-complete — A BST (or its cousin, a trie) stores every known word. When you type “pre”, the search engine walks to the subtree of words that start with “pre” and returns them instantly.

Symbol tables in compilers — When the compiler processes your code it needs to store every variable name and look it up fast. A BST keyed on the variable name gives O(log n) lookups and also supports printing all variables in alphabetical order (via in-order traversal).