Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Heapify

Why build a heap one element at a time in O(n log n) when you can do it in O(n)?

If you already have all the data up front, there is a smarter approach called heapify. Instead of pushing each element individually and sifting it up, you start from the middle of the array and sift down each node. The result is a valid heap in linear time.

Why Start From the Middle?

Leaf nodes (roughly the second half of the array) are already valid one-element heaps — there is nothing to do for them. The first internal node that can actually have children is at index (n // 2) - 1. That is where the work begins.

For an array of length 6, index (6 // 2) - 1 = 2 is the last internal node. We sift down from index 2, then 1, then 0.

Step-by-Step: Heapify [3, 1, 6, 5, 2, 4]

flowchart TD
    subgraph Start["Starting array: 3, 1, 6, 5, 2, 4"]
        N0["3\n idx 0"] --> N1["1\n idx 1"]
        N0 --> N2["6\n idx 2"]
        N1 --> N3["5\n idx 3"]
        N1 --> N4["2\n idx 4"]
        N2 --> N5["4\n idx 5"]
    end

    subgraph Step1["Step 1: sift down idx 2 (value 6) — child 4 is smaller, swap"]
        S1N0["3"] --> S1N1["1"]
        S1N0 --> S1N2["4 ↑"]
        S1N1 --> S1N3["5"]
        S1N1 --> S1N4["2"]
        S1N2 --> S1N5["6 ↓"]
    end

    subgraph Step2["Step 2: sift down idx 1 (value 1) — children are 5 and 2; 1 < both, no swap"]
        S2N0["3"] --> S2N1["1"]
        S2N0 --> S2N2["4"]
        S2N1 --> S2N3["5"]
        S2N1 --> S2N4["2"]
        S2N2 --> S2N5["6"]
    end

    subgraph Step3["Step 3: sift down idx 0 (value 3) — children are 1 and 4; 1 is smallest, swap"]
        S3N0["1 ↑"] --> S3N1["3 ↓"]
        S3N0 --> S3N2["4"]
        S3N1 --> S3N3["5"]
        S3N1 --> S3N4["2"]
        S3N2 --> S3N5["6"]
    end

    subgraph Step4["Step 4: continue sifting 3 down — children 5 and 2; swap with 2"]
        S4N0["1"] --> S4N1["2 ↑"]
        S4N0 --> S4N2["4"]
        S4N1 --> S4N3["5"]
        S4N1 --> S4N4["3 ↓"]
        S4N2 --> S4N5["6"]
    end

    Start --> Step1 --> Step2 --> Step3 --> Step4

Final heap array: [1, 2, 4, 5, 3, 6]

Implementing Heapify From Scratch

def sift_down(arr, i, n):
    """Sift the element at index i down until the heap property holds."""
    while True:
        smallest = i
        left  = 2 * i + 1
        right = 2 * i + 2
        if left < n and arr[left] < arr[smallest]:
            smallest = left
        if right < n and arr[right] < arr[smallest]:
            smallest = right
        if smallest == i:
            break
        arr[i], arr[smallest] = arr[smallest], arr[i]
        i = smallest

def heapify(arr):
    """Turn arr into a valid min-heap in-place in O(n) time."""
    n = len(arr)
    # Start from the last internal node and work backwards to the root
    for i in range(n // 2 - 1, -1, -1):
        sift_down(arr, i, n)

# --- Demo ---
data = [3, 1, 6, 5, 2, 4]
print("Before heapify:", data)
heapify(data)
print("After heapify: ", data)

# Verify: extract elements in sorted order
import heapq
import copy
heap_copy = copy.copy(data)
sorted_vals = []
while heap_copy:
    # manually pop using our sift_down
    heap_copy[0], heap_copy[-1] = heap_copy[-1], heap_copy[0]
    sorted_vals.append(heap_copy.pop())
    sift_down(heap_copy, 0, len(heap_copy))
print("Elements in min order:", sorted_vals)

Using Python’s heapq.heapify()

heapq.heapify() does the same thing in-place, transforming any list into a min-heap:

import heapq

data = [3, 1, 6, 5, 2, 4]
print("Before:", data)

heapq.heapify(data)   # O(n), in-place
print("After: ", data)

# heapq.heapify guarantees data[0] is the minimum
print("Minimum:", data[0])

# Pull all elements out in sorted order
print("Sorted:", [heapq.heappop(data) for _ in range(len(data) + 0)])

Why O(n) and Not O(n log n)?

It feels like it should be O(n log n) — we call sift_down on n/2 nodes, and sift_down is O(log n). But not every sift_down call does the same amount of work.

  • Nodes near the bottom of the tree have very little room to sift down (height 0 or 1).
  • Only the root can sift down the full tree height.

When you add up the actual work across all levels, the total converges to O(n). The formal proof uses the fact that there are at most n/2^(h+1) nodes at height h, each doing O(h) work — and the sum ∑ h / 2^h converges to a constant.

MethodTimeWhen to use
Push n timesO(n log n)Elements arrive one at a time
heapify onceO(n)All data available at the start

Real-World Applications

Heap Sort: heapify the array in O(n), then repeatedly extract the minimum in O(log n). Total: O(n log n) with O(1) extra space — a fully in-place sort.

import heapq

def heap_sort(arr):
    heapq.heapify(arr)
    return [heapq.heappop(arr) for _ in range(len(arr))]

numbers = [42, 7, 19, 3, 55, 1, 28]
print("Unsorted:", numbers)
print("Sorted:  ", heap_sort(numbers))

One-time priority queue setup: when a system starts up with a known backlog of jobs (e.g. a build system loading thousands of pending tasks from a database), heapify transforms the entire backlog into a priority queue in linear time, ready for O(log n) push/pop from that point on.