Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Find Median From Data Stream

Difficulty: Hard Source: NeetCode

Problem

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.

Implement the MedianFinder class:

  • MedianFinder() — initializes the MedianFinder object.
  • void addNum(int num) — adds the integer num from the data stream to the data structure.
  • double findMedian() — returns the median of all elements so far. Answers within 10^-5 of the actual answer will be accepted.

Example 1: Input: addNum(1), addNum(2), findMedian()1.5, addNum(3), findMedian()2.0

Constraints:

  • -10^5 <= num <= 10^5
  • There will be at least one element before findMedian is called
  • At most 5 * 10^4 calls will be made to addNum and findMedian

Prerequisites

Before attempting this problem, you should be comfortable with:

  • Two heaps pattern — splitting a dataset at its median using a max-heap for the lower half and a min-heap for the upper half
  • Heap balancing — ensuring both halves are within 1 element of each other
  • Python’s heapq module — min-heap only; negate values for max-heap behavior

1. Brute Force — Sorted List

Intuition

Maintain a sorted list of all elements seen so far. Insertion into a sorted list is O(n) (shift elements), but findMedian is O(1) — just index into the middle. Simple but slow for large streams.

Algorithm

  1. Store elements in a sorted list nums.
  2. addNum(num): use bisect.insort to insert in sorted position.
  3. findMedian(): if odd length, return middle element; if even, return average of two middle elements.

Solution

import bisect

class MedianFinder:
    def __init__(self):
        self.nums = []

    def addNum(self, num):
        bisect.insort(self.nums, num)

    def findMedian(self):
        n = len(self.nums)
        if n % 2 == 1:
            return float(self.nums[n // 2])
        else:
            return (self.nums[n // 2 - 1] + self.nums[n // 2]) / 2.0


mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
print(mf.findMedian())  # 1.5
mf.addNum(3)
print(mf.findMedian())  # 2.0

Complexity

  • Time: O(n) per addNum (insertion shift), O(1) per findMedian
  • Space: O(n)

2. Two Heaps

Intuition

Split the data into two halves at the median:

  • small: a max-heap holding the lower half (negate values for Python’s min-heap)
  • large: a min-heap holding the upper half

Keep them balanced: len(small) == len(large) or len(small) == len(large) + 1 (small gets the extra element when the total is odd).

The median is:

  • If same size: average of small[0] and large[0]
  • If small has one extra: small[0] (which is the max of the lower half)

For every addNum, push to small, then balance by moving the largest of small to large if needed, then re-balance size if large is bigger than small.

Algorithm

  1. addNum(num): a. Push -num to small (max-heap via negation). b. If small has elements and large has elements and -small[0] > large[0]: move small’s max to large. c. If len(large) > len(small): move large’s min to small.
  2. findMedian():
    • If equal sizes: (-small[0] + large[0]) / 2.0
    • If small is larger: -small[0]

Solution

import heapq

class MedianFinder:
    def __init__(self):
        self.small = []  # max-heap (negated) — lower half
        self.large = []  # min-heap — upper half

    def addNum(self, num):
        # Always push to small first
        heapq.heappush(self.small, -num)

        # Ensure every element in small <= every element in large
        if self.small and self.large and (-self.small[0] > self.large[0]):
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)

        # Balance sizes: small can have at most 1 extra
        if len(self.small) > len(self.large) + 1:
            val = -heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        elif len(self.large) > len(self.small):
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -val)

    def findMedian(self):
        if len(self.small) == len(self.large):
            return (-self.small[0] + self.large[0]) / 2.0
        return float(-self.small[0])


# Test 1
mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
print(mf.findMedian())  # 1.5
mf.addNum(3)
print(mf.findMedian())  # 2.0

# Test 2
mf2 = MedianFinder()
mf2.addNum(6)
print(mf2.findMedian())  # 6.0
mf2.addNum(10)
print(mf2.findMedian())  # 8.0
mf2.addNum(2)
print(mf2.findMedian())  # 6.0
mf2.addNum(6)
print(mf2.findMedian())  # 6.0

# Test 3 — single element
mf3 = MedianFinder()
mf3.addNum(-1)
print(mf3.findMedian())  # -1.0
mf3.addNum(-2)
print(mf3.findMedian())  # -1.5

Complexity

  • Time: O(log n) per addNum, O(1) per findMedian
  • Space: O(n)

Common Pitfalls

Forgetting to negate when pushing to or popping from small. The max-heap is simulated by negating all values. When you push num to small, push -num. When you pop from small, negate the result to get the actual value.

Not re-balancing after the ordering fix. The two-step rebalance matters: first fix the cross-heap ordering (ensure max of small ≤ min of large), then fix the sizes. If you only do one or the other, the invariant can be violated.

Returning an int instead of a float. The problem asks for a double. In Python, (a + b) / 2 returns a float if either is a float, but -small[0] returns an int. Wrap with float() or use / 2.0.