Find Median From Data Stream
Difficulty: Hard Source: NeetCode
Problem
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value and the median is the mean of the two middle values.
Implement the
MedianFinderclass:
MedianFinder()— initializes theMedianFinderobject.void addNum(int num)— adds the integernumfrom the data stream to the data structure.double findMedian()— returns the median of all elements so far. Answers within10^-5of the actual answer will be accepted.Example 1: Input:
addNum(1),addNum(2),findMedian()→1.5,addNum(3),findMedian()→2.0Constraints:
-10^5 <= num <= 10^5- There will be at least one element before
findMedianis called- At most
5 * 10^4calls will be made toaddNumandfindMedian
Prerequisites
Before attempting this problem, you should be comfortable with:
- Two heaps pattern — splitting a dataset at its median using a max-heap for the lower half and a min-heap for the upper half
- Heap balancing — ensuring both halves are within 1 element of each other
- Python’s
heapqmodule — min-heap only; negate values for max-heap behavior
1. Brute Force — Sorted List
Intuition
Maintain a sorted list of all elements seen so far. Insertion into a sorted list is O(n) (shift elements), but findMedian is O(1) — just index into the middle. Simple but slow for large streams.
Algorithm
- Store elements in a sorted list
nums. addNum(num): usebisect.insortto insert in sorted position.findMedian(): if odd length, return middle element; if even, return average of two middle elements.
Solution
import bisect
class MedianFinder:
def __init__(self):
self.nums = []
def addNum(self, num):
bisect.insort(self.nums, num)
def findMedian(self):
n = len(self.nums)
if n % 2 == 1:
return float(self.nums[n // 2])
else:
return (self.nums[n // 2 - 1] + self.nums[n // 2]) / 2.0
mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
print(mf.findMedian()) # 1.5
mf.addNum(3)
print(mf.findMedian()) # 2.0
Complexity
- Time:
O(n)peraddNum(insertion shift),O(1)perfindMedian - Space:
O(n)
2. Two Heaps
Intuition
Split the data into two halves at the median:
small: a max-heap holding the lower half (negate values for Python’s min-heap)large: a min-heap holding the upper half
Keep them balanced: len(small) == len(large) or len(small) == len(large) + 1 (small gets the extra element when the total is odd).
The median is:
- If same size: average of
small[0]andlarge[0] - If small has one extra:
small[0](which is the max of the lower half)
For every addNum, push to small, then balance by moving the largest of small to large if needed, then re-balance size if large is bigger than small.
Algorithm
addNum(num): a. Push-numtosmall(max-heap via negation). b. Ifsmallhas elements andlargehas elements and-small[0] > large[0]: movesmall’s max tolarge. c. Iflen(large) > len(small): movelarge’s min tosmall.findMedian():- If equal sizes:
(-small[0] + large[0]) / 2.0 - If
smallis larger:-small[0]
- If equal sizes:
Solution
import heapq
class MedianFinder:
def __init__(self):
self.small = [] # max-heap (negated) — lower half
self.large = [] # min-heap — upper half
def addNum(self, num):
# Always push to small first
heapq.heappush(self.small, -num)
# Ensure every element in small <= every element in large
if self.small and self.large and (-self.small[0] > self.large[0]):
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
# Balance sizes: small can have at most 1 extra
if len(self.small) > len(self.large) + 1:
val = -heapq.heappop(self.small)
heapq.heappush(self.large, val)
elif len(self.large) > len(self.small):
val = heapq.heappop(self.large)
heapq.heappush(self.small, -val)
def findMedian(self):
if len(self.small) == len(self.large):
return (-self.small[0] + self.large[0]) / 2.0
return float(-self.small[0])
# Test 1
mf = MedianFinder()
mf.addNum(1)
mf.addNum(2)
print(mf.findMedian()) # 1.5
mf.addNum(3)
print(mf.findMedian()) # 2.0
# Test 2
mf2 = MedianFinder()
mf2.addNum(6)
print(mf2.findMedian()) # 6.0
mf2.addNum(10)
print(mf2.findMedian()) # 8.0
mf2.addNum(2)
print(mf2.findMedian()) # 6.0
mf2.addNum(6)
print(mf2.findMedian()) # 6.0
# Test 3 — single element
mf3 = MedianFinder()
mf3.addNum(-1)
print(mf3.findMedian()) # -1.0
mf3.addNum(-2)
print(mf3.findMedian()) # -1.5
Complexity
- Time:
O(log n)peraddNum,O(1)perfindMedian - Space:
O(n)
Common Pitfalls
Forgetting to negate when pushing to or popping from small. The max-heap is simulated by negating all values. When you push num to small, push -num. When you pop from small, negate the result to get the actual value.
Not re-balancing after the ordering fix. The two-step rebalance matters: first fix the cross-heap ordering (ensure max of small ≤ min of large), then fix the sizes. If you only do one or the other, the invariant can be violated.
Returning an int instead of a float. The problem asks for a double. In Python, (a + b) / 2 returns a float if either is a float, but -small[0] returns an int. Wrap with float() or use / 2.0.