Find The Duplicate Number
Difficulty: Medium Source: NeetCode
Problem
Given an array of integers
numscontainingn + 1integers where each integer is in the range[1, n]inclusive. There is only one repeated number innums, return this repeated number. You must solve the problem without modifying the arraynumsand using only constant extra space.Example 1: Input: nums = [1,3,4,2,2] Output: 2
Example 2: Input: nums = [3,1,3,4,2] Output: 3
Constraints:
- 1 <= n <= 10^5
- nums.length == n + 1
- 1 <= nums[i] <= n
- There is only one repeated number in nums
Prerequisites
Before attempting this problem, you should be comfortable with:
- Hash Sets — O(1) average-case lookup for tracking seen elements
- Floyd’s Cycle Detection — slow/fast pointers for detecting cycles (see problem 3)
- Index-as-Pointer Thinking — treating array values as “next pointers” to model a linked list
1. Brute Force — Hash Set
Intuition
Walk through the array. Keep a set of numbers we’ve seen so far. The first number we see twice is the duplicate. Simple, but uses O(n) extra space which violates the optimal constraint.
Algorithm
- Initialize an empty set
seen. - For each number
ninnums:- If
nis already inseen, returnn. - Otherwise, add
ntoseen.
- If
Solution
def findDuplicate_brute(nums):
seen = set()
for n in nums:
if n in seen:
return n
seen.add(n)
return -1 # guaranteed to find one
# Test cases
print(findDuplicate_brute([1, 3, 4, 2, 2])) # 2
print(findDuplicate_brute([3, 1, 3, 4, 2])) # 3
print(findDuplicate_brute([1, 1])) # 1
print(findDuplicate_brute([2, 2, 2, 2, 2])) # 2
Complexity
- Time:
O(n)— single pass - Space:
O(n)— hash set stores up to n elements
2. Floyd’s Cycle Detection — O(1) Space (Optimal)
Intuition
This is the clever one. The key insight: model the array as a linked list where index i points to index nums[i]. Since every value is in [1, n] and there are n+1 elements, there must be a cycle — and the entry point of that cycle is the duplicate number.
Let’s trace [1, 3, 4, 2, 2] (indices 0–4):
Index: 0 1 2 3 4
Value: 1 3 4 2 2
Treat as: node at index i has next pointer to index nums[i]
Start at index 0:
0 -> nums[0]=1 -> nums[1]=3 -> nums[3]=2 -> nums[2]=4 -> nums[4]=2 -> nums[2]=4 -> ...
Cycle: 2 -> 4 -> 2 -> 4 -> ...
Entry point of cycle: index 2 (value 2 is the duplicate!)
The algorithm is exactly Floyd’s cycle detection:
- Find where slow and fast pointers meet (inside the cycle).
- Reset one pointer to the start.
- Move both one step at a time until they meet — that meeting point is the cycle entry = the duplicate.
Why does resetting to start and walking at the same speed find the entry? It’s a mathematical property of Floyd’s algorithm: if the distance from start to cycle entry is F, and the cycle length is C, then after slow and fast meet inside the cycle, moving one pointer from start and one from the meeting point at the same speed, they converge at the cycle entry after F more steps.
Algorithm
- Phase 1 — Find meeting point inside cycle:
slow = nums[0],fast = nums[0]- Move
slow = nums[slow]andfast = nums[nums[fast]] - Repeat until
slow == fast
- Phase 2 — Find cycle entry:
- Reset
slow2 = nums[0](start of the “linked list”) - Move
slow = nums[slow]andslow2 = nums[slow2](both one step) - Repeat until
slow == slow2
- Reset
- Return
slow(the cycle entry = duplicate).
Solution
def findDuplicate(nums):
# Phase 1: Find the intersection point inside the cycle
slow = nums[0]
fast = nums[0]
while True:
slow = nums[slow] # one step
fast = nums[nums[fast]] # two steps
if slow == fast:
break
# Phase 2: Find the entry point of the cycle
slow2 = nums[0]
while slow != slow2:
slow = nums[slow]
slow2 = nums[slow2]
return slow # cycle entry = duplicate
# Test cases
print(findDuplicate([1, 3, 4, 2, 2])) # 2
print(findDuplicate([3, 1, 3, 4, 2])) # 3
print(findDuplicate([1, 1])) # 1
# Trace for [1,3,4,2,2]:
# Index: 0 -> 1 -> 3 -> 2 -> 4 -> 2 (cycle!)
# Phase 1:
# slow: 1, 3, 2, 4, 2, 4 ...
# fast: 3, 4, 4, 4, ... (fast loops in 2->4->2 cycle)
# They meet at 4 (or wherever in the cycle)
# Phase 2: slow2 starts at nums[0]=1, advances until meeting slow
# Larger test
import random
n = 10
nums = list(range(1, n+1)) + [random.randint(1, n)] # add one duplicate
random.shuffle(nums)
print(f"nums={nums}, duplicate={findDuplicate(nums)}")
Complexity
- Time:
O(n)— Floyd’s algorithm runs in linear time - Space:
O(1)— only two pointer variables
Common Pitfalls
Starting slow and fast at index 0 vs value nums[0]. Since index 0 is special (it’s our “entry node” to the linked list), we start both pointers at nums[0] — the first node we actually jump to. Don’t start at index 0 itself or you’ll treat 0 as part of the cycle incorrectly.
Modifying the array to mark visited nodes. A common approach uses negative marking or sorting, but the problem explicitly says you cannot modify the array. Floyd’s approach is the clean O(1) space solution that obeys this constraint.
Using the do-while pattern correctly. Phase 1 uses while True: ... if slow == fast: break because slow and fast start at the same place — we must move them at least once before checking equality.
Confusing index and value. At step i, we’re AT index i and we jump TO index nums[i]. The duplicate number is the index value that two different predecessors both point to — that’s the cycle entry.