Find The Duplicate Number

Difficulty: Medium Source: NeetCode

Problem

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive. There is only one repeated number in nums, return this repeated number. You must solve the problem without modifying the array nums and using only constant extra space.

Example 1: Input: nums = [1,3,4,2,2] Output: 2

Example 2: Input: nums = [3,1,3,4,2] Output: 3

Constraints:

1 <= n <= 10^5

nums.length == n + 1

1 <= nums[i] <= n

There is only one repeated number in nums

Prerequisites

Before attempting this problem, you should be comfortable with:

Hash Sets — O(1) average-case lookup for tracking seen elements
Floyd’s Cycle Detection — slow/fast pointers for detecting cycles (see problem 3)
Index-as-Pointer Thinking — treating array values as “next pointers” to model a linked list

1. Brute Force — Hash Set

Intuition

Walk through the array. Keep a set of numbers we’ve seen so far. The first number we see twice is the duplicate. Simple, but uses O(n) extra space which violates the optimal constraint.

Algorithm

Initialize an empty set seen.
For each number n in nums:
- If n is already in seen, return n.
- Otherwise, add n to seen.

Solution

def findDuplicate_brute(nums):
    seen = set()
    for n in nums:
        if n in seen:
            return n
        seen.add(n)
    return -1  # guaranteed to find one

# Test cases
print(findDuplicate_brute([1, 3, 4, 2, 2]))  # 2
print(findDuplicate_brute([3, 1, 3, 4, 2]))  # 3
print(findDuplicate_brute([1, 1]))            # 1
print(findDuplicate_brute([2, 2, 2, 2, 2]))  # 2

Complexity

Time: O(n) — single pass
Space: O(n) — hash set stores up to n elements

2. Floyd’s Cycle Detection — O(1) Space (Optimal)

This is the clever one. The key insight: model the array as a linked list where index i points to index nums[i]. Since every value is in [1, n] and there are n+1 elements, there must be a cycle — and the entry point of that cycle is the duplicate number.

Let’s trace [1, 3, 4, 2, 2] (indices 0–4):

Index:  0  1  2  3  4
Value:  1  3  4  2  2

Treat as: node at index i has next pointer to index nums[i]

Start at index 0:
  0 -> nums[0]=1 -> nums[1]=3 -> nums[3]=2 -> nums[2]=4 -> nums[4]=2 -> nums[2]=4 -> ...
  Cycle: 2 -> 4 -> 2 -> 4 -> ...
  Entry point of cycle: index 2 (value 2 is the duplicate!)

The algorithm is exactly Floyd’s cycle detection:

Find where slow and fast pointers meet (inside the cycle).
Reset one pointer to the start.
Move both one step at a time until they meet — that meeting point is the cycle entry = the duplicate.

Why does resetting to start and walking at the same speed find the entry? It’s a mathematical property of Floyd’s algorithm: if the distance from start to cycle entry is F, and the cycle length is C, then after slow and fast meet inside the cycle, moving one pointer from start and one from the meeting point at the same speed, they converge at the cycle entry after F more steps.

Algorithm

Phase 1 — Find meeting point inside cycle:
- slow = nums[0], fast = nums[0]
- Move slow = nums[slow] and fast = nums[nums[fast]]
- Repeat until slow == fast
Phase 2 — Find cycle entry:
- Reset slow2 = nums[0] (start of the “linked list”)
- Move slow = nums[slow] and slow2 = nums[slow2] (both one step)
- Repeat until slow == slow2
Return slow (the cycle entry = duplicate).

Solution

def findDuplicate(nums):
    # Phase 1: Find the intersection point inside the cycle
    slow = nums[0]
    fast = nums[0]

    while True:
        slow = nums[slow]         # one step
        fast = nums[nums[fast]]   # two steps
        if slow == fast:
            break

    # Phase 2: Find the entry point of the cycle
    slow2 = nums[0]
    while slow != slow2:
        slow = nums[slow]
        slow2 = nums[slow2]

    return slow  # cycle entry = duplicate

# Test cases
print(findDuplicate([1, 3, 4, 2, 2]))  # 2
print(findDuplicate([3, 1, 3, 4, 2]))  # 3
print(findDuplicate([1, 1]))            # 1

# Trace for [1,3,4,2,2]:
# Index: 0 -> 1 -> 3 -> 2 -> 4 -> 2 (cycle!)
# Phase 1:
#   slow: 1, 3, 2, 4, 2, 4  ...
#   fast: 3, 4, 4, 4, ...  (fast loops in 2->4->2 cycle)
# They meet at 4 (or wherever in the cycle)
# Phase 2: slow2 starts at nums[0]=1, advances until meeting slow

# Larger test
import random
n = 10
nums = list(range(1, n+1)) + [random.randint(1, n)]  # add one duplicate
random.shuffle(nums)
print(f"nums={nums}, duplicate={findDuplicate(nums)}")

Complexity

Time: O(n) — Floyd’s algorithm runs in linear time
Space: O(1) — only two pointer variables

Common Pitfalls

Starting slow and fast at index 0 vs value nums[0]. Since index 0 is special (it’s our “entry node” to the linked list), we start both pointers at nums[0] — the first node we actually jump to. Don’t start at index 0 itself or you’ll treat 0 as part of the cycle incorrectly.

Modifying the array to mark visited nodes. A common approach uses negative marking or sorting, but the problem explicitly says you cannot modify the array. Floyd’s approach is the clean O(1) space solution that obeys this constraint.

Using the do-while pattern correctly. Phase 1 uses while True: ... if slow == fast: break because slow and fast start at the same place — we must move them at least once before checking equality.

Confusing index and value. At step i, we’re AT index i and we jump TO index nums[i]. The duplicate number is the index value that two different predecessors both point to — that’s the cycle entry.

Keyboard shortcuts

Note Book