Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Extra Characters in a String

Difficulty: Medium Source: NeetCode

Problem

You are given a 0-indexed string s and a dictionary of words dictionary. You have to break s into one or more non-overlapping substrings such that each substring is present in dictionary. There may be some extra characters in s which are not present in any substring.

Return the minimum number of extra characters left over if you break up s optimally.

Example 1: Input: s = "leetscode", dictionary = ["leet","code","leetcode"] Output: 1

Example 2: Input: s = "sayhelloworld", dictionary = ["hello","world"] Output: 3

Constraints:

  • 1 <= s.length <= 50
  • 1 <= dictionary.length <= 50
  • 1 <= dictionary[i].length <= 50
  • s and dictionary[i] consist of lowercase English letters

Prerequisites

Before attempting this problem, you should be comfortable with:

  • Dynamic Programming (1D) — building up a dp array where each entry depends on previously computed entries
  • Trie (Prefix Tree) — used in the optimal solution to efficiently test all substrings starting at position i
  • String slicing — checking substrings s[i:j] against a set or trie

1. Brute Force (DP + Set)

Intuition

Think of it as: starting at index i, we can either skip s[i] (costing 1 extra character) or consume a substring s[i:j] that exists in the dictionary (costing 0 extra characters). We want the minimum cost. Define dp[i] = minimum extra characters in s[i:]. We fill dp from right to left.

For each position i, we check every possible end j from i+1 to n+1. If s[i:j] is in the dictionary, we can transition to dp[j] with no cost. The base case is dp[n] = 0 (empty suffix has zero extras).

Algorithm

  1. Build a set from dictionary for O(1) lookup.
  2. Create dp array of size n+1, initialize to 0. Set dp[n] = 0.
  3. For i from n-1 down to 0:
    • Option 1: skip s[i], so dp[i] = 1 + dp[i+1].
    • Option 2: for each j from i+1 to n+1, if s[i:j] in dict, consider dp[j].
    • Take the minimum.
  4. Return dp[0].

Solution

def minExtraChar(s: str, dictionary: list[str]) -> int:
    word_set = set(dictionary)
    n = len(s)
    dp = [0] * (n + 1)

    for i in range(n - 1, -1, -1):
        dp[i] = 1 + dp[i + 1]  # skip s[i]
        for j in range(i + 1, n + 1):
            if s[i:j] in word_set:
                dp[i] = min(dp[i], dp[j])

    return dp[0]


print(minExtraChar("leetscode", ["leet", "code", "leetcode"]))  # 1
print(minExtraChar("sayhelloworld", ["hello", "world"]))         # 3
print(minExtraChar("abc", ["a", "b"]))                           # 1

Complexity

  • Time: O(n³) — O(n²) state/transition pairs, each doing O(n) string slice + hash
  • Space: O(n + W) for dp array and word set

2. Trie + DP

Intuition

The bottleneck in the brute force is that for each starting index i, we’re re-hashing substrings s[i:j] for every j. A trie lets us walk all possible substrings starting at i in a single sweep — we just follow the trie one character at a time and check is_end at each step. This avoids repeated string slicing and hashing.

Algorithm

  1. Build a trie from all words in dictionary.
  2. Fill dp from right to left exactly as before.
  3. For position i, walk the trie starting at root, following s[i], s[i+1], ... until a character isn’t found. At each position j where trie_node.is_end is True, update dp[i] = min(dp[i], dp[j]).

Solution

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False


def minExtraChar(s: str, dictionary: list[str]) -> int:
    # Build trie
    root = TrieNode()
    for word in dictionary:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_end = True

    n = len(s)
    dp = [0] * (n + 1)

    for i in range(n - 1, -1, -1):
        dp[i] = 1 + dp[i + 1]  # skip s[i]
        node = root
        for j in range(i, n):
            ch = s[j]
            if ch not in node.children:
                break
            node = node.children[ch]
            if node.is_end:
                dp[i] = min(dp[i], dp[j + 1])

    return dp[0]


print(minExtraChar("leetscode", ["leet", "code", "leetcode"]))  # 1
print(minExtraChar("sayhelloworld", ["hello", "world"]))         # 3
print(minExtraChar("abc", ["a", "b"]))                           # 1
print(minExtraChar("abcd", ["ab", "bc", "cd"]))                  # 0

Complexity

  • Time: O(n² + W * L) — O(n²) for dp transitions, O(W * L) to build the trie
  • Space: O(W * L + n) for the trie and dp array

Common Pitfalls

Off-by-one in the dp indices. dp[i] represents the minimum extras in s[i:], so dp[n] = 0 (empty string). When a substring s[i:j] matches, you transition to dp[j] (not dp[j-1]).

Breaking too early in the trie walk. When a character isn’t in the trie node’s children, you should break out of the inner loop — there’s no point continuing since the trie can’t extend that path. But do collect is_end results before breaking.

Forgetting to initialize skip cost. The line dp[i] = 1 + dp[i+1] must run before checking dictionary matches — it sets the “worst case” that matches can only improve upon.