Extra Characters in a String
Difficulty: Medium Source: NeetCode
Problem
You are given a 0-indexed string
sand a dictionary of wordsdictionary. You have to breaksinto one or more non-overlapping substrings such that each substring is present indictionary. There may be some extra characters inswhich are not present in any substring.Return the minimum number of extra characters left over if you break up
soptimally.Example 1: Input:
s = "leetscode",dictionary = ["leet","code","leetcode"]Output:1Example 2: Input:
s = "sayhelloworld",dictionary = ["hello","world"]Output:3Constraints:
1 <= s.length <= 501 <= dictionary.length <= 501 <= dictionary[i].length <= 50sanddictionary[i]consist of lowercase English letters
Prerequisites
Before attempting this problem, you should be comfortable with:
- Dynamic Programming (1D) — building up a dp array where each entry depends on previously computed entries
- Trie (Prefix Tree) — used in the optimal solution to efficiently test all substrings starting at position
i - String slicing — checking substrings
s[i:j]against a set or trie
1. Brute Force (DP + Set)
Intuition
Think of it as: starting at index i, we can either skip s[i] (costing 1 extra character) or consume a substring s[i:j] that exists in the dictionary (costing 0 extra characters). We want the minimum cost. Define dp[i] = minimum extra characters in s[i:]. We fill dp from right to left.
For each position i, we check every possible end j from i+1 to n+1. If s[i:j] is in the dictionary, we can transition to dp[j] with no cost. The base case is dp[n] = 0 (empty suffix has zero extras).
Algorithm
- Build a set from
dictionaryfor O(1) lookup. - Create
dparray of sizen+1, initialize to0. Setdp[n] = 0. - For
ifromn-1down to0:- Option 1: skip
s[i], sodp[i] = 1 + dp[i+1]. - Option 2: for each
jfromi+1ton+1, ifs[i:j]in dict, considerdp[j]. - Take the minimum.
- Option 1: skip
- Return
dp[0].
Solution
def minExtraChar(s: str, dictionary: list[str]) -> int:
word_set = set(dictionary)
n = len(s)
dp = [0] * (n + 1)
for i in range(n - 1, -1, -1):
dp[i] = 1 + dp[i + 1] # skip s[i]
for j in range(i + 1, n + 1):
if s[i:j] in word_set:
dp[i] = min(dp[i], dp[j])
return dp[0]
print(minExtraChar("leetscode", ["leet", "code", "leetcode"])) # 1
print(minExtraChar("sayhelloworld", ["hello", "world"])) # 3
print(minExtraChar("abc", ["a", "b"])) # 1
Complexity
- Time:
O(n³)— O(n²) state/transition pairs, each doing O(n) string slice + hash - Space:
O(n + W)for dp array and word set
2. Trie + DP
Intuition
The bottleneck in the brute force is that for each starting index i, we’re re-hashing substrings s[i:j] for every j. A trie lets us walk all possible substrings starting at i in a single sweep — we just follow the trie one character at a time and check is_end at each step. This avoids repeated string slicing and hashing.
Algorithm
- Build a trie from all words in
dictionary. - Fill
dpfrom right to left exactly as before. - For position
i, walk the trie starting atroot, followings[i], s[i+1], ...until a character isn’t found. At each positionjwheretrie_node.is_endisTrue, updatedp[i] = min(dp[i], dp[j]).
Solution
class TrieNode:
def __init__(self):
self.children = {}
self.is_end = False
def minExtraChar(s: str, dictionary: list[str]) -> int:
# Build trie
root = TrieNode()
for word in dictionary:
node = root
for ch in word:
if ch not in node.children:
node.children[ch] = TrieNode()
node = node.children[ch]
node.is_end = True
n = len(s)
dp = [0] * (n + 1)
for i in range(n - 1, -1, -1):
dp[i] = 1 + dp[i + 1] # skip s[i]
node = root
for j in range(i, n):
ch = s[j]
if ch not in node.children:
break
node = node.children[ch]
if node.is_end:
dp[i] = min(dp[i], dp[j + 1])
return dp[0]
print(minExtraChar("leetscode", ["leet", "code", "leetcode"])) # 1
print(minExtraChar("sayhelloworld", ["hello", "world"])) # 3
print(minExtraChar("abc", ["a", "b"])) # 1
print(minExtraChar("abcd", ["ab", "bc", "cd"])) # 0
Complexity
- Time:
O(n² + W * L)— O(n²) for dp transitions, O(W * L) to build the trie - Space:
O(W * L + n)for the trie and dp array
Common Pitfalls
Off-by-one in the dp indices. dp[i] represents the minimum extras in s[i:], so dp[n] = 0 (empty string). When a substring s[i:j] matches, you transition to dp[j] (not dp[j-1]).
Breaking too early in the trie walk. When a character isn’t in the trie node’s children, you should break out of the inner loop — there’s no point continuing since the trie can’t extend that path. But do collect is_end results before breaking.
Forgetting to initialize skip cost. The line dp[i] = 1 + dp[i+1] must run before checking dictionary matches — it sets the “worst case” that matches can only improve upon.