Word Break II
Difficulty: Hard Source: NeetCode
Problem
Given a string s and a dictionary of strings wordDict, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences in any order.

Note: The same word in the dictionary may be reused multiple times in the segmentation.

Example 1:
Input: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"]
Output: ["cats and dog","cat sand dog"]

Example 2:
Input: s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"]
Output: ["pine apple pen apple","pineapple pen apple","pine applepen apple"]

Example 3:
Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: []

Constraints:
- 1 <= s.length <= 20
- 1 <= wordDict.length <= 1000
- 1 <= wordDict[i].length <= 10
- s and wordDict[i] consist of only lowercase English letters.
- All the strings of wordDict are unique.
Prerequisites
Before attempting this problem, you should be comfortable with:
- Word Break I — the decision version (can the string be broken?); this problem finds all ways
- Backtracking — at each position, try all words that match the current prefix
- Memoization — cache results for each starting index to avoid redundant work
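As a refresher, the decision version (Word Break I) can be solved with the same memoized recursion over starting positions — a minimal sketch, not the only approach:

```python
from functools import lru_cache

def can_break(s, wordDict):
    """Decision version: can s be segmented into dictionary words at all?"""
    word_set = set(wordDict)

    @lru_cache(maxsize=None)
    def dp(start):
        # An empty suffix is trivially segmentable.
        if start == len(s):
            return True
        # Try every word that matches the prefix of s[start:].
        return any(
            s[start:start + len(word)] == word and dp(start + len(word))
            for word in word_set
        )

    return dp(0)

print(can_break("catsanddog", ["cat", "cats", "and", "sand", "dog"]))  # True
print(can_break("catsandog", ["cats", "dog", "sand", "and", "cat"]))   # False
```

Word Break II upgrades this boolean per position to the full list of sentences per position.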
1. Brute Force (Backtracking without Memoization)
Intuition
At each starting position start, look at every word in the dictionary. If the substring s[start:start+len(word)] matches the word, add it to the current path and recurse with the new starting position start + len(word). When start reaches the end of s, we’ve built a valid sentence — record it. This explores all possibilities but may redo work for the same starting position many times.
Algorithm
- Convert wordDict to a set for O(1) lookup.
- Define backtrack(start, current_words).
- If start == len(s): append " ".join(current_words) to results and return.
- For each word in the set:
  - If s[start:start+len(word)] == word:
    - Append word, recurse with start + len(word), then pop.
Solution
def wordBreak_brute(s, wordDict):
word_set = set(wordDict)
result = []
def backtrack(start, current_words):
if start == len(s):
result.append(" ".join(current_words))
return
for word in word_set:
end = start + len(word)
if s[start:end] == word:
current_words.append(word)
backtrack(end, current_words)
current_words.pop()
backtrack(0, [])
return result
print(wordBreak_brute("catsanddog", ["cat","cats","and","sand","dog"]))
# ['cat sand dog', 'cats and dog'] (order may vary)
print(wordBreak_brute("catsandog", ["cats","dog","sand","and","cat"]))
# []
Complexity
- Time: O(n * 2^n) — at each position we may branch on multiple words; exponential in the worst case.
- Space: O(n) — recursion depth (excluding the output list).
2. Backtracking with Memoization
Intuition
The brute force may call backtrack(start, ...) many times for the same start index. The set of valid sentences starting at start depends only on start, not on how we got there. So we cache memo[start] = list of suffixes (word sequences) that can be formed from s[start:]. When we revisit a start, we return the cached result and combine it with the current word without re-exploring.
This is top-down DP over starting positions combined with backtracking to generate all paths.
Algorithm
- Define backtrack(start) that returns all sentence fragments for s[start:].
- Base case: start == len(s) → return [""].
- If start in memo: return memo[start].
- For each word that matches s[start:start+len(word)]:
  - Recurse on start + len(word) to get suffixes.
  - Prepend word to each suffix (with a space separator).
- Cache and return results.
Solution
def wordBreak(s, wordDict):
word_set = set(wordDict)
memo = {}
def backtrack(start):
if start in memo:
return memo[start]
if start == len(s):
return [""] # empty string — base case sentinel
sentences = []
for word in word_set:
end = start + len(word)
if s[start:end] == word:
suffixes = backtrack(end)
for suffix in suffixes:
if suffix:
sentences.append(word + " " + suffix)
else:
sentences.append(word)
memo[start] = sentences
return sentences
return backtrack(0)
print(wordBreak("catsanddog", ["cat","cats","and","sand","dog"]))
# ['cat sand dog', 'cats and dog'] (order may vary)
print(wordBreak("pineapplepenapple", ["apple","pen","applepen","pine","pineapple"]))
# ['pine apple pen apple', 'pineapple pen apple', 'pine applepen apple'] (order may vary)
print(wordBreak("catsandog", ["cats","dog","sand","and","cat"]))
# []
print(wordBreak("aaaa", ["a", "aa", "aaa"]))
# 7 segmentations of 'aaaa' (order may vary)
Complexity
- Time: O(n^2 * W) — at most n starting positions, each checking at most W words with substring comparisons of length up to n; building the output adds time proportional to the total length of all sentences produced.
- Space: O(n * 2^n) — in the worst case the memo stores exponentially many sentences per position.
Common Pitfalls
Returning [] vs. [""] at the base case. The base case start == len(s) should return [""] (a list with one empty string), not []. Returning [] means “no valid sentences from this position” which is wrong — we’ve successfully matched the entire string. The empty string is the sentinel that signals a complete match.
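A small illustration of the difference (the helper name combine is made up for this demo; it mirrors the suffix-combining loop in the solution):

```python
def combine(word, suffixes):
    """Prepend word to each suffix, mirroring the loop in backtrack."""
    out = []
    for suffix in suffixes:
        out.append(word + " " + suffix if suffix else word)
    return out

# [""] signals "complete match": the loop runs once and the word stands alone.
assert combine("dog", [""]) == ["dog"]

# [] signals "no valid continuation": the loop never runs, silently dropping
# a match that actually reached the end of the string.
assert combine("dog", []) == []
```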
Building sentences wrong when suffix is empty. At the base case, suffix is "". If you do word + " " + suffix you get "word " with a trailing space. Always check if suffix before adding the space separator, or use " ".join([word, suffix]).strip().
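A quick check of the joining behavior (illustrative only):

```python
word, suffix = "dog", ""  # suffix is "" at the base case

# Naive concatenation leaves a trailing space:
assert word + " " + suffix == "dog "

# Either guard on the suffix...
joined = word + " " + suffix if suffix else word
assert joined == "dog"

# ...or join and strip:
assert " ".join([word, suffix]).strip() == "dog"
```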
Not using a set for wordDict. Iterating over the list to check each word is O(n*W) per call. Converting to a set and checking each word by prefix (s[start:start+len(word)] == word) is fine since we iterate the set — but a Trie would be even faster for long strings with large dictionaries.
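One way a Trie could help is by finding all matching words in a single left-to-right scan from each position, instead of testing every dictionary word separately. A minimal sketch using nested dicts (the "$" end-of-word sentinel is an arbitrary choice for this demo):

```python
def build_trie(words):
    """Nested-dict trie; '$' marks a terminal node and stores the word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word
    return root

def words_starting_at(s, start, root):
    """Yield every dictionary word that matches s beginning at index start."""
    node = root
    for i in range(start, len(s)):
        if s[i] not in node:
            return
        node = node[s[i]]
        if "$" in node:
            yield node["$"]

trie = build_trie(["cat", "cats", "and", "sand", "dog"])
print(list(words_starting_at("catsanddog", 0, trie)))  # ['cat', 'cats']
```

In backtrack, iterating words_starting_at(s, start, trie) replaces the per-word prefix comparison, which pays off when the dictionary is large and many words share prefixes.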
Memoizing on (start, tuple(current_words)). The memo key should be start only. The sentences starting from start are always the same regardless of how we arrived at start. Including current_words in the key defeats the purpose of memoization.