Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Word Break II

Difficulty: Hard Source: NeetCode

Problem

Given a string s and a dictionary of strings wordDict, add spaces in s to construct a sentence where each word is a valid dictionary word. Return all such possible sentences in any order.

Note: The same word in the dictionary may be reused multiple times in the segmentation.

Example 1: Input: s = "catsanddog", wordDict = ["cat","cats","and","sand","dog"] Output: ["cats and dog","cat sand dog"]

Example 2: Input: s = "pineapplepenapple", wordDict = ["apple","pen","applepen","pine","pineapple"] Output: ["pine apple pen apple","pineapple pen apple","pine applepen apple"]

Example 3: Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"] Output: []

Constraints:

  • 1 <= s.length <= 20
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 10
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Prerequisites

Before attempting this problem, you should be comfortable with:

  • Word Break I — the decision version (can the string be broken?); this problem finds all ways
  • Backtracking — at each position, try all words that match the current prefix
  • Memoization — cache results for each starting index to avoid redundant work

1. Brute Force (Backtracking without Memoization)

Intuition

At each starting position start, look at every word in the dictionary. If the substring s[start:start+len(word)] matches the word, add it to the current path and recurse with the new starting position start + len(word). When start reaches the end of s, we’ve built a valid sentence — record it. This explores all possibilities but may redo work for the same starting position many times.

Algorithm

  1. Convert wordDict to a set for O(1) lookup.
  2. Define backtrack(start, current_words).
  3. If start == len(s): append " ".join(current_words) to results and return.
  4. For each word in wordDict:
    • If s[start:start+len(word)] == word:
      • Append word, recurse with start + len(word), pop.

Solution

def wordBreak_brute(s, wordDict):
    word_set = set(wordDict)
    result = []

    def backtrack(start, current_words):
        if start == len(s):
            result.append(" ".join(current_words))
            return
        for word in word_set:
            end = start + len(word)
            if s[start:end] == word:
                current_words.append(word)
                backtrack(end, current_words)
                current_words.pop()

    backtrack(0, [])
    return result


print(wordBreak_brute("catsanddog", ["cat","cats","and","sand","dog"]))
# ['cat sand dog', 'cats and dog'] (order may vary)
print(wordBreak_brute("catsandog", ["cats","dog","sand","and","cat"]))
# []

Complexity

  • Time: O(n * 2^n) — at each position, we may branch on multiple words; exponential in worst case
  • Space: O(n) — recursion depth

2. Backtracking with Memoization

Intuition

The brute force may call backtrack(start, ...) many times for the same start index. The set of valid sentences starting at start depends only on start, not on how we got there. So we cache memo[start] = list of suffixes (word sequences) that can be formed from s[start:]. When we revisit a start, we return the cached result and combine it with the current word without re-exploring.

This is top-down DP over starting positions combined with backtracking to generate all paths.

Algorithm

  1. Define backtrack(start) that returns all sentence fragments for s[start:].
  2. Base case: start == len(s) → return [""].
  3. If start in memo: return memo[start].
  4. For each word that matches s[start:start+len(word)]:
    • Recurse on start + len(word) to get suffixes.
    • Prepend word to each suffix (with a space separator).
  5. Cache and return results.

Solution

def wordBreak(s, wordDict):
    word_set = set(wordDict)
    memo = {}

    def backtrack(start):
        if start in memo:
            return memo[start]
        if start == len(s):
            return [""]  # empty string — base case sentinel

        sentences = []
        for word in word_set:
            end = start + len(word)
            if s[start:end] == word:
                suffixes = backtrack(end)
                for suffix in suffixes:
                    if suffix:
                        sentences.append(word + " " + suffix)
                    else:
                        sentences.append(word)

        memo[start] = sentences
        return sentences

    return backtrack(0)


print(wordBreak("catsanddog", ["cat","cats","and","sand","dog"]))
# ['cat sand dog', 'cats and dog']
print(wordBreak("pineapplepenapple", ["apple","pen","applepen","pine","pineapple"]))
# ['pine apple pen apple', 'pineapple pen apple', 'pine applepen apple']
print(wordBreak("catsandog", ["cats","dog","sand","and","cat"]))
# []
print(wordBreak("aaaa", ["a", "aa", "aaa"]))
# multiple segmentations

Complexity

  • Time: O(n^2 * W) — at most n starting positions, each checking at most W words of total length up to n
  • Space: O(n * 2^n) — in the worst case the memo stores exponentially many sentences per position

Common Pitfalls

Returning [] vs. [""] at the base case. The base case start == len(s) should return [""] (a list with one empty string), not []. Returning [] means “no valid sentences from this position” which is wrong — we’ve successfully matched the entire string. The empty string is the sentinel that signals a complete match.

Building sentences wrong when suffix is empty. At the base case, suffix is "". If you do word + " " + suffix you get "word " with a trailing space. Always check if suffix before adding the space separator, or use " ".join([word, suffix]).strip().

Not using a set for wordDict. Iterating over the list to check each word is O(n*W) per call. Converting to a set and checking each word by prefix (s[start:start+len(word)] == word) is fine since we iterate the set — but a Trie would be even faster for long strings with large dictionaries.

Memoizing on (start, tuple(current_words)). The memo key should be start only. The sentences starting from start are always the same regardless of how we arrived at start. Including current_words in the key defeats the purpose of memoization.