Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Regular Expression Matching

Difficulty: Hard Source: NeetCode

Problem

Given an input string s and a pattern p, implement regular expression matching with support for . and *:

  • . matches any single character.
  • * matches zero or more of the preceding element. The match must cover the entire string, not just a substring.

Example 1: Input: s = "aa", p = "a" Output: false

Example 2: Input: s = "aa", p = "a*" Output: true

Example 3: Input: s = "ab", p = ".*" Output: true

Constraints:

  • 1 <= s.length <= 20
  • 1 <= p.length <= 30
  • s contains only lowercase letters.
  • p contains lowercase letters, ., and *. It is guaranteed * is never the first character.

Prerequisites

Before attempting this problem, you should be comfortable with:

  • String matching and recursion — comparing characters with branching logic
  • 2D DP on two strings — same table structure as edit distance

1. Brute Force / Recursive

Intuition

Process the pattern one “token” at a time. A token is either a plain character/dot, or a char* pair. If the current pattern character is followed by *, we can choose to use it zero times (skip the char*) or one or more times (consume a matching character from s and stay at the same pattern position). Otherwise, match the current characters directly and recurse on the rest.

Algorithm

  1. Define dfs(i, j) = does s[i:] match p[j:]?
  2. Base: j == len(p) → return i == len(s).
  3. Check if current p[j] matches s[i]: first_match = i < len(s) and p[j] in {s[i], '.'}.
  4. If j + 1 < len(p) and p[j+1] == '*':
    • Zero uses: dfs(i, j+2).
    • One or more: first_match and dfs(i+1, j).
    • Return OR of these.
  5. Else: return first_match and dfs(i+1, j+1).

Solution

def isMatch(s, p):
    def dfs(i, j):
        if j == len(p):
            return i == len(s)
        first_match = i < len(s) and p[j] in (s[i], '.')
        if j + 1 < len(p) and p[j + 1] == '*':
            # zero uses OR one-or-more uses
            return dfs(i, j + 2) or (first_match and dfs(i + 1, j))
        return first_match and dfs(i + 1, j + 1)

    return dfs(0, 0)


print(isMatch("aa", "a"))    # False
print(isMatch("aa", "a*"))   # True
print(isMatch("ab", ".*"))   # True
print(isMatch("aab", "c*a*b"))  # True
print(isMatch("mississippi", "mis*is*p*."))  # False

Complexity

  • Time: O(2^(m+n)) — exponential without memoization
  • Space: O(m+n) — recursion depth

2. Bottom-Up DP (Tabulation)

Intuition

Build a (m+1) x (n+1) boolean table where dp[i][j] = does s[:i] match p[:j]? Base case: dp[0][0] = True (empty string matches empty pattern). For dp[0][j], only patterns like a*b*c* can match empty strings.

The transitions mirror the recursion exactly:

  • If p[j-1] == '*': dp[i][j] = dp[i][j-2] (zero uses) or dp[i-1][j] if p[j-2] matches s[i-1] (extend a match).
  • Else if p[j-1] matches s[i-1] (direct char or .): dp[i][j] = dp[i-1][j-1].

Algorithm

  1. Create (m+1) x (n+1) dp table, dp[0][0] = True.
  2. Fill row 0 for patterns that match empty strings (a*, a*b*, etc.).
  3. Fill the rest using the transition above.
  4. Return dp[m][n].

Solution

def isMatch(s, p):
    m, n = len(s), len(p)
    dp = [[False] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = True

    # Patterns that match empty string: a*, a*b*, a*b*c*, ...
    for j in range(2, n + 1):
        if p[j - 1] == '*':
            dp[0][j] = dp[0][j - 2]

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[j - 1] == '*':
                # zero occurrences of preceding char
                dp[i][j] = dp[i][j - 2]
                # one or more occurrences if preceding char matches s[i-1]
                if p[j - 2] == s[i - 1] or p[j - 2] == '.':
                    dp[i][j] = dp[i][j] or dp[i - 1][j]
            elif p[j - 1] == s[i - 1] or p[j - 1] == '.':
                dp[i][j] = dp[i - 1][j - 1]

    return dp[m][n]


print(isMatch("aa", "a"))            # False
print(isMatch("aa", "a*"))           # True
print(isMatch("ab", ".*"))           # True
print(isMatch("aab", "c*a*b"))       # True
print(isMatch("mississippi", "mis*is*p*."))  # False
print(isMatch("", "a*"))             # True
print(isMatch("", ""))               # True

Complexity

  • Time: O(m * n)
  • Space: O(m * n)

Common Pitfalls

Treating * as its own character. The * always applies to the character immediately before it (p[j-2] in 1-indexed j). Never process * in isolation — always look back one position.

Forgetting the “zero uses” path for *. When p[j-1] == '*', even if p[j-2] doesn’t match s[i-1], we can always skip the char* pair entirely (dp[i][j-2]). This case must always be included.

Initialising row 0 incorrectly. Only patterns like a*, a*b*, etc. can match the empty string. For j=1, a lone a cannot match empty. Your base case loop must check for * specifically.