Regular Expression Matching
Difficulty: Hard Source: NeetCode
Problem
Given an input string
sand a patternp, implement regular expression matching with support for.and*:
.matches any single character.*matches zero or more of the preceding element. The match must cover the entire string, not just a substring.Example 1: Input:
s = "aa",p = "a"Output:falseExample 2: Input:
s = "aa",p = "a*"Output:trueExample 3: Input:
s = "ab",p = ".*"Output:trueConstraints:
1 <= s.length <= 201 <= p.length <= 30scontains only lowercase letters.pcontains lowercase letters,., and*. It is guaranteed*is never the first character.
Prerequisites
Before attempting this problem, you should be comfortable with:
- String matching and recursion — comparing characters with branching logic
- 2D DP on two strings — same table structure as edit distance
1. Brute Force / Recursive
Intuition
Process the pattern one “token” at a time. A token is either a plain character/dot, or a char* pair. If the current pattern character is followed by *, we can choose to use it zero times (skip the char*) or one or more times (consume a matching character from s and stay at the same pattern position). Otherwise, match the current characters directly and recurse on the rest.
Algorithm
- Define
dfs(i, j)= doess[i:]matchp[j:]? - Base:
j == len(p)→ returni == len(s). - Check if current
p[j]matchess[i]:first_match = i < len(s) and p[j] in {s[i], '.'}. - If
j + 1 < len(p)andp[j+1] == '*':- Zero uses:
dfs(i, j+2). - One or more:
first_match and dfs(i+1, j). - Return OR of these.
- Zero uses:
- Else: return
first_match and dfs(i+1, j+1).
Solution
def isMatch(s, p):
def dfs(i, j):
if j == len(p):
return i == len(s)
first_match = i < len(s) and p[j] in (s[i], '.')
if j + 1 < len(p) and p[j + 1] == '*':
# zero uses OR one-or-more uses
return dfs(i, j + 2) or (first_match and dfs(i + 1, j))
return first_match and dfs(i + 1, j + 1)
return dfs(0, 0)
print(isMatch("aa", "a")) # False
print(isMatch("aa", "a*")) # True
print(isMatch("ab", ".*")) # True
print(isMatch("aab", "c*a*b")) # True
print(isMatch("mississippi", "mis*is*p*.")) # False
Complexity
- Time:
O(2^(m+n))— exponential without memoization - Space:
O(m+n)— recursion depth
2. Bottom-Up DP (Tabulation)
Intuition
Build a (m+1) x (n+1) boolean table where dp[i][j] = does s[:i] match p[:j]? Base case: dp[0][0] = True (empty string matches empty pattern). For dp[0][j], only patterns like a*b*c* can match empty strings.
The transitions mirror the recursion exactly:
- If
p[j-1] == '*':dp[i][j] = dp[i][j-2](zero uses) ordp[i-1][j]ifp[j-2]matchess[i-1](extend a match). - Else if
p[j-1]matchess[i-1](direct char or.):dp[i][j] = dp[i-1][j-1].
Algorithm
- Create
(m+1) x (n+1)dp table,dp[0][0] = True. - Fill row 0 for patterns that match empty strings (
a*,a*b*, etc.). - Fill the rest using the transition above.
- Return
dp[m][n].
Solution
def isMatch(s, p):
m, n = len(s), len(p)
dp = [[False] * (n + 1) for _ in range(m + 1)]
dp[0][0] = True
# Patterns that match empty string: a*, a*b*, a*b*c*, ...
for j in range(2, n + 1):
if p[j - 1] == '*':
dp[0][j] = dp[0][j - 2]
for i in range(1, m + 1):
for j in range(1, n + 1):
if p[j - 1] == '*':
# zero occurrences of preceding char
dp[i][j] = dp[i][j - 2]
# one or more occurrences if preceding char matches s[i-1]
if p[j - 2] == s[i - 1] or p[j - 2] == '.':
dp[i][j] = dp[i][j] or dp[i - 1][j]
elif p[j - 1] == s[i - 1] or p[j - 1] == '.':
dp[i][j] = dp[i - 1][j - 1]
return dp[m][n]
print(isMatch("aa", "a")) # False
print(isMatch("aa", "a*")) # True
print(isMatch("ab", ".*")) # True
print(isMatch("aab", "c*a*b")) # True
print(isMatch("mississippi", "mis*is*p*.")) # False
print(isMatch("", "a*")) # True
print(isMatch("", "")) # True
Complexity
- Time:
O(m * n) - Space:
O(m * n)
Common Pitfalls
Treating * as its own character. The * always applies to the character immediately before it (p[j-2] in 1-indexed j). Never process * in isolation — always look back one position.
Forgetting the “zero uses” path for *. When p[j-1] == '*', even if p[j-2] doesn’t match s[i-1], we can always skip the char* pair entirely (dp[i][j-2]). This case must always be included.
Initialising row 0 incorrectly. Only patterns like a*, a*b*, etc. can match the empty string. For j=1, a lone a cannot match empty. Your base case loop must check for * specifically.