Alien Dictionary
Difficulty: Hard Source: NeetCode
Problem
There is a new alien language that uses the English alphabet. However, the order among the letters is unknown to you.
You are given a list of strings
wordsfrom the alien language’s dictionary, where the strings inwordsare sorted lexicographically by the rules of this new language.Return a string of the unique letters in the new alien language sorted in the order as they appear in the alien language. If there is no solution, return
"". If there are multiple solutions, return any of them.Example 1: Input:
words = ["wrt","wrf","er","ett","rftt"]Output:"wertf"Example 2: Input:
words = ["z","x"]Output:"zx"Example 3: Input:
words = ["z","x","z"]Output:""(invalid — cycle detected)Constraints:
1 <= words.length <= 1001 <= words[i].length <= 100- All characters are lowercase English letters
- The dictionary is supposedly sorted
Prerequisites
Before attempting this problem, you should be comfortable with:
- Topological Sort — ordering nodes in a DAG such that all edges point forward
- Cycle Detection in Directed Graphs — using DFS with color states
- Graph Building from Constraints — inferring edges from pairwise comparisons
1. Brute Force (BFS Topological Sort — Kahn’s Algorithm)
Intuition
Compare adjacent words to discover ordering constraints: the first character where two consecutive words differ tells us which letter comes first in the alien alphabet. Build a directed graph from these constraints, then run a topological sort. If the graph has a cycle, no valid ordering exists.
Algorithm
- Collect all unique characters.
- Compare each pair of adjacent words: find the first differing character, add directed edge
(first[i] → second[i]). - If
words[i]is a prefix ofwords[i-1](and longer), return""— invalid. - Run Kahn’s BFS topological sort (by in-degree).
- If all nodes are in the result, return the order; else return
""(cycle).
Solution
from collections import defaultdict, deque
def alienOrder_kahn(words):
# Collect all unique characters
adj = defaultdict(set)
in_degree = {c: 0 for word in words for c in word}
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
# Edge case: w1 is longer and is a prefix of w2 — invalid
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return ""
for j in range(min_len):
if w1[j] != w2[j]:
if w2[j] not in adj[w1[j]]:
adj[w1[j]].add(w2[j])
in_degree[w2[j]] += 1
break
# Kahn's BFS topological sort
queue = deque([c for c in in_degree if in_degree[c] == 0])
result = []
while queue:
c = queue.popleft()
result.append(c)
for neighbor in adj[c]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
queue.append(neighbor)
return "".join(result) if len(result) == len(in_degree) else ""
print(alienOrder_kahn(["wrt","wrf","er","ett","rftt"])) # "wertf"
print(alienOrder_kahn(["z","x"])) # "zx"
print(alienOrder_kahn(["z","x","z"])) # ""
print(alienOrder_kahn(["abc","ab"])) # ""
Complexity
- Time:
O(C)where C is total number of characters across all words - Space:
O(U + E)where U is unique characters and E is edges
2. DFS Topological Sort with Cycle Detection
Intuition
Same graph-building step, but use DFS for the topological sort. DFS-based topological sort appends a node to the result after all its descendants are processed (post-order). To detect cycles, track three states: unvisited, currently in stack (being visited), and fully processed.
If we ever reach a node that is “currently in stack”, we found a cycle — return "".
Algorithm
- Build the same directed graph as above.
- For each unvisited node, run DFS.
- Color nodes:
False= in current path (visiting),True= done. - If we visit a node that is in the current path, there’s a cycle.
- Reverse the post-order result for the correct topological order.
Solution
from collections import defaultdict
def alienOrder(words):
adj = defaultdict(set)
chars = {c for word in words for c in word}
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return ""
for j in range(min_len):
if w1[j] != w2[j]:
adj[w1[j]].add(w2[j])
break
# visited: None = not seen, False = in current DFS path, True = done
visited = {}
result = []
def dfs(node):
if node in visited:
return visited[node] # True = no cycle, False = cycle found
visited[node] = False # mark as in current path
for neighbor in adj[node]:
if not dfs(neighbor):
return False
visited[node] = True
result.append(node)
return True
for c in chars:
if not dfs(c):
return ""
return "".join(reversed(result))
print(alienOrder(["wrt","wrf","er","ett","rftt"])) # "wertf"
print(alienOrder(["z","x"])) # "zx"
print(alienOrder(["z","x","z"])) # ""
print(alienOrder(["abc","ab"])) # ""
print(alienOrder(["z"])) # "z"
Complexity
- Time:
O(C)— building graph and DFS both visit each character once - Space:
O(U + E)— graph, visited map, result
Common Pitfalls
Comparing the wrong pair of words. Only adjacent words in the sorted list give ordering information. Non-adjacent words don’t directly tell you anything.
Missing the invalid prefix case. If words[i] is longer than words[i+1] and words[i+1] is a prefix of words[i], the input is invalid. For example, ["abc", "ab"] should return "".
Only one edge per adjacent pair. Stop comparing characters after the first difference — subsequent differences are not meaningful ordering constraints.