Serialize and Deserialize Binary Tree
Difficulty: Hard Source: NeetCode
Problem
Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be reconstructed later in the same or another computer environment. Design an algorithm to serialize and deserialize a binary tree. There is no restriction on how your serialization/deserialization algorithm works, but you just need to ensure that a tree can be serialized to a string and this string can be deserialized to the original tree structure.
Example 1: Input: root = [1,2,3,null,null,4,5] Output: [1,2,3,null,null,4,5]
Example 2: Input: root = [] Output: []
Constraints:
- The number of nodes in the tree is in the range
[0, 10^4]-1000 <= Node.val <= 1000
Prerequisites
Before attempting this problem, you should be comfortable with:
- Preorder Traversal — Root first traversal which makes reconstruction straightforward
- BFS / Level Order — Alternative serialization strategy
- Null Markers — Encoding the absence of nodes to preserve tree structure
1. DFS Preorder Serialization
Intuition
In preorder, we visit root first, then left, then right. When serializing, we write null markers for missing children. This is powerful for deserialization: we read the stream left-to-right in preorder — the first value is always the root, then we recursively build left and right subtrees. A global pointer into the token list advances as we consume values.
Algorithm
Serialize:
- Preorder DFS: write
node.valor"null"for missing nodes - Join with a delimiter (e.g., comma)
Deserialize:
- Split string into token list
- Recursive build: consume first token
- If “null” → return None
- Otherwise → create node, recursively build left then right
Solution
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
class Codec:
def serialize(self, root):
"""Encodes a tree to a single string."""
tokens = []
def dfs(node):
if not node:
tokens.append("null")
return
tokens.append(str(node.val))
dfs(node.left)
dfs(node.right)
dfs(root)
return ",".join(tokens)
def deserialize(self, data):
"""Decodes your encoded data to tree."""
tokens = iter(data.split(","))
def build():
val = next(tokens)
if val == "null":
return None
node = TreeNode(int(val))
node.left = build()
node.right = build()
return node
return build()
# --- helpers ---
def build_tree(values):
if not values:
return None
root = TreeNode(values[0])
queue = [root]
i = 1
while queue and i < len(values):
node = queue.pop(0)
if i < len(values) and values[i] is not None:
node.left = TreeNode(values[i])
queue.append(node.left)
i += 1
if i < len(values) and values[i] is not None:
node.right = TreeNode(values[i])
queue.append(node.right)
i += 1
return root
def inorder(root):
if not root:
return []
return inorder(root.left) + [root.val] + inorder(root.right)
# --- tests ---
codec = Codec()
root1 = build_tree([1, 2, 3, None, None, 4, 5])
serialized1 = codec.serialize(root1)
print("Serialized:", serialized1)
# "1,2,null,null,3,4,null,null,5,null,null"
deserialized1 = codec.deserialize(serialized1)
print("Inorder original:", inorder(root1))
print("Inorder after roundtrip:", inorder(deserialized1))
# Both: [2, 1, 4, 3, 5]
root2 = build_tree([])
serialized2 = codec.serialize(root2)
print("Empty tree:", serialized2) # "null"
deserialized2 = codec.deserialize(serialized2)
print("Deserialized empty:", deserialized2) # None
root3 = build_tree([1])
s3 = codec.serialize(root3)
print("Single node:", s3) # "1,null,null"
d3 = codec.deserialize(s3)
print("Single val:", d3.val) # 1
Complexity
- Time:
O(n)— serialize visits each node once; deserialize creates each node once - Space:
O(n)— storing the serialized string and recursion stack
2. BFS Level-Order Serialization
Intuition
BFS produces the familiar level-order format (like LeetCode’s tree representation). Serialization is straightforward BFS. Deserialization rebuilds level by level using a queue of “parent” nodes — each parent claims the next two non-processed tokens as its children.
Algorithm
Serialize: BFS, write each node’s value or “null”
Deserialize:
- Create root from first token
- Use a queue of nodes that need children assigned
- For each node in the queue, read the next two tokens to set left and right children
Solution
from collections import deque
class TreeNode:
def __init__(self, val=0, left=None, right=None):
self.val = val
self.left = left
self.right = right
class CodecBFS:
def serialize(self, root):
if not root:
return "null"
tokens = []
queue = deque([root])
while queue:
node = queue.popleft()
if node:
tokens.append(str(node.val))
queue.append(node.left)
queue.append(node.right)
else:
tokens.append("null")
return ",".join(tokens)
def deserialize(self, data):
if data == "null":
return None
tokens = data.split(",")
root = TreeNode(int(tokens[0]))
queue = deque([root])
i = 1
while queue and i < len(tokens):
node = queue.popleft()
if i < len(tokens) and tokens[i] != "null":
node.left = TreeNode(int(tokens[i]))
queue.append(node.left)
i += 1
if i < len(tokens) and tokens[i] != "null":
node.right = TreeNode(int(tokens[i]))
queue.append(node.right)
i += 1
return root
# --- helpers ---
def build_tree(values):
if not values:
return None
root = TreeNode(values[0])
queue = [root]
i = 1
while queue and i < len(values):
node = queue.pop(0)
if i < len(values) and values[i] is not None:
node.left = TreeNode(values[i])
queue.append(node.left)
i += 1
if i < len(values) and values[i] is not None:
node.right = TreeNode(values[i])
queue.append(node.right)
i += 1
return root
def inorder(root):
if not root:
return []
return inorder(root.left) + [root.val] + inorder(root.right)
# --- tests ---
codec = CodecBFS()
root1 = build_tree([1, 2, 3, None, None, 4, 5])
s1 = codec.serialize(root1)
print("BFS Serialized:", s1)
# "1,2,3,null,null,4,5,null,null,null,null"
d1 = codec.deserialize(s1)
print("Inorder match:", inorder(root1) == inorder(d1)) # True
root2 = build_tree([1, 2])
s2 = codec.serialize(root2)
print("BFS:", s2) # "1,2,null,null,null"
d2 = codec.deserialize(s2)
print("Roundtrip:", inorder(d2)) # [2, 1]
Complexity
- Time:
O(n)— BFS visits each node once - Space:
O(n)— queue holds up to the widest level; serialized string is O(n)
DFS vs BFS Tradeoff
| DFS (Preorder) | BFS (Level Order) | |
|---|---|---|
| Serialized format | Recursive depth-first | Familiar LeetCode format |
| Null markers | Explicit for every missing child | Trailing nulls omittable |
| String length | Usually shorter for sparse trees | Can have many trailing nulls |
| Deserialization | Clean recursive implementation | Iterative, slightly more bookkeeping |
Both approaches are O(n) time and space. The DFS approach tends to produce cleaner code; the BFS approach matches the LeetCode visualization format more closely.
Common Pitfalls
Not encoding null children. Without null markers, you can’t distinguish between [1,2] (root with left child) and [1,null,2] (root with right child). Null markers are essential for encoding structure.
Using whitespace or numbers in node values as delimiters. If node values can be multi-digit or negative (like -100), you need a delimiter that won’t appear in the value itself. A comma works fine; don’t use just a space.
Off-by-one in BFS deserialization. The index i must advance correctly for both the left and right child reads. A common mistake is advancing i only when a child exists, which misaligns the token sequence.