Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Serialize and Deserialize Binary Tree

Difficulty: Hard Source: NeetCode

Problem

Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be reconstructed later in the same or another computer environment. Design an algorithm to serialize and deserialize a binary tree. There is no restriction on how your serialization/deserialization algorithm works, but you just need to ensure that a tree can be serialized to a string and this string can be deserialized to the original tree structure.

Example 1: Input: root = [1,2,3,null,null,4,5] Output: [1,2,3,null,null,4,5]

Example 2: Input: root = [] Output: []

Constraints:

  • The number of nodes in the tree is in the range [0, 10^4]
  • -1000 <= Node.val <= 1000

Prerequisites

Before attempting this problem, you should be comfortable with:

  • Preorder Traversal — Root first traversal which makes reconstruction straightforward
  • BFS / Level Order — Alternative serialization strategy
  • Null Markers — Encoding the absence of nodes to preserve tree structure

1. DFS Preorder Serialization

Intuition

In preorder, we visit root first, then left, then right. When serializing, we write null markers for missing children. This is powerful for deserialization: we read the stream left-to-right in preorder — the first value is always the root, then we recursively build left and right subtrees. A global pointer into the token list advances as we consume values.

Algorithm

Serialize:

  1. Preorder DFS: write node.val or "null" for missing nodes
  2. Join with a delimiter (e.g., comma)

Deserialize:

  1. Split string into token list
  2. Recursive build: consume first token
    • If “null” → return None
    • Otherwise → create node, recursively build left then right

Solution

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class Codec:
    def serialize(self, root):
        """Encodes a tree to a single string."""
        tokens = []

        def dfs(node):
            if not node:
                tokens.append("null")
                return
            tokens.append(str(node.val))
            dfs(node.left)
            dfs(node.right)

        dfs(root)
        return ",".join(tokens)

    def deserialize(self, data):
        """Decodes your encoded data to tree."""
        tokens = iter(data.split(","))

        def build():
            val = next(tokens)
            if val == "null":
                return None
            node = TreeNode(int(val))
            node.left = build()
            node.right = build()
            return node

        return build()

# --- helpers ---
def build_tree(values):
    if not values:
        return None
    root = TreeNode(values[0])
    queue = [root]
    i = 1
    while queue and i < len(values):
        node = queue.pop(0)
        if i < len(values) and values[i] is not None:
            node.left = TreeNode(values[i])
            queue.append(node.left)
        i += 1
        if i < len(values) and values[i] is not None:
            node.right = TreeNode(values[i])
            queue.append(node.right)
        i += 1
    return root

def inorder(root):
    if not root:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)

# --- tests ---
codec = Codec()

root1 = build_tree([1, 2, 3, None, None, 4, 5])
serialized1 = codec.serialize(root1)
print("Serialized:", serialized1)
# "1,2,null,null,3,4,null,null,5,null,null"

deserialized1 = codec.deserialize(serialized1)
print("Inorder original:", inorder(root1))
print("Inorder after roundtrip:", inorder(deserialized1))
# Both: [2, 1, 4, 3, 5]

root2 = build_tree([])
serialized2 = codec.serialize(root2)
print("Empty tree:", serialized2)  # "null"
deserialized2 = codec.deserialize(serialized2)
print("Deserialized empty:", deserialized2)  # None

root3 = build_tree([1])
s3 = codec.serialize(root3)
print("Single node:", s3)  # "1,null,null"
d3 = codec.deserialize(s3)
print("Single val:", d3.val)  # 1

Complexity

  • Time: O(n) — serialize visits each node once; deserialize creates each node once
  • Space: O(n) — storing the serialized string and recursion stack

2. BFS Level-Order Serialization

Intuition

BFS produces the familiar level-order format (like LeetCode’s tree representation). Serialization is straightforward BFS. Deserialization rebuilds level by level using a queue of “parent” nodes — each parent claims the next two non-processed tokens as its children.

Algorithm

Serialize: BFS, write each node’s value or “null”

Deserialize:

  1. Create root from first token
  2. Use a queue of nodes that need children assigned
  3. For each node in the queue, read the next two tokens to set left and right children

Solution

from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class CodecBFS:
    def serialize(self, root):
        if not root:
            return "null"

        tokens = []
        queue = deque([root])

        while queue:
            node = queue.popleft()
            if node:
                tokens.append(str(node.val))
                queue.append(node.left)
                queue.append(node.right)
            else:
                tokens.append("null")

        return ",".join(tokens)

    def deserialize(self, data):
        if data == "null":
            return None

        tokens = data.split(",")
        root = TreeNode(int(tokens[0]))
        queue = deque([root])
        i = 1

        while queue and i < len(tokens):
            node = queue.popleft()

            if i < len(tokens) and tokens[i] != "null":
                node.left = TreeNode(int(tokens[i]))
                queue.append(node.left)
            i += 1

            if i < len(tokens) and tokens[i] != "null":
                node.right = TreeNode(int(tokens[i]))
                queue.append(node.right)
            i += 1

        return root

# --- helpers ---
def build_tree(values):
    if not values:
        return None
    root = TreeNode(values[0])
    queue = [root]
    i = 1
    while queue and i < len(values):
        node = queue.pop(0)
        if i < len(values) and values[i] is not None:
            node.left = TreeNode(values[i])
            queue.append(node.left)
        i += 1
        if i < len(values) and values[i] is not None:
            node.right = TreeNode(values[i])
            queue.append(node.right)
        i += 1
    return root

def inorder(root):
    if not root:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)

# --- tests ---
codec = CodecBFS()

root1 = build_tree([1, 2, 3, None, None, 4, 5])
s1 = codec.serialize(root1)
print("BFS Serialized:", s1)
# "1,2,3,null,null,4,5,null,null,null,null"

d1 = codec.deserialize(s1)
print("Inorder match:", inorder(root1) == inorder(d1))  # True

root2 = build_tree([1, 2])
s2 = codec.serialize(root2)
print("BFS:", s2)  # "1,2,null,null,null"
d2 = codec.deserialize(s2)
print("Roundtrip:", inorder(d2))  # [2, 1]

Complexity

  • Time: O(n) — BFS visits each node once
  • Space: O(n) — queue holds up to the widest level; serialized string is O(n)

DFS vs BFS Tradeoff

DFS (Preorder)BFS (Level Order)
Serialized formatRecursive depth-firstFamiliar LeetCode format
Null markersExplicit for every missing childTrailing nulls omittable
String lengthUsually shorter for sparse treesCan have many trailing nulls
DeserializationClean recursive implementationIterative, slightly more bookkeeping

Both approaches are O(n) time and space. The DFS approach tends to produce cleaner code; the BFS approach matches the LeetCode visualization format more closely.


Common Pitfalls

Not encoding null children. Without null markers, you can’t distinguish between [1,2] (root with left child) and [1,null,2] (root with right child). Null markers are essential for encoding structure.

Using whitespace or numbers in node values as delimiters. If node values can be multi-digit or negative (like -100), you need a delimiter that won’t appear in the value itself. A comma works fine; don’t use just a space.

Off-by-one in BFS deserialization. The index i must advance correctly for both the left and right child reads. A common mistake is advancing i only when a child exists, which misaligns the token sequence.