Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Internal Mechanics of Arrays

Course: DSA › Arrays › Introduction

So far, we learned what an array is and how it solves problems where we need to store and manipulate large-scale data easily. We can now look at how the array data structure works under the hood and what makes it so fast and easy to use.


Memory Addresses

Array elements are accessed using indices because arrays are stored contiguously in memory. To understand why, let’s revisit the memory model.

Note: This is how an array data structure is stored at the lowest level. Higher-level programming languages abstract all this from the user, but at their core, use the same mechanism.

Memory in RAM is logically organized as a sequence of blocks, each 1 byte (8 bits) long. Every block has a unique identifier — its address — which is simply its relative position from the start (starting from 0).

---
config:
  theme: base
  themeVariables:
    primaryColor: "#dbeafe"
    primaryBorderColor: "#3b82f6"
    primaryTextColor: "#1e3a5f"
    lineColor: "#64748b"
    secondaryColor: "#ede9fe"
    tertiaryColor: "#fef9c3"
---
flowchart LR
    b0["0<br/>8 bits"] --- b1["1<br/>8 bits"] --- b2["2<br/>8 bits"] --- b3["3<br/>8 bits"] --- b4["4<br/>8 bits"] --- b5["5<br/>8 bits"] --- b6["6<br/>8 bits"] --- b7["7<br/>8 bits"]
    addr(["Address = 3"]) --> b3

Memory is logically organized as a linear sequence of blocks.


Layout in Memory

An array is just a continuous segment of memory that stores data of a single type. Each element in the array has a fixed size equal to the size of its data type. So:

Total size of array = size of datatype × number of elements

The address of the memory block where an array starts is called the array’s base address.

Base address: The address of the block of memory where an array starts. The base address, along with the index, is used to access data items in an array.

Here’s what an array of 5 integers looks like in memory, with a base address of 2 and each int occupying 4 bytes:

---
config:
  theme: base
  themeVariables:
    primaryColor: "#dbeafe"
    primaryBorderColor: "#3b82f6"
    primaryTextColor: "#1e3a5f"
    lineColor: "#64748b"
    secondaryColor: "#ede9fe"
    tertiaryColor: "#fef9c3"
---
flowchart LR
    base(["base address = 2"]) --> e0

    e0["value1<br/>index: 0<br/>addr: 2 → 5"]
    e1["value2<br/>index: 1<br/>addr: 6 → 9"]
    e2["value3<br/>index: 2<br/>addr: 10 → 13"]
    e3["value4<br/>index: 3<br/>addr: 14 → 17"]
    e4["value5<br/>index: 4<br/>addr: 18 → 21"]

    e0 --- e1 --- e2 --- e3 --- e4

Structure of an array in memory. Each element spans 4 consecutive bytes (size of int).

Key observations:

  • Elements are laid out back to back with no gaps
  • Each element spans exactly size_of_datatype bytes
  • The next element always starts exactly size_of_datatype bytes after the previous one

Accessing Data Items

Now that we know how an array maps into continuous memory, we can derive a simple formula to calculate the address of any element, given:

  • the base address (where the array starts)
  • the size of the datatype (bytes per element)
  • the index (which element we want)
---
config:
  theme: base
  themeVariables:
    primaryColor: "#dbeafe"
    primaryBorderColor: "#3b82f6"
    primaryTextColor: "#1e3a5f"
    lineColor: "#64748b"
    secondaryColor: "#ede9fe"
    tertiaryColor: "#fef9c3"
---
flowchart LR
    F["address(index)"] --> EQ["="]
    EQ --> B["base_address"]
    B --> PLUS["+"]
    PLUS --> PAREN["( size_of_datatype  ×  index )"]

    style F fill:#ede9fe,stroke:#7c3aed,color:#3b0764
    style B fill:#fef9c3,stroke:#d97706,color:#78350f
    style PAREN fill:#dcfce7,stroke:#16a34a,color:#14532d

Calculating the address of a data item stored at a given index in an array.

Let’s verify with our example (base address = 2, int = 4 bytes):

IndexFormulaAddress
02 + (4 × 0)2
12 + (4 × 1)6
22 + (4 × 2)10
32 + (4 × 3)14
42 + (4 × 4)18

This matches the layout exactly.

This is why array indices start at 0, not 1.

Think of the index not as a position number, but as a how many elements to skip count.

  • To reach the first element — skip 0 elements → index 0
  • To reach the second element — skip 1 element → index 1
  • To reach the third element — skip 2 elements → index 2

The formula makes this concrete. With base address 2 and int size 4:

  • index 0 → 2 + (4 × 0) = 2 ✓ lands exactly at the first element
  • index 1 → 2 + (4 × 1) = 6 ✓ lands exactly at the second element

If indices started at 1 instead, index 1 would give address 6 — jumping straight past the first element at address 2, which would never be reachable. You’d need a messy correction like base + (size × (index − 1)). Starting at 0 keeps the formula clean and direct.

The CPU computes this address instantly using a single multiplication and addition — no iteration, no searching. That’s why array access is O(1).


Key Takeaway

The power of arrays comes from this formula. Once you know the base address and the datatype size, you can jump to any element in constant time. The CPU doesn’t need to scan from the beginning — it does one arithmetic operation and lands exactly at the right memory address.

# Simulating the address formula in Python
base_address = 2
size_of_int = 4  # bytes

def address_of(index: int) -> int:
    return base_address + (size_of_int * index)

for i in range(5):
    print(f"value{i+1} at index {i} → address {address_of(i)}")