Rebuild Cluster Step by Step
This guide takes you from clean machines to a healthy base cluster.
By the end of this phase, you will have:
- a working four-node WireGuard mesh
- a K3s control plane on
ms-1 - three joined worker nodes
- Calico networking
- the placement labels and taints that the rest of the platform depends on
This is the foundation for everything that comes later. Do not continue to ingress, TLS, Argo CD, PostgreSQL, or Keycloak until this phase is healthy.
Before You Start
Make sure these assumptions are true:
- you can SSH to
ms-1,wk-1,wk-2, andvm-1 - all four machines run Ubuntu 24.04
- you have
sudoor root access on every node - your home router can forward UDP ports for WireGuard
vm-1is the SSH name of the public cloud node that will appear in Kubernetes asctb-edge-1
Use these reference addresses throughout the build:
| Node | Purpose | LAN IP | WireGuard IP |
|---|---|---|---|
ms-1 | K3s server | 192.168.15.2 | 172.27.15.12 |
wk-1 | worker | 192.168.15.3 | 172.27.15.11 |
wk-2 | worker | 192.168.15.4 | 172.27.15.13 |
vm-1 / ctb-edge-1 | public edge worker | n/a | 172.27.15.31 |
Step 0: Confirm The Machines Are Safe To Use
On each node, run:
hostname -f
uname -a
ip -br addr
You are checking three things:
- you are on the machine you think you are on
- the network interfaces look normal
- SSH connectivity is stable before you begin making changes
If you are rebuilding on reused machines and suspect old Kubernetes, CNI, WireGuard, or firewall leftovers, stop here and use the destructive cleanup guide in 03-safety-checks-and-cleaning.md.
Step 1: Build The WireGuard Mesh
WireGuard is the first real dependency of the cluster. K3s will use the WireGuard IPs as node IPs, so do not move forward until this network is working cleanly.
1. Install WireGuard on every node
Run on ms-1, wk-1, wk-2, and vm-1:
sudo apt-get update
sudo apt-get install -y wireguard wireguard-tools
2. Generate a key pair on every node
Run on each node:
sudo install -d -m 700 /etc/wireguard
sudo sh -c 'umask 077 && wg genkey | tee /etc/wireguard/wg0.key | wg pubkey > /etc/wireguard/wg0.pub'
sudo cat /etc/wireguard/wg0.pub
Collect the four public keys before continuing. You will paste them into the matching peer entries below.
3. Configure the home router
The home router must forward these UDP ports from the home public IP to the home nodes:
203.0.113.10:51820/udp -> wk-1:51820/udp203.0.113.10:51821/udp -> ms-1:51820/udp203.0.113.10:51822/udp -> wk-2:51820/udp
The cloud edge node connects back into the home network through those forwarded ports.
4. Apply the WireGuard sysctl setting
On each node, create the sysctl file:
cat <<'EOF' | sudo tee /etc/sysctl.d/99-wireguard.conf >/dev/null
net.ipv4.conf.all.rp_filter=2
net.ipv4.conf.default.rp_filter=2
EOF
sudo sysctl --system
5. Create wg0.conf on each node
Replace every placeholder with the real private key or peer public key you generated.
For the PrivateKey field, paste the actual contents of /etc/wireguard/wg0.key on that node.
On ms-1:
[Interface]
Address = 172.27.15.12/32
ListenPort = 51820
PrivateKey = <MS_1_PRIVATE_KEY>
MTU = 1420
SaveConfig = false
[Peer]
# vm-1 / ctb-edge-1
PublicKey = <VM_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.31/32
Endpoint = 198.51.100.25:51820
PersistentKeepalive = 25
[Peer]
# wk-1
PublicKey = <WK_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.11/32
Endpoint = 192.168.15.3:51820
[Peer]
# wk-2
PublicKey = <WK_2_PUBLIC_KEY>
AllowedIPs = 172.27.15.13/32
Endpoint = 192.168.15.4:51820
On wk-1:
[Interface]
Address = 172.27.15.11/32
ListenPort = 51820
PrivateKey = <WK_1_PRIVATE_KEY>
MTU = 1420
SaveConfig = false
[Peer]
# vm-1 / ctb-edge-1
PublicKey = <VM_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.31/32
Endpoint = 198.51.100.25:51820
PersistentKeepalive = 25
[Peer]
# ms-1
PublicKey = <MS_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.12/32
Endpoint = 192.168.15.2:51820
[Peer]
# wk-2
PublicKey = <WK_2_PUBLIC_KEY>
AllowedIPs = 172.27.15.13/32
Endpoint = 192.168.15.4:51820
On wk-2:
[Interface]
Address = 172.27.15.13/32
ListenPort = 51820
PrivateKey = <WK_2_PRIVATE_KEY>
MTU = 1420
SaveConfig = false
[Peer]
# vm-1 / ctb-edge-1
PublicKey = <VM_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.31/32
Endpoint = 198.51.100.25:51820
PersistentKeepalive = 25
[Peer]
# ms-1
PublicKey = <MS_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.12/32
Endpoint = 192.168.15.2:51820
[Peer]
# wk-1
PublicKey = <WK_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.11/32
Endpoint = 192.168.15.3:51820
On vm-1:
[Interface]
Address = 172.27.15.31/32
ListenPort = 51820
PrivateKey = <VM_1_PRIVATE_KEY>
MTU = 1420
SaveConfig = false
[Peer]
# wk-1 via home router forward 51820
PublicKey = <WK_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.11/32
Endpoint = 203.0.113.10:51820
[Peer]
# ms-1 via home router forward 51821
PublicKey = <MS_1_PUBLIC_KEY>
AllowedIPs = 172.27.15.12/32
Endpoint = 203.0.113.10:51821
[Peer]
# wk-2 via home router forward 51822
PublicKey = <WK_2_PUBLIC_KEY>
AllowedIPs = 172.27.15.13/32
Endpoint = 203.0.113.10:51822
Optional: add an admin peer. If you want to access the cluster from a laptop or workstation over WireGuard (for example, to run kubectl remotely), you can add an extra [Peer] block to each node’s config. The full templates in k8s-cluster/bootstrap/wireguard/ include an example admin peer at 172.27.15.50/32 with PostUp/PostDown route commands. This is not required for the cluster to function, but it is useful for remote administration without SSH jump hosts.
Save each of those as /etc/wireguard/wg0.conf, then lock down the permissions:
sudo chmod 600 /etc/wireguard/wg0.conf /etc/wireguard/wg0.key
6. Enable WireGuard
Run on each node:
sudo systemctl enable --now wg-quick@wg0
sudo systemctl status wg-quick@wg0 --no-pager
7. Verify the mesh before you continue
Run on every node:
wg show
ping -c 3 172.27.15.12
ping -c 3 172.27.15.11
ping -c 3 172.27.15.13
ping -c 3 172.27.15.31
Good looks like:
- every node shows peer handshakes in
wg show - every node can ping the other three WireGuard IPs
- no node is falling back to public-IP-based cluster communication
If the mesh is not healthy, fix WireGuard now. Kubernetes will be unreliable if you continue with a half-working private network.
Step 2: Prepare the K3s DNS Resolver File
K3s will use a dedicated resolver path. Run this on every node:
for host in ms-1 wk-1 wk-2 vm-1; do
ssh "$host" 'bash -s' < k8s-cluster/bootstrap/k3s/create-k3s-resolv-conf.sh
done
Verify on one or two nodes:
ssh ms-1 'readlink -f /etc/rancher/k3s/k3s-resolv.conf'
ssh wk-1 'grep -E "^(nameserver|search|options)" /etc/rancher/k3s/k3s-resolv.conf'
Expected result:
/etc/rancher/k3s/k3s-resolv.confpoints to/run/systemd/resolve/resolv.conf
Step 3: Install the K3s Server on ms-1
The repository includes a helper script so you do not have to retype the full install flags. It installs K3s with:
- version
v1.35.1+k3s1 172.27.15.12as node IP and advertise address- flannel disabled
- built-in network policy disabled
- built-in Traefik disabled
- ServiceLB disabled
- pod CIDR
10.42.0.0/16 - service CIDR
10.43.0.0/16
Run:
ssh ms-1 'bash -s' < k8s-cluster/bootstrap/k3s/install-server-ms-1.sh
Verify:
ssh ms-1 'sudo kubectl get nodes -o wide'
ssh ms-1 'sudo systemctl status k3s --no-pager'
At this moment, only ms-1 should appear. It may still show NotReady because the CNI is not installed yet.
Step 4: Install Calico on ms-1
Calico replaces flannel in this design and provides the pod network.
Run:
ssh ms-1 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; bash -s' < k8s-cluster/bootstrap/k3s/install-calico.sh
This install path is intentionally safe to rerun. It uses server-side apply, waits for the required CRDs, waits for the Tigera operator, and only then applies the Calico custom resources.
Verify:
ssh ms-1 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; kubectl get pods -n tigera-operator'
ssh ms-1 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; kubectl get pods -n calico-system'
Step 5: Join the Three Agents
First, read the cluster join token from ms-1:
K3S_TOKEN="$(ssh ms-1 'sudo cat /var/lib/rancher/k3s/server/node-token')"
echo "$K3S_TOKEN"
Then join wk-1:
ssh wk-1 "export K3S_TOKEN='$K3S_TOKEN'; bash -s" < k8s-cluster/bootstrap/k3s/install-agent-wk-1.sh
Join wk-2:
ssh wk-2 "export K3S_TOKEN='$K3S_TOKEN'; bash -s" < k8s-cluster/bootstrap/k3s/install-agent-wk-2.sh
Join the public edge node:
ssh vm-1 "export K3S_TOKEN='$K3S_TOKEN'; bash -s" < k8s-cluster/bootstrap/k3s/install-agent-vm-1.sh
Verify from ms-1:
ssh ms-1 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; kubectl get nodes -o wide'
Expected internal IPs:
ms-1 -> 172.27.15.12wk-1 -> 172.27.15.11wk-2 -> 172.27.15.13ctb-edge-1 -> 172.27.15.31
If a node registers with the wrong IP, stop and fix that before moving on. The cluster should use WireGuard IPs internally.
Step 6: Apply Node Labels And Taints
The rest of the platform depends on predictable placement. Apply the baseline labels and taints now:
ssh ms-1 "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; EDGE_NODE=ctb-edge-1 bash -s" < k8s-cluster/bootstrap/k3s/apply-node-placement.sh
This sets the platform up like this:
ms-1: control-plane taint and server role labelwk-1: worker role labelwk-2: worker role labelctb-edge-1: edge role label pluskakde.eu/edge=true:NoSchedule
Verify:
ssh ms-1 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; kubectl get nodes --show-labels'
ssh ms-1 'export KUBECONFIG=/etc/rancher/k3s/k3s.yaml; kubectl describe node ctb-edge-1 | rg -n "Taints|Labels" -A6'
Step 7: Run a Base Cluster Health Check
Run on ms-1:
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
kubectl get nodes -o wide
kubectl get pods -n tigera-operator
kubectl get pods -n calico-system
kubectl get ns
Your base cluster is ready when:
- all four nodes are
Ready - Calico pods are healthy
- the Tigera operator is healthy
- node internal IPs match the WireGuard addresses
- the edge node is labeled and tainted correctly
What You Have Now
At this point you have a private Kubernetes foundation that the rest of the homelab can trust.
You do not have public ingress yet. You do not have TLS yet. You do not have GitOps yet. You do not have PostgreSQL or Keycloak yet.
That is exactly right. Those layers come next.
Next Step
Continue with 06. Platform Services Step by Step.