Upgrading a single-node Kubernetes cluster for Zero Downtime Maintenance
Ever since the new, more powerful octavo replaced the good old lexicon server as the
single-node Kubernetes cluster serving all the local self-hosting needs, the latter has
not found any use. Since it would be a waste to let it sit in a box unused, it will be
set up to join the (for now) single-node cluster that is octavo to enable Zero Downtime
Maintenance.
The Plan
The first step will be to have lexicon join the cluster as a worker node, while
keeping octavo as the control plane. Chances are another Intel NUC will join the
cluster later with an intermediate CPU (Intel Core i5), which will bring the total
size of the cluster to the ideal 3: a control plane and two workers for an
Active-Passive High Availability setup.
Nodes with more capable CPUs and GPUs can be set up to run the more CPU/GPU-intensive workloads, such as Jellyfin (heavy media transcoding), by using Node Labels and Taints to ensure high-demand tasks stay on the more capable nodes.
All nodes have direct access to a NAS where all files are available, both NUCs have NVMe
SSDs for the operating system and Kubernetes local persistent volumes, and octavo has
an additional 4TB SATA SSD, a capability unlikely to be added to other nodes in the near
future. Keeping Jellyfin running on octavo should let it keep serving the media files
from the 4TB SATA SSD rather than from the NAS, while most of the audio-only media can be
replicated on the NVMe SSDs.
To handle the diverse storage landscape, two StorageClasses will be defined, one for
the NVMe SSDs and another one for the bigger but slower SATA SSD. For apps that benefit
from the faster SSD, nodeAffinity can force them onto octavo, while lexicon is kept
exclusively for overflow or light tasks, with a taint on lexicon so that only
specifically tolerated pods will schedule on it.
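As a sketch, a taint like the following would do; the management-only key matches the tolerations shown later in this post:

$ kubectl taint nodes lexicon management-only=true:NoSchedule
node/lexicon tainted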
Potential expansions
There is a distinct possibility that a second high-perf NUC (with an Intel Core i5 CPU)
may be added later. In that case, lexicon could be dedicated (mostly) to running only
the control plane, leaving the high-perf NUCs to run all the workloads. Pod Anti-Affinity
could then be used to ensure that if one of the high-perf nodes goes down, a redundant
copy of critical apps is already running on the second high-perf NUC, as sketched below.
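A minimal sketch of such a rule, added under a Deployment's spec.template.spec (the app label here is hypothetical):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # Never co-locate two replicas of this app on the same node
      - labelSelector:
          matchLabels:
            app: critical-app
        topologyKey: kubernetes.io/hostname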
This would then leave lexicon as the third Quorum/Control Plane "witness" node, which
would be useful to avoid "split-brain" issues during network partitions. For a 3-node
Kubernetes cluster to be truly stable (HA), it needs a Quorum (a majority vote) to make
decisions. A "witness" node provides this third vote without needing to be powerful.
Alternative devices considered (and discarded)
There is a Raspberry Pi 4 currently available, but it has only 2GB of RAM, which is too tight because the Kubernetes Control Plane typically takes up to 1.8 GB. Instead, an old (2016) retired ASUS Chromebox-M014U with a Core i3-4010U CPU, 4 GB of RAM and a 16 GB SSD, already running GalliumOS 3.1, could be enough to run the Control Plane node; but considering that GalliumOS 3.1, the latest release, came out on 2019-12-22, it doesn't seem fit for this purpose.
If there is a critical hardware failure on lexicon, the cluster degrades but stays
online, and management continues without Fault Tolerance. Applications already running on
worker nodes are unaffected by a control plane failure and will continue to run
normally, but if a worker node also fails while the control plane is degraded (only 1
node left), then Kubernetes cannot "self-heal" by rescheduling those pods to octavo
because the "brain" (API server) is inaccessible.
In such an event, replacing a failed hardware "witness" is a standard maintenance task:
- Remove the failed node from the etcd member list using etcdctl member remove <ID> from one of the healthy NUCs.
- Delete the dead node object from Kubernetes using kubectl delete node <chromebox-name>
- Bring in a replacement and join it as a new control plane node using kubeadm join --control-plane
Reinstall lexicon
lexicon was running Ubuntu Server 22.04 and was upgraded to 24.04 using the
do-release-upgrade tool but, although the upgrade went well, it did not fix an old issue
that caused the NIC to slowly become more and more overloaded and slow, so a fresh
install was in order.
Installing Ubuntu Server 24.04 went smoothly: the NUC booted from the USB stick, and Secure Boot, enabled by default, never presented any problem.
Once the installer boots, the installation steps are:
- Choose language and keyboard layout.
- Choose Ubuntu Server (the default, not the minimized variant).
- Check the option to Search for third-party drivers.
- Networking: DHCP on wired network.
  - The enp89s0 interface is the NUC's integrated 2.5Gbps NIC (Intel I226-V).
- Pick a local Ubuntu mirror to install packages from.
- Set up a Custom storage layout as follows:
  - Select the disk (Samsung SSD 970 EVO PLUS 2TB) to Use As Boot Device.
    This automatically creates a 1GB partition for /boot/efi (formatted as fat32).
  - Create a 30G partition to mount as / (formatted as ext4).
  - Create a 30G partition to reserve for a future OS.
  - Create a 60G partition to mount as /var/lib (formatted as xfs).
  - Create a partition with the remaining space (1.7T) to mount as /home (formatted as xfs).
- Confirm partitions & changes.
- Set up a Profile: username (ponder), hostname (lexicon) and password.
- Skip Upgrade to Ubuntu Pro (to be done later).
- Install OpenSSH server and allow password authentication (for now).
- A selection of snap packages is available at this point, none were selected.
- Confirm all previous choices and start to install software.
- Once the installation is complete, remove the USB stick and hit Enter to reboot.
After the first reboot, the server is set up with the same steps as done for octavo:
- Tweak the OpenSSH server.
- Set the (only) enp89s0 interface (Intel 2.5Gbps NIC) with the .6 addresses with this Netplan configuration:

/etc/netplan/50-cloud-init.yaml

# Dual static IP on LAN, nothing else.
network:
  version: 2
  renderer: networkd
  ethernets:
    enp89s0:
      dhcp4: no
      dhcp6: no
      # Set IP address & subnet mask
      addresses: [ 10.0.0.6/24, 192.168.0.6/24 ]
      # Set default gateway
      routes:
        - to: default
          via: 192.168.0.1  # UniFi router gateway
      # Set DNS name servers
      nameservers:
        addresses: [ 77.109.128.2, 213.144.129.20 ]

- Set up Remote Access: Cloudflare Tunnel and Tailscale.
Update to HWE 6.14 kernel
While the Intel 11th Gen hardware does not require the absolute latest kernel, having both nodes on the same major kernel branch (e.g. 6.14) simplifies troubleshooting and ensures CNI and networking features behave identically across the cluster.
Once the system is upgraded to Ubuntu 24.04, switching the kernel is as easy as
apt install linux-generic-hwe-24.04 and it even becomes the new default kernel in GRUB:
# apt install linux-generic-hwe-24.04 -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libdebuginfod-common libdebuginfod1t64 linux-headers-6.14.0-37-generic linux-headers-generic-hwe-24.04
linux-hwe-6.14-headers-6.14.0-37 linux-hwe-6.14-tools-6.14.0-37 linux-image-6.14.0-37-generic
linux-image-generic-hwe-24.04 linux-modules-6.14.0-37-generic linux-modules-extra-6.14.0-37-generic
linux-tools-6.14.0-37-generic
Suggested packages:
linux-hwe-6.14-tools
The following NEW packages will be installed:
libdebuginfod-common libdebuginfod1t64 linux-generic-hwe-24.04 linux-headers-6.14.0-37-generic
linux-headers-generic-hwe-24.04 linux-hwe-6.14-headers-6.14.0-37 linux-hwe-6.14-tools-6.14.0-37
linux-image-6.14.0-37-generic linux-image-generic-hwe-24.04 linux-modules-6.14.0-37-generic
linux-modules-extra-6.14.0-37-generic linux-tools-6.14.0-37-generic
0 upgraded, 12 newly installed, 0 to remove and 6 not upgraded.
Need to get 0 B/199 MB of archives.
After this operation, 320 MB of additional disk space will be used.
Preconfiguring packages ...
Selecting previously unselected package libdebuginfod-common.
(Reading database ... 87562 files and directories currently installed.)
Preparing to unpack .../00-libdebuginfod-common_0.190-1.1ubuntu0.1_all.deb ...
Unpacking libdebuginfod-common (0.190-1.1ubuntu0.1) ...
Selecting previously unselected package libdebuginfod1t64:amd64.
Preparing to unpack .../01-libdebuginfod1t64_0.190-1.1ubuntu0.1_amd64.deb ...
Unpacking libdebuginfod1t64:amd64 (0.190-1.1ubuntu0.1) ...
Selecting previously unselected package linux-modules-6.14.0-37-generic.
Preparing to unpack .../02-linux-modules-6.14.0-37-generic_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-modules-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-image-6.14.0-37-generic.
Preparing to unpack .../03-linux-image-6.14.0-37-generic_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-image-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-modules-extra-6.14.0-37-generic.
Preparing to unpack .../04-linux-modules-extra-6.14.0-37-generic_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-modules-extra-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-image-generic-hwe-24.04.
Preparing to unpack .../05-linux-image-generic-hwe-24.04_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-image-generic-hwe-24.04 (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-hwe-6.14-headers-6.14.0-37.
Preparing to unpack .../06-linux-hwe-6.14-headers-6.14.0-37_6.14.0-37.37~24.04.1_all.deb ...
Unpacking linux-hwe-6.14-headers-6.14.0-37 (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-headers-6.14.0-37-generic.
Preparing to unpack .../07-linux-headers-6.14.0-37-generic_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-headers-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-headers-generic-hwe-24.04.
Preparing to unpack .../08-linux-headers-generic-hwe-24.04_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-headers-generic-hwe-24.04 (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-generic-hwe-24.04.
Preparing to unpack .../09-linux-generic-hwe-24.04_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-generic-hwe-24.04 (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-hwe-6.14-tools-6.14.0-37.
Preparing to unpack .../10-linux-hwe-6.14-tools-6.14.0-37_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-hwe-6.14-tools-6.14.0-37 (6.14.0-37.37~24.04.1) ...
Selecting previously unselected package linux-tools-6.14.0-37-generic.
Preparing to unpack .../11-linux-tools-6.14.0-37-generic_6.14.0-37.37~24.04.1_amd64.deb ...
Unpacking linux-tools-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Setting up libdebuginfod-common (0.190-1.1ubuntu0.1) ...
Setting up linux-hwe-6.14-headers-6.14.0-37 (6.14.0-37.37~24.04.1) ...
Setting up linux-modules-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Setting up linux-headers-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Setting up libdebuginfod1t64:amd64 (0.190-1.1ubuntu0.1) ...
Setting up linux-image-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
I: /boot/vmlinuz is now a symlink to vmlinuz-6.14.0-37-generic
I: /boot/initrd.img is now a symlink to initrd.img-6.14.0-37-generic
Setting up linux-modules-extra-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Setting up linux-headers-generic-hwe-24.04 (6.14.0-37.37~24.04.1) ...
Setting up linux-hwe-6.14-tools-6.14.0-37 (6.14.0-37.37~24.04.1) ...
Setting up linux-image-generic-hwe-24.04 (6.14.0-37.37~24.04.1) ...
Setting up linux-generic-hwe-24.04 (6.14.0-37.37~24.04.1) ...
Setting up linux-tools-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for libc-bin (2.39-0ubuntu8.6) ...
Processing triggers for linux-image-6.14.0-37-generic (6.14.0-37.37~24.04.1) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-6.14.0-37-generic
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.14.0-37-generic
Found initrd image: /boot/initrd.img-6.14.0-37-generic
Found linux image: /boot/vmlinuz-6.8.0-94-generic
Found initrd image: /boot/initrd.img-6.8.0-94-generic
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
Scanning processes...
Scanning candidates...
Scanning processor microcode...
Scanning linux images...
Pending kernel upgrade!
Running kernel version:
6.8.0-94-generic
Diagnostics:
The currently running kernel version is not the expected kernel version 6.14.0-37-generic.
Restarting the system to load the new kernel will not be handled automatically, so you should consider
rebooting.
The processor microcode seems to be up-to-date.
Restarting services...
Service restarts being deferred:
/etc/needrestart/restart.d/dbus.service
systemctl restart getty@tty1.service
systemctl restart systemd-logind.service
systemctl restart unattended-upgrades.service
systemctl restart wpa_supplicant.service
No containers need to be restarted.
User sessions running outdated binaries:
root @ session #1: sshd[913]
root @ session #4: sshd[1153]
root @ user manager service: systemd[918]
No VM guests are running outdated hypervisor (qemu) binaries on this host.
Check what the new default kernel is to make sure it is the newer one:
# awk -F"'" '/menuentry / && /with Linux/ {print i++ " : " $2}' \
/boot/grub/grub.cfg
0 : Ubuntu, with Linux 6.14.0-37-generic
1 : Ubuntu, with Linux 6.14.0-37-generic (recovery mode)
2 : Ubuntu, with Linux 6.8.0-90-generic
3 : Ubuntu, with Linux 6.8.0-90-generic (recovery mode)
Reboot the server to load the new kernel now.
Kubernetes
Kubernetes on Raspberry Pi 5 (alfred)
showed quite a few new hurdles caused by newer versions of Kubernetes (v1.32.2) and
a few components, but those have since been deprecated, so following the installation
process from octavo is a good enough guide.
Storage Requirements are satisfied in the same way, by having a similar partition setup.
Install Helm (via apt) and then install Kubernetes (also via apt); install Kubernetes
version v1.34.3, which is the one running at the moment in the cluster:
# kubectl version --output=yaml
clientVersion:
buildDate: "2025-12-09T15:06:39Z"
compiler: gc
gitCommit: df11db1c0f08fab3c0baee1e5ce6efbf816af7f1
gitTreeState: clean
gitVersion: v1.34.3
goVersion: go1.24.11
major: "1"
minor: "34"
platform: linux/amd64
kustomizeVersion: v5.7.1
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Finally, install the container runtime, and Kubernetes is ready to be initialized.
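As a sketch of that last step, assuming containerd (which both nodes run, per the node listing further below) installed via apt and switched to the systemd cgroup driver; the exact repository and version should match octavo's setup:

# apt install -y containerd
# mkdir -p /etc/containerd
# containerd config default > /etc/containerd/config.toml
# sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
# systemctl restart containerd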
Bootstrap with kubeadm
Adding Linux worker nodes
is the next big step towards upgrading the single-node cluster to a multi-node cluster,
so here is where the setup of this node (lexicon) diverges from that of the first node
(octavo).
Join as worker node
Having initialized the cluster originally on octavo with
kubeadm init, the first step is to obtain a fresh token to join the
cluster (this token expires in 24 hours):
$ kubeadm token create --print-join-command
kubeadm join 10.0.0.8:6443 \
--token ivpct4.7piqcgw68ng77kn5 \
--discovery-token-ca-cert-hash \
sha256:18d968e92516e1a2808166d90a7d7c8b6f7b37cbac6328c49793863f9ae2b982
Then run that command on the new node:
# kubeadm join 10.0.0.8:6443 \
--token a0vi2t.rhxs1cc2rkicpeu0 \
--discovery-token-ca-cert-hash \
sha256:18d968e92516e1a2808166d90a7d7c8b6f7b37cbac6328c49793863f9ae2b982
[preflight] Running pre-flight checks
[preflight] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
[preflight] Use 'kubeadm init phase upload-config kubeadm --config your-config-file' to re-upload it.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/instance-config.yaml"
[patches] Applied patch of type "application/strategic-merge-patch+json" to target "kubeletconfiguration"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 504.313399ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
After a few seconds the new node is ready:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
lexicon Ready <none> 42s v1.34.3
octavo Ready control-plane 281d v1.34.3
Temporary cordon
Even if the new node is ready to run pods, it is not yet able to satisfy all their
requirements; even though all the hostPath volumes have been migrated to
Longhorn, they are not yet replicated to the new node.
To avoid having pods scheduled on the new node before everything is ready, temporarily cordon it:
$ kubectl cordon lexicon
node/lexicon cordoned
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
lexicon Ready,SchedulingDisabled <none> 19h v1.34.3 192.168.0.6 <none> Ubuntu 24.04.3 LTS 6.14.0-37-generic containerd://2.2.1
octavo Ready control-plane 282d v1.34.3 192.168.0.8 <none> Ubuntu 24.04.3 LTS 6.14.0-37-generic containerd://2.2.1
Setup kubectl access
To run kubectl as a non-root user, copy the Kubernetes config file under the
~/.kube directory from the current node to the new one:
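A sketch of that copy, assuming SSH access from lexicon to octavo (hostnames as used throughout this post):

$ mkdir -p ~/.kube
$ scp octavo:.kube/config ~/.kube/config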
And with that all the kubectl commands work:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
lexicon Ready <none> 4m53s v1.34.3
octavo Ready control-plane 281d v1.34.3
$ kubectl cluster-info
Kubernetes control plane is running at https://10.0.0.8:6443
CoreDNS is running at https://10.0.0.8:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Default pod distribution
After joining the cluster as a worker node, infrastructure components will behave
differently depending on how they were deployed (DaemonSets vs. Deployments).
Resources deployed as a DaemonSet will immediately spawn a pod on the new node, so
these are the components that will run on lexicon automatically:
- Flannel: the kube-flannel pod will start to establish the pod network on lexicon.
- MetalLB: the speaker pods will start, and lexicon will be able to respond to ARP requests for LoadBalancer IPs (once it settles).
- Longhorn: the longhorn-manager and csi-plugin pods will start, and Longhorn will detect the new 2TB SSD once the Node Labeling Job runs or after the node is manually labeled.
- Intel Device Plugin: this is an operator-managed DaemonSet, so it will detect the GPU and make it available for hardware transcoding (e.g., in Jellyfin).
- Prometheus: a DaemonSet running as part of the Trivy Operator Dashboard in Grafana.
$ kubectl get daemonsets.apps -A
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
intel-device-plugins-gpu intel-gpu-plugin-gpudeviceplugin 1 1 1 1 1 intel.feature.node.kubernetes.io/gpu=true 277d
kube-flannel kube-flannel-ds 1 1 1 1 1 <none> 281d
kube-system csi-nfs-node 1 1 1 1 1 kubernetes.io/os=linux 8d
kube-system kube-proxy 1 1 1 1 1 kubernetes.io/os=linux 281d
longhorn-system engine-image-ei-ff1cedad 1 1 1 1 1 <none> 2d20h
longhorn-system longhorn-csi-plugin 1 1 1 1 1 <none> 2d20h
longhorn-system longhorn-manager 1 1 1 1 1 <none> 7d5h
metallb-system metallb-speaker 1 1 1 1 1 kubernetes.io/os=linux 44h
monitoring prom-prometheus-node-exporter 1 1 1 1 1 kubernetes.io/os=linux 42d
node-feature-discovery node-feature-discovery-worker 1 1 1 1 1 <none> 277d
Resources deployed as Deployments (single-replica so far) or StatefulSets will stay
running only on octavo. Most of these will need to be scaled up to 2+ replicas
before a pod is started on lexicon, with a few notable exceptions:
- CoreDNS: has 2 replicas; Kubernetes will likely move one to lexicon to balance the load.
- Longhorn UI/Controller: these central "brains" stay on octavo.
$ kubectl get deployments.apps -A
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
audiobookshelf audiobookshelf 1/1 1 1 279d
cert-manager cert-manager 1/1 1 1 281d
cert-manager cert-manager-cainjector 1/1 1 1 281d
cert-manager cert-manager-webhook 1/1 1 1 281d
cert-manager cert-manager-webhook-porkbun 1/1 1 1 15d
code-server code-server 1/1 1 1 276d
default ddns-updater 1/1 1 1 129d
firefly-iii firefly-iii 1/1 1 1 276d
firefly-iii firefly-iii-mysql 1/1 1 1 276d
home-assistant home-assistant 1/1 1 1 280d
homepage homepage 1/1 1 1 31d
intel-device-plugins-gpu inteldeviceplugins-controller-manager 1/1 1 1 277d
komga komga 1/1 1 1 279d
kube-system coredns 2/2 2 2 281d
kube-system csi-nfs-controller 1/1 1 1 8d
kube-system headlamp 1/1 1 1 22h
kube-system snapshot-controller 1/1 1 1 8d
kubernetes-dashboard kubernetes-dashboard-api 1/1 1 1 281d
kubernetes-dashboard kubernetes-dashboard-auth 1/1 1 1 281d
kubernetes-dashboard kubernetes-dashboard-kong 1/1 1 1 281d
kubernetes-dashboard kubernetes-dashboard-metrics-scraper 1/1 1 1 281d
kubernetes-dashboard kubernetes-dashboard-web 1/1 1 1 281d
longhorn-system csi-attacher 3/3 3 3 2d22h
longhorn-system csi-provisioner 3/3 3 3 2d22h
longhorn-system csi-resizer 3/3 3 3 2d22h
longhorn-system csi-snapshotter 3/3 3 3 2d22h
longhorn-system longhorn-driver-deployer 1/1 1 1 7d8h
longhorn-system longhorn-ui 2/2 2 2 7d8h
media-center jellyfin 1/1 1 1 278d
metallb-system metallb-controller 1/1 1 1 46h
monitoring grafana 1/1 1 1 280d
monitoring influxdb 1/1 1 1 280d
monitoring prom-kube-prometheus-stack-operator 1/1 1 1 42d
monitoring prom-kube-state-metrics 1/1 1 1 42d
monitoring trivy-operator-dashboard 1/1 1 1 42d
monitoring version-checker 1/1 1 1 42d
navidrome navidrome 1/1 1 1 279d
node-feature-discovery node-feature-discovery-gc 1/1 1 1 278d
node-feature-discovery node-feature-discovery-master 1/1 1 1 278d
pomerium pomerium 1/1 1 1 45d
ryot postgres 1/1 1 1 227d
ryot ryot 1/1 1 1 227d
tailscale operator 1/1 1 1 281d
trivy-system trivy-operator 1/1 1 1 42d
unifi mongo 1/1 1 1 276d
unifi unifi 1/1 1 1 276d
$ kubectl get statefulsets.apps -A
NAMESPACE NAME READY AGE
monitoring alertmanager-prom-kube-prometheus-stack-alertmanager 1/1 42d
monitoring prometheus-prom-kube-prometheus-stack-prometheus 1/1 42d
steam-headless steam-headless 0/0 188d
tailscale ts-home-assistant-tailscale-mdqlt 1/1 280d
tailscale ts-kubernetes-dashboard-ingress-tailscale-jhb6z 1/1 281d
Longhorn
Longhorn needs to know the new node is ready for data; to activate Longhorn storage on the new node, it needs to be labeled:
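A sketch of that labeling, assuming Longhorn's label-driven default-disk creation is enabled (the Create Default Disk on Labeled Nodes setting); the custom /home/longhorn disk path suggests the config variant with a disk-config annotation may be in use instead:

$ kubectl label node lexicon node.longhorn.io/create-default-disk=true
node/lexicon labeled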
Once the node is labeled, the Longhorn UI will immediately show that the node now has one
disk (/home/longhorn). The new disk will not automatically get any Disk Tag, so the
nvme tag must be added manually so that the disk fits the longhorn-nvme storage class.
Once the node and its 2TB SSD disk are ready and tagged, existing volumes can be
scaled up by increasing the value of numberOfReplicas in the relevant volumes or
even on the longhorn-nvme storage class:
$ kubectl patch storageclass longhorn-nvme --type merge \
-p '{"parameters":{"numberOfReplicas":"2"}}'
This can also be done by updating and reapplying the longhorn-storage.yaml manifest
created when migrating hostPath volumes to Longhorn:
longhorn-storage.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-nvme
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
numberOfReplicas: "2"
diskSelector: "nvme"
dataLocality: "best-effort"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-sata
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
numberOfReplicas: "1"
diskSelector: "sata"
dataLocality: "best-effort"
However, one does not simply modify a StorageClass:
$ kubectl apply -f longhorn-storage.yaml
The StorageClass "longhorn-nvme" is invalid: parameters: Forbidden: updates to parameters are forbidden.
Instead, the StorageClass must be deleted, then recreated anew; this does not affect
existing volumes, only those that are created later:
$ kubectl delete storageclass longhorn-nvme
storageclass.storage.k8s.io "longhorn-nvme" deleted
$ kubectl apply -f longhorn-storage.yaml
storageclass.storage.k8s.io/longhorn-nvme created
storageclass.storage.k8s.io/longhorn-sata unchanged
Since this does not affect existing volumes, Longhorn will not immediately start
replicating the volumes from octavo to lexicon; this needs to be done manually for
each existing volume by patching it; with the caveat that volumes on the SATA SSD are
not to be replicated:
$ kubectl get volumes.longhorn.io -n longhorn-system -o json \
| jq -r '.items[] | select(.spec.numberOfReplicas==1 and (.spec.diskSelector | contains(["nvme"]))) | .metadata.name' \
| xargs -I {} kubectl -n longhorn-system patch volumes.longhorn.io {} \
--type merge -p '{"spec":{"numberOfReplicas":2}}'
volume.longhorn.io/pvc-0c892178-0451-4043-be76-9e2e33464631 patched
volume.longhorn.io/pvc-1083fedd-27e9-4a58-8a8f-6b8553d62034 patched
volume.longhorn.io/pvc-17d73e68-4c5a-4f34-b5dc-89936202d8d7 patched
volume.longhorn.io/pvc-1d7dc891-4694-4744-be75-6ab12c11aea9 patched
volume.longhorn.io/pvc-2610bf0b-0c90-4ecb-956c-355d8619dbe4 patched
volume.longhorn.io/pvc-2b72de1d-93d8-492e-aed1-1708a35ce5b4 patched
volume.longhorn.io/pvc-2e7cd0bd-6efe-4e8b-ae81-a025b447a7f9 patched
volume.longhorn.io/pvc-43916c81-6e07-4eae-92e4-e37b816c407c patched
volume.longhorn.io/pvc-495872f3-cf95-4268-b3ea-2f4d51d33399 patched
volume.longhorn.io/pvc-5c724ff5-e7dd-483b-9450-0b8e299c49ca patched
volume.longhorn.io/pvc-5dd1e736-b80b-43a8-9570-95a3637cff4d patched
volume.longhorn.io/pvc-6250f010-4a0b-4ac5-83a6-07cf71d95b33 patched
volume.longhorn.io/pvc-726c1b2d-c0c0-4232-85f5-b3119558d0d1 patched
volume.longhorn.io/pvc-72741e77-c05c-4b84-b74d-bddcf32a2236 patched
volume.longhorn.io/pvc-7ae11ade-d6c8-4296-ac67-58953e3dddc2 patched
volume.longhorn.io/pvc-aea4a7e9-baf0-4fbc-ba5f-7007a66fcef6 patched
volume.longhorn.io/pvc-e3f04fc3-9126-43d6-82c0-c427a730b338 patched
volume.longhorn.io/pvc-fd8bb4e8-39a4-4b7f-a3c5-668333e4d64a patched
Doing this will result in all those volumes going from Healthy to Degraded.
Because lexicon was previously cordoned to avoid pods running on it before the
volumes were ready, this also means the Kubernetes scheduler (and by extension the
Longhorn scheduler) is forbidden from starting the Replica Instance Manager pods;
so now is the time to uncordon lexicon.
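Uncordoning is a single command, mirroring the earlier cordon:

$ kubectl uncordon lexicon
node/lexicon uncordoned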
This launches Longhorn into an I/O frenzy to replicate all those volumes: it keeps the 2.5 Gbps NICs at their maximum throughput for 8 minutes to replicate the first ~250 GB, then stays near the maximum for over an hour to finish replicating the last volume, much larger than the rest, until eventually all volumes are Healthy again.
Intel GPU
Support for the Intel GPU is set up automatically when the node joins:
$ kubectl describe node lexicon
Name: lexicon
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
...
gpu.intel.com/device-id.0300-9a78.count=1
gpu.intel.com/device-id.0300-9a78.present=true
intel.feature.node.kubernetes.io/gpu=true
...
Capacity:
cpu: 4
ephemeral-storage: 61376Mi
gpu.intel.com/i915: 1
gpu.intel.com/i915_monitoring: 1
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32484920Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 57921660423
gpu.intel.com/i915: 1
gpu.intel.com/i915_monitoring: 1
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32382520Ki
pods: 110
...
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 655m (16%) 100m (2%)
memory 219Mi (0%) 1102Mi (3%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
gpu.intel.com/i915 0 0
gpu.intel.com/i915_monitoring 0 0
Logs reader helper
Troubleshooting pods and services often involves reading or watching the logs, which
involves combining two kubectl commands to find the relevant pod/service and
requesting the logs. To make this easier, save the following script as ~/bin/klogs
(and add ~/bin/ to the $PATH):
#!/bin/bash
#
# Watch logs from a Kubernetes pod/service.
#
# Usage: klogs <namespace> [<pod/service>]
ns=$1
pd=$2
# Default to the namespace name when no pod/service is given
if [[ -z "$pd" ]]; then pd="$ns"; fi
kubectl logs -n "$ns" \
  "$(kubectl get pods -n "$ns" | grep "$pd" | cut -f1 -d' ')" -f
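For example, to follow the Jellyfin logs (namespace media-center, as listed above):

$ klogs media-center jellyfin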
Headlamp
Kubernetes Dashboard
was deprecated and archived in January 2026, and is no longer
maintained due to lack of active maintainers and contributors.
Headlamp
is the suggested replacement and was already
deployed in octavo
using its Helm chart.
Metrics Server
Skooner
was briefly installed and it relied heavily on the
Kubernetes Metrics Server
to display real-time cluster metrics. Although Skooner was later removed (it has been
abandoned for 5 years), metrics-server turned out to be necessary for many of the
Grafana dashboards previously installed to
monitor the cluster for vulnerabilities.
When installed using
its Helm chart,
it uses by default a self-signed certificate which is generated during startup, and the
APIService resource is registered with .spec.insecureSkipTLSVerify set to true.
Although ideally metrics-server could be kept more secure by using the cert-manager
that is available in the cluster, in a homelab environment this is typically avoided.
metrics-server-values.yaml
Skipping TLS verification is the standard setup for homelabs, not production!
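The relevant values are sketched below; this assumes overriding the chart's defaultArgs to prefer the nodes' InternalIP and to skip kubelet TLS verification, per the flags discussed below:

defaultArgs:
  - --cert-dir=/tmp
  - --kubelet-preferred-address-types=InternalIP
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  # Skip verification of the kubelets' self-signed certificates
  - --kubelet-insecure-tls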
In a homelab environment without its own DNS server, node hostnames are not
resolvable via DNS, making the InternalIP the only way Metrics Server can reach
the nodes. However, kubeadm generates self-signed certificates for Kubelets that
do not include IP SANs by default, which causes a TLS verification failure
when trying to use certificates; metrics-server pods would fail with:
$ kubectl -n kube-system logs metrics-server-fd5dc6448-f2fkm
...
E0204 22:02:20.935565 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.6:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.6 because it doesn't contain any IP SANs" node="lexicon"
E0204 22:02:20.943097 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.0.8:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.0.8 because it doesn't contain any IP SANs" node="octavo"
The most common solution in environments without internal DNS is to tell Metrics
Server to skip TLS verification when connecting to the Kubelet; this is what
the defaultArgs flags above do: disable CA verification for the node certificate
and force Metrics Server to use the IP directly instead of DNS.
To enable secure TLS between the Metrics Server and the kubelets, the latter need to have Server Certificate Bootstrapping enabled. This allows Kubelets to request certificates signed by the cluster CA that include the correct IP SANs:
- Enable Server TLS Bootstrapping: update /var/lib/kubelet/config.yaml on each node to include serverTLSBootstrap: true and then restart the service with systemctl restart kubelet.
- Approve the CSRs: Kubelet will generate a Certificate Signing Request on each node and these must be approved manually, as sketched below.
- Update Metrics Server: once nodes have valid certificates, point Metrics Server to the cluster's CA instead of skipping verification.
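A sketch of the approval step, using the standard kubectl commands (the CSR name is a placeholder; list the pending ones first):

$ kubectl get csr
$ kubectl certificate approve <csr-name>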
Add the Helm repository and then install metrics-server with the above values:
$ helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...
...Successfully got an update from the "metrics-server" chart repository
...
Update Complete. ⎈Happy Helming!⎈
$ helm upgrade --install \
metrics-server metrics-server/metrics-server \
--namespace=kube-system \
--values=metrics-server-values.yaml
Release "metrics-server" does not exist. Installing it now.
NAME: metrics-server
LAST DEPLOYED: Wed Feb 4 23:00:32 2026
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
* Metrics Server *
***********************************************************************
Chart version: 3.13.0
App version: 0.8.0
Image tag: registry.k8s.io/metrics-server/metrics-server:v0.8.0
***********************************************************************
Node Affinity
To ensure specific workloads remain on the primary node unless it goes down, use Node Affinity with a preferred rule based on node labels. First, label the nodes according to the relevant properties for the node affinity preferences; in this case start with a "performance level":
$ kubectl label node lexicon node-type=low-perf
node/lexicon labeled
$ kubectl label node octavo node-type=high-perf
node/octavo labeled
To apply a preferred affinity, use preferredDuringSchedulingIgnoredDuringExecution in
each Deployment manifest to set the preference for the node with the required label
value (e.g. node-type=high-perf). The scheduler will then always place the pod on the
high-perf node if it is available. If the high-perf node goes down, the scheduler will
allow the pod to be scheduled on other nodes because the rule is "preferred" rather than
"required".
Adding the affinity block to a Deployment's spec.template.spec can be done as a
Hard Rule (Required) or as a Soft Rule (Preferred).
Jellyfin must run on octavo, not only to use the Core i7 CPU but also to use the
4TB SATA SSD. If octavo is down, the pod will stay in a Pending state and will not
start on lexicon or other nodes without a local replica of the relevant volumes:
jellyfin.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: jellyfin
namespace: media-center
spec:
replicas: 1
selector:
matchLabels:
app: jellyfin
template:
metadata:
labels:
app: jellyfin
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- high-perf
containers:
- image: jellyfin/jellyfin
...
Audiobookshelf should run preferably on octavo for better performance, but if
octavo goes down then it should run on lexicon, since that has its own replicas of all
the relevant volumes:
audiobookshelf.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: audiobookshelf
name: audiobookshelf
namespace: audiobookshelf
spec:
replicas: 1
revisionHistoryLimit: 0
selector:
matchLabels:
app: audiobookshelf
strategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
labels:
app: audiobookshelf
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node-type
operator: In
values:
- high-perf
containers:
- image: ghcr.io/advplyr/audiobookshelf:latest
...
Zero-Downtime Reboots
Even though the cluster now has two nodes, most deployments are running only one pod (replica) that may be on either node (unless biased by Node Affinity).
Rebooting the control plane node (octavo) will still cause a total outage, even with
lexicon up, because three fundamental architectural dependencies are not yet correctly
set up for High Availability:
- Flannel: when octavo goes down, the Flannel pods on lexicon can lose the ability to communicate with the API server. If lexicon's networking logic "hangs" because it cannot reach the control plane to verify routing tables, all cross-pod and ingress traffic stops.
- MetalLB Controller: MetalLB has two parts: the Speaker (a DaemonSet, runs on both nodes) and the Controller (a Deployment, runs with only 1 replica by default). If the metallb-controller is running on octavo when it reboots, there is no "brain" to assign or refresh IPs. Even though the speaker on lexicon is alive, it may stop announcing the IP if it loses its lease or a network "hiccup" occurs during the control plane's absence.
- Pomerium (Ingress): all apps are exposed through Pomerium Ingress. If the Pomerium pods were not explicitly scaled to 2 replicas and spread across both nodes, they will likely remain on octavo. Even if some services' pods are running fine on lexicon, the "Front Door" (Pomerium) is shut down when it runs (only) on the rebooting node. However, Pomerium cannot be scaled beyond a single replica when using file-based persistent storage.
To prevent downtime when a node is rebooted, these components must be made Highly Available across both NUCs:
- Scale MetalLB: the metallb-controller must be scaled manually because the Helm chart does not support spec.replicaCount (or similar) at all:

  $ kubectl scale deployment metallb-controller -n metallb-system --replicas=2
  deployment.apps/metallb-controller scaled

  After a few seconds there should be one replica READY on each node:

  $ kubectl get pods -n metallb-system -l app.kubernetes.io/component=controller -o wide
  NAME                                  READY   STATUS    RESTARTS   AGE     IP             NODE      NOMINATED NODE   READINESS GATES
  metallb-controller-764cb589cc-fgk56   1/1     Running   0          5m18s   10.244.1.237   lexicon   <none>           <none>
  metallb-controller-764cb589cc-hzgbq   1/1     Running   0          29s     10.244.0.31    octavo    <none>           <none>

- Actually Don't Scale Pomerium: even after migrating Pomerium to a Longhorn ReadWriteMany volume, so that multiple pods can read/write the volume, the database inside the volume can only be used (locked) by a single pod. This is why, when scaling Pomerium to 2 replicas, 1 will stay Running but not healthy:

  $ kubectl scale deployment pomerium --replicas=2
  $ kubectl get pods -n pomerium -o wide
  NAME                        READY   STATUS    RESTARTS        AGE   IP             NODE      NOMINATED NODE   READINESS GATES
  pomerium-6c7f5448b9-bvw64   0/1     Running   6 (8m19s ago)   40m   10.244.1.13    lexicon   <none>           <none>
  pomerium-6c7f5448b9-np27p   1/1     Running   1 (13m ago)     34m   10.244.0.101   octavo    <none>           <none>

- Handle the Control Plane "Unreachable" Taints: by default, when the control plane goes down, Kubernetes waits 300 seconds before deciding to fail over. This should be reduced in Deployment manifests beforehand:

  spec:
    template:
      spec:
        tolerations:
          - key: "management-only"
            operator: "Equal"
            value: "true"
            effect: "NoSchedule"
          - key: "node.kubernetes.io/unreachable"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 10
          - key: "node.kubernetes.io/not-ready"
            operator: "Exists"
            effect: "NoExecute"
            tolerationSeconds: 10

- Longhorn "First Node" Settings: Longhorn's UI and some managers often default to the first node. Before rebooting octavo, check the Longhorn UI to ensure all volumes are Healthy on both nodes. If a volume is only on octavo, the pod on lexicon will crash.

  $ kubectl get pods -A -o wide | grep -E 'longhorn-manager|longhorn-ui'
  longhorn-system   longhorn-manager-7dh85         2/2   Running   4 (11m ago)   5d10h   10.244.1.26   lexicon   <none>   <none>
  longhorn-system   longhorn-manager-pbl4t         2/2   Running   4 (34h ago)   8d      10.244.0.73   octavo    <none>   <none>
  longhorn-system   longhorn-ui-7fc9b4667f-h22gm   1/1   Running   4 (34h ago)   8d      10.244.0.68   octavo    <none>   <none>
  longhorn-system   longhorn-ui-7fc9b4667f-xqdvw   1/1   Running   4 (34h ago)   8d      10.244.0.58   octavo    <none>   <none>

- Longhorn "Pod Deletion Policy": in the Longhorn UI, under Settings > General, set Pod Deletion Policy When Node is Down to delete-both-statefulset-and-deployment-pod so that when a node is down, Longhorn will force-delete the pods. This breaks the "Volume Lock" and allows the pods to start on the remaining healthy node (octavo) immediately.
Tolerations on Helm Charts
To apply the above management-only tolerations to those applications deployed using Helm
charts, add the tolerations section under the following specific keys in the YAML
values for each chart. The simplest charts will take the tolerations at the top level:
Headlamp,
metrics-server,
cert-manager-webhook-porkbun,
tailscale.
Trivy Operator and Prometheus for the
Trivy Operator Dashboard in Grafana only accept the tolerations under specific keys:
under trivyOperator in prometheus/trivy-values.yaml and under
prometheus.prometheusSpec in prometheus/values.yaml.
Longhorn takes tolerations
under multiple keys, but adding it simply under global will make it effective for
every Deployment and DaemonSet (longhorn-driver-deployer, longhorn-manager,
longhorn-ui). The NFS CSI driver
takes controller.tolerations and node.tolerations in nfs-csi-values.yaml.
High Availability Ingress
Incoming requests are routed to the LoadBalancer (virtual) IP of
Pomerium; when it goes down in
one node and starts on the other one, MetalLB moves the virtual IP to the node now
running Pomerium. Even though Pomerium cannot be scaled up beyond one replica, MetalLB
takes care of moving requests to the correct node where Pomerium is running.
Moreover, when traffic hits the Virtual IP on any node, the Kubernetes internal network
(kube-proxy) automatically load balances that request to any available Pomerium pod,
regardless of which node it sits on.
In Layer 2 mode, MetalLB operates as a failover rather than a load balancer. When adding
the second node, the MetalLB speaker pod will automatically deploy to it as part of its
DaemonSet. The two speakers will communicate; one node will be "elected" the leader
and will handle the ARP requests for the Pomerium IP. If the current leader goes
offline, the other speaker will detect the loss of its peer and automatically begin
announcing that same IP address to the router.
Adding a second node to a cluster already running MetalLB in Layer 2 mode provides a "semi-automatic" path to high availability. While MetalLB will technically handle the failover, the current setup requires two manual checks to ensure it actually works when a node goes down:
- External Traffic Policy: Cluster is the default mode that allows any node to be the leader, and that is required for high availability ingress: even if the Pomerium pod isn't on that specific node, the leader node will catch the traffic and forward it internally to wherever the Pomerium pod is running. This can be checked with the following command:

  $ kubectl get svc -A -o \
      custom-columns="NAME:.metadata.name,NAMESPACE:.spec.externalTrafficPolicy" \
      | grep -E 'Cluster|Local'
  firefly-iii-mysql-svc   Cluster
  firefly-iii-svc         Cluster
  komga-svc               Cluster
  grafana-svc             Cluster
  influxdb-svc            Cluster
  navidrome-svc           Cluster
  pomerium-proxy          Cluster
  postgres-svc            Cluster
  ryot-svc                Cluster
  mongo-svc               Cluster
  unifi-tcp               Cluster
  unifi-tcp-1             Cluster
  unifi-udp               Cluster
- L2 Advertisement Scope: must be unrestricted so that the L2Advertisement resource advertises from all nodes. If restricted via a nodeSelector, it must be updated to include all the relevant nodes.
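For reference, a minimal unrestricted sketch of that resource (the pool and resource names are assumptions):

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
  # No nodeSelector: every node may be elected to announce the IPs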
As it happens, MetalLB was already set up using the default Cluster policy and the
L2Advertisement resources were not restricted, so it was essentially ready from the
beginning.