Chef's Note: This guide has been refactored for stability. It uses K3s for simplicity, Kube-VIP for HA, and explicit cleanups for storage.
Phase 1: Architecture & Planning
We use K3s with embedded etcd for HA. Kube-VIP runs as a DaemonSet to provide the floating IP 10.10.66.100, which lets any master accept API traffic.
Masters (x3): 10.10.66.10 - 10.10.66.12
Workers (x3): 10.10.66.20 - 10.10.66.22
VIP: 10.10.66.100
Phase 2: SSH Access
Ensure you can log in to all nodes without a password.
ssh-keygen -t ed25519 -C "admin@macbook"
ssh-copy-id -i ~/.ssh/id_ed25519.pub user@10.10.66.10
# Repeat for all 6 nodes...
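Instead of repeating ssh-copy-id six times, you can loop over the node IPs from Phase 1. A sketch (the username "user" is an assumption, swap in your own; the leading echo makes it a dry run):

```shell
# Push the same key to every node in one pass.
nodes="10.10.66.10 10.10.66.11 10.10.66.12 10.10.66.20 10.10.66.21 10.10.66.22"
for ip in $nodes; do
  echo ssh-copy-id -i ~/.ssh/id_ed25519.pub "user@${ip}"
done
# Drop the leading 'echo' once the printed commands look right.
```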
Phase 3: Networking (Netplan)
Configure static IPs for all nodes.
Debian Users: sudo apt install netplan.io first.
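Each node needs its own netplan file; here is a minimal sketch for k8s-master-01. The NIC name (eth0), the /24 prefix, and the gateway/DNS address (10.10.66.1) are assumptions, so adjust them before applying:

```yaml
# /etc/netplan/01-static.yaml -- sketch for k8s-master-01
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: false
      addresses: [10.10.66.10/24]
      routes:
        - to: default
          via: 10.10.66.1
      nameservers:
        addresses: [10.10.66.1]
```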
Write the YAML under /etc/netplan/ on each node with that node's static IP, then apply it:
sudo netplan apply
Phase 4: OS Prep (Run on ALL 6 Nodes)
Crucial: These steps prepare the OS for K3s and storage. Do not skip.
1. Set Hostnames
# Example for Node 1
sudo hostnamectl set-hostname k8s-master-01
2. Update Hosts File
cat <<EOF | sudo tee -a /etc/hosts
10.10.66.10 k8s-master-01
10.10.66.11 k8s-master-02
10.10.66.12 k8s-master-03
10.10.66.20 k8s-worker-01
10.10.66.21 k8s-worker-02
10.10.66.22 k8s-worker-03
10.10.66.100 k8s-cluster-endpoint
EOF
3. Install Dependencies (Includes CIFS)
This installs the SMB helper utilities required for storage.
sudo apt update && sudo apt install -y curl open-iscsi nfs-common cifs-utils
4. Load IPVS Modules (For Kube-VIP)
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
sudo systemctl restart systemd-modules-load
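To confirm the modules actually loaded, a quick check against /proc/modules (a sketch; note that a module compiled into the kernel won't appear there even though it is available):

```shell
# Verify each module from k8s.conf is loaded.
mods="overlay br_netfilter ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack"
for m in $mods; do
  if grep -q "^${m} " /proc/modules 2>/dev/null; then
    echo "loaded   $m"
  else
    echo "check    $m (may be built into the kernel rather than a module)"
  fi
done
```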
5. Firewall (Proxmox Check)
Crucial: Ensure "Firewall" is unchecked in Proxmox Network settings for the VM NIC.
Phase 5: Init Master 01
Run on k8s-master-01 (10.10.66.10).
The Magic Flag:
We add --tls-san 10.10.66.100 here. This pre-authorizes the VIP in the certs.
curl -sfL https://get.k3s.io | K3S_TOKEN=mylittlesecret sh -s - server \
--cluster-init \
--tls-san 10.10.66.100 \
--node-ip 10.10.66.10
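The examples hard-code K3S_TOKEN=mylittlesecret. For anything beyond a lab, generate a random token and reuse the same value in every join command (a sketch):

```shell
# Generate a random cluster token instead of "mylittlesecret".
token=$(openssl rand -hex 16)
echo "K3S_TOKEN=${token}"
```

If you lose it, the first server keeps a copy in /var/lib/rancher/k3s/server/node-token.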
Wait for Node Ready
Before proceeding, ensure the first master is fully operational:
sudo k3s kubectl get nodes
# Wait until STATUS shows "Ready" (may take 30-60s)
Phase 6: Enable VIP (DaemonSet)
Now we deploy Kube-VIP so the IP 10.10.66.100 actually turns on.
1. RBAC & Cloud Provider
kubectl apply -f https://kube-vip.io/manifests/rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/kube-vip/kube-vip-cloud-provider/main/manifest/kube-vip-cloud-controller.yaml
2. Deploy Kube-VIP DaemonSet (Updated)
Fix for "Pending" Services:
This command now includes the --services flag. You MUST re-run this if your LoadBalancer services are stuck in Pending.
# Detect Interface
export INTERFACE=$(ip route get 8.8.8.8 | awk '{print $5}')
export VIP=10.10.66.100
# Pull image
sudo k3s ctr image pull ghcr.io/kube-vip/kube-vip:v0.8.0
# Generate & Apply (Includes --services now)
sudo k3s ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:v0.8.0 vip /kube-vip manifest daemonset \
--interface $INTERFACE \
--address $VIP \
--inCluster \
--taint \
--controlplane \
--services \
--arp \
--leaderElection \
| kubectl apply -f -
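A caveat on the INTERFACE auto-detect: it assumes `ip route get 8.8.8.8` prints the device name as field 5 ("8.8.8.8 via <gw> dev <nic> ..."), which holds for a typical single-NIC VM. If the target is directly attached (no "via"), the device lands in field 3 instead, so check `echo $INTERFACE` before applying. An illustration of what the awk picks out, run over a captured sample line:

```shell
# Same awk as the detect command, applied to a sample route line.
sample='8.8.8.8 via 10.10.66.1 dev eth0 src 10.10.66.10 uid 1000'
iface=$(printf '%s\n' "$sample" | awk '{print $5}')
echo "$iface"   # -> eth0
```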
3. Verify VIP
ping 10.10.66.100 -c 4
# You should now get a reply!
4. Configure Service IP Pool (Fixes "Pending")
Give Kube-VIP a range of IPs to assign to LoadBalancer services (like your webpage).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevip
  namespace: kube-system
data:
  # Assigns IPs from .200 to .210 for services
  range-global: 10.10.66.200-10.10.66.210
EOF
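For sizing: the range is inclusive at both ends, so widen it if you expect more LoadBalancer services than it can hold:

```shell
# .200 through .210, inclusive, gives 11 assignable service IPs.
start=200; end=210
count=$(( end - start + 1 ))
echo "$count addresses in the pool"
```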
Phase 7: Join Nodes
Now that the VIP is up, we use it to join the rest of the cluster.
1. Join Masters (02 & 03)
curl -sfL https://get.k3s.io | K3S_TOKEN=mylittlesecret sh -s - server \
--server https://10.10.66.100:6443 \
--tls-san 10.10.66.100
2. Join Workers (01, 02, 03)
curl -sfL https://get.k3s.io | K3S_TOKEN=mylittlesecret sh -s - agent \
--server https://10.10.66.100:6443
3. Verify All Nodes Joined
kubectl get nodes -o wide
# All 6 nodes should show STATUS: Ready
Phase 8: Storage (SMB CSI)
We use the CSI driver to mount your SMB share.
1. Install Utilities (Run on ALL 6 Nodes)
This is crucial! Without cifs-utils, the pod will hang in ContainerCreating. (Phase 4, Step 3 already installs it; re-run here if you skipped that step.)
sudo apt update && sudo apt install -y cifs-utils
2. Install Driver (Run on Master 01)
curl -skSL https://raw.githubusercontent.com/kubernetes-csi/csi-driver-smb/master/deploy/install-driver.sh | bash -s master --
3. Create Secret, PV, and PVC
Run this entire block on Master 01.
Action: Replace YOUR_SMB_PASSWORD below. The path is pre-configured for your [k8s] share.
Common Issue: Permission Denied (13)
Symptom: Pod runs but can't write to SMB share.
Cause: Mismatch between K8s user and SMB permissions.
Fix: Verify your SMB share allows the user specified in the Secret. On your SMB server, check /etc/samba/smb.conf for the [k8s] section and confirm valid users = k8s.
# Clean up old resources first
kubectl delete secret smb-creds --ignore-not-found
kubectl delete pv smb-pv-static --ignore-not-found
kubectl delete pvc smb-pvc-static -n default --ignore-not-found
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: smb-creds
  namespace: default
stringData:
  # Updated based on your smb.conf
  username: "k8s"
  password: "YOUR_SMB_PASSWORD"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: smb-pv-static
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - vers=3.0
  csi:
    driver: smb.csi.k8s.io
    readOnly: false
    # This handle MUST be unique in the cluster
    volumeHandle: smb-vol-static-01
    volumeAttributes:
      source: "//10.10.66.9/k8s"
    nodeStageSecretRef:
      name: smb-creds
      namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: smb-pvc-static
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  volumeName: smb-pv-static
  storageClassName: ""
---
apiVersion: v1
kind: Pod
metadata:
  name: smb-pod-static
  namespace: default
spec:
  containers:
    - name: busybox
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - mountPath: "/mnt/smb"
          name: smb-volume
  volumes:
    - name: smb-volume
      persistentVolumeClaim:
        claimName: smb-pvc-static
EOF
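A note on the Secret above: stringData lets you write the password in plain text, and Kubernetes base64-encodes it into .data on admission. If you later inspect the stored Secret, this is the encoding you will see (shown here with the placeholder value):

```shell
# What Kubernetes stores for a stringData field: the value, base64-encoded.
encoded=$(printf '%s' 'YOUR_SMB_PASSWORD' | base64)
echo "$encoded"   # -> WU9VUl9TTUJfUEFTU1dPUkQ=
```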
4. Test Storage (Deploy/Retry)
The Step 3 block above already deploys the test pod (smb-pod-static). If it got stuck, delete it, then re-run the Step 3 apply block to deploy a fresh one.
# Clear a stuck pod, then re-apply the Step 3 block
kubectl delete pod smb-pod-static --ignore-not-found
5. Verify Mount Works
# Wait for pod to be Running
kubectl wait --for=condition=ready pod/smb-pod-static --timeout=120s
# Test write access
kubectl exec smb-pod-static -- sh -c "echo 'test' > /mnt/smb/test.txt && cat /mnt/smb/test.txt"
# Should output: test
Troubleshooting "ContainerCreating":
If the pod stays in ContainerCreating, you are missing cifs-utils on the node.
Run Step 1 (Install Utilities) on ALL nodes, then run Step 4 again.
Phase 9: Verification & Cleanup
Housekeeping for your new cluster.
1. Fix Stuck Storage Pod
If smb-pod-static is stuck in ContainerCreating (usually due to missing cifs-utils), delete it and re-apply the Phase 8, Step 3 block to force a retry.
kubectl delete pod smb-pod-static --ignore-not-found
2. Clean Completed Jobs
K3s leaves helm-install pods in Completed state for logs. You can safely remove them.
kubectl delete pod --field-selector=status.phase==Succeeded -A
3. Verify Healthy Components (Do NOT Delete)
Ensure the following system pods are Running. These are critical:
coredns-* (DNS)
traefik-* (Ingress Controller)
metrics-server-* (Cluster Metrics)
kube-vip-* (HA Networking)
csi-smb-controller-* & csi-smb-node-* (Storage Drivers)
kubectl get pods -n kube-system
4. Final Node Check
kubectl get nodes -o wide
5. Factory Reset (Uninstall K3s)
If you need to wipe a node and start over:
# On Masters
/usr/local/bin/k3s-uninstall.sh
# On Workers
/usr/local/bin/k3s-agent-uninstall.sh
Phase 10: Status Cheat Sheet
Status: ContainerCreating
Cause: Missing cifs-utils on worker nodes.
Fix: apt install cifs-utils on all nodes -> Delete Pod -> Retry.
Status: CrashLoopBackOff
Cause: Config error (DB connection, missing env var).
Fix: Check logs: kubectl logs [pod_name] --previous.
Phase 11: Application Deployment (Blazing Junkies)
Deploying the converted Docker Compose stack.
Prerequisites:
Ensure Phase 8 (SMB Storage) is complete and smb-pvc-static is bound before deploying apps that need persistent storage.
1. Create Configuration & Secrets (Auto-Fix)
This block deletes any existing (broken) secrets, sets a default password, and restarts the DB to fix "CrashLoopBackOff".
# 1. Clean up old/broken secrets
kubectl delete secret blazing-secrets --ignore-not-found
# 2. Apply Configuration
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: blazing-config
  namespace: default
data:
  # Database Configuration
  POSTGRES_DB: "blazing"
  DB_USER: "postgres"
  DB_URL: "jdbc:postgresql://db:5432/blazing"
  # App Configuration
  EVE_AUTH_CLIENT_ID: "a1b458f37a2f4798bd88bc0d710b6e36"
  EVE_AUTH_REDIRECT: "https://burnermissions.com/auth/code"
  PAGE_TITLE: "Blazing Junkies"
  PAGE_DESCRIPTION: "Gathering all the LP"
  PAGE_KEYWORDS: "Burners"
  DISCORD_SERVER_ID: "931370698495115345"
  DISCORD_SERVER_URL: "https://discord.gg/Dv4qWGyRkm"
  DISCORD_BOT_ID: "1317444548762402866"
  ACL_ALLIANCES: "99001969,99003214,99009163,99012042,150097440,1354830081,131511956,99003995,99009331,99010140,99011162,99011223,99010931,1900696668"
---
apiVersion: v1
kind: Secret
metadata:
  name: blazing-secrets
  namespace: default
stringData:
  # Pre-filled to prevent startup crashes. Change if needed.
  POSTGRES_PASSWORD: "changeme"
  DB_PASSWORD: "changeme"
  EVE_AUTH_CLIENT_SECRET: "fill-me"
  PAGE_SECRET: "fill-me"
  DISCORD_BOT_TOKEN: "fill-me"
EOF
# 3. Force the DB to pick up the new password (no-op until the DB is deployed in Step 3)
kubectl delete pod -l app=db
2. Create Database Storage
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # Uses default local-path storage class (fast for DBs)
EOF
3. Deploy Database
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db
  namespace: default
spec:
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:17
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: blazing-config
            - secretRef:
                name: blazing-secrets
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          volumeMounts:
            - name: db-data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: db-data
          persistentVolumeClaim:
            claimName: db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: db
  namespace: default
spec:
  selector:
    app: db
  ports:
    - port: 5432
      targetPort: 5432
EOF
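Postgres can take a moment to accept connections after the container starts. A readiness-probe sketch you could merge into the db container spec (pg_isready ships in the postgres image; the user matches DB_USER above):

```yaml
# Marks the pod Ready only once Postgres accepts connections.
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  initialDelaySeconds: 5
  periodSeconds: 10
```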
4. Deploy Web App
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webpage
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webpage
  template:
    metadata:
      labels:
        app: webpage
    spec:
      containers:
        - name: webpage
          image: mrakaki/dev-containers:blazing-0.1.14
          ports:
            - containerPort: 8080
          envFrom:
            - configMapRef:
                name: blazing-config
            - secretRef:
                name: blazing-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: webpage
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: webpage
  ports:
    - port: 80
      targetPort: 8080
EOF
Accessing the App:
Once you run Phase 6 Step 4 (IP Pool), check the IP:
kubectl get svc webpage
It should show an IP like 10.10.66.200. Open that in your browser!
Troubleshooting DB:
1. Check Logs: kubectl logs -l app=db
2. Restart DB: kubectl delete pod -l app=db (Forces it to pick up new secrets)
3. Restart Web: kubectl rollout restart deployment webpage (Once DB is running)