
Kubernetes Cluster Operations: Your Mission Control Center 🚀

Imagine you’re the captain of a massive spaceship fleet. Each spaceship is a node, and you need to keep them flying together in perfect harmony. That’s exactly what cluster operations is all about!


The Big Picture: What is Cluster Operations?

Think of a Kubernetes cluster like a team of robots working together in a factory. Sometimes you need to:

  • Add new robots (nodes) to the team
  • Send robots for repairs (maintenance)
  • Upgrade the robots’ brains (software updates)
  • Make backup copies of the factory’s memory (etcd backup)

kubeadm is your magic remote control that makes all this happen!


🔧 Kubeadm Installation

What is kubeadm?

kubeadm is like the instruction manual + toolbox that helps you build a Kubernetes cluster from scratch. It’s the official way to set things up!

Installing kubeadm (The Recipe)

Before you can use kubeadm, you need three tools:

  • kubelet - The worker that runs on every node
  • kubeadm - The builder tool
  • kubectl - Your command line remote control
# Step 1: Add the Kubernetes repo key (create the keyring dir first)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Step 2: Add to sources
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Step 3: Install the trio and pin their versions
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

Pro tip: Always disable swap on your nodes! By default, the kubelet refuses to start while swap is enabled.

sudo swapoff -a
# Make it stick across reboots: comment out swap entries in /etc/fstab
sudo sed -i '/ swap / s/^/#/' /etc/fstab

🎬 Kubeadm Init and Join

The Two Magic Words

kubeadm init = “Start a new cluster! I’m the boss node (control plane)!”

kubeadm join = “Hey boss, can I join your team?”

Starting Your First Cluster (init)

sudo kubeadm init --pod-network-cidr=10.244.0.0/16

When this finishes, you’ll see a special token. Save it! It’s like a secret password for other nodes to join.
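Bootstrap tokens follow a fixed format: six lowercase alphanumeric characters, a dot, then sixteen more (`[a-z0-9]{6}.[a-z0-9]{16}`). Here's a quick sanity check you could run before pasting a token around — the `is_token` helper is invented for this sketch:

```shell
# Hypothetical helper: check that a string looks like a kubeadm bootstrap token
is_token() {
  echo "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

is_token "abcdef.0123456789abcdef" && echo "looks like a bootstrap token"
is_token "not-a-token" || echo "rejected: wrong format"
```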

graph TD
  A["Run kubeadm init"] --> B["Control Plane Created"]
  B --> C["Get Join Token"]
  C --> D["Share Token with Workers"]
  D --> E["Workers Run kubeadm join"]
  E --> F["Cluster Ready!"]

Joining Worker Nodes

On each worker node, run the join command you got:

sudo kubeadm join 192.168.1.100:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:abc123...

Lost your token? No worries! Create a new one:

kubeadm token create --print-join-command

⚙️ Kubeadm Configuration

Custom Settings for Your Cluster

Instead of typing long commands, you can write a config file - like a recipe card for your cluster!

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
controlPlaneEndpoint: "cluster.example.com:6443"

Use it like this:

sudo kubeadm init --config=cluster-config.yaml

Common Configuration Options

| Setting | What it does |
| --- | --- |
| kubernetesVersion | Which Kubernetes version to use |
| podSubnet | IP range for pods |
| serviceSubnet | IP range for services |
| controlPlaneEndpoint | Address of your control plane |
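One constraint worth checking: podSubnet and serviceSubnet must not overlap, or pod and service IPs will collide. The values in the config above are safe together. Here's a pure-bash sketch of how you could verify that yourself (the `ip_to_int`, `cidr_range`, and `overlaps` helpers are invented for this example):

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IP to a 32-bit integer
ip_to_int() {
  local IFS=.
  read -r a b c d <<<"$1"
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Print the "start end" integer range covered by a CIDR
cidr_range() {
  local ip=${1%/*} bits=${1#*/}
  local base size start
  base=$(ip_to_int "$ip")
  size=$(( 1 << (32 - bits) ))
  start=$(( base & ~(size - 1) ))
  echo "$start $(( start + size - 1 ))"
}

# Succeeds (exit 0) if the two CIDRs overlap
overlaps() {
  local s1 e1 s2 e2
  read -r s1 e1 <<<"$(cidr_range "$1")"
  read -r s2 e2 <<<"$(cidr_range "$2")"
  (( s1 <= e2 && s2 <= e1 ))
}

if overlaps 10.244.0.0/16 10.96.0.0/12; then
  echo "CIDRs overlap - pick different ranges"
else
  echo "CIDRs are disjoint"
fi
```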

⬆️ Cluster Upgrades

Why Upgrade?

Just like updating apps on your phone, Kubernetes gets better over time. New features, security fixes, and performance improvements!

The Golden Rule

Always upgrade one minor version at a time!

  • ✅ 1.29 → 1.30 (Good!)
  • ❌ 1.28 → 1.30 (Too big a jump!)
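The one-step rule is easy to encode. A toy bash check (the `minor` and `can_upgrade` names are invented; this ignores patch releases, which you can apply freely within a minor version):

```shell
# Extract the minor version number, e.g. "1.29" -> 29
minor() { echo "$1" | cut -d. -f2; }

# Succeed only for an exact one-minor-version jump
can_upgrade() {
  local from to
  from=$(minor "$1")
  to=$(minor "$2")
  (( to - from == 1 ))
}

can_upgrade 1.29 1.30 && echo "1.29 -> 1.30: OK"
can_upgrade 1.28 1.30 || echo "1.28 -> 1.30: too big a jump"
```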

Upgrade Steps

graph TD
  A["Upgrade kubeadm"] --> B["Upgrade Control Plane"]
  B --> C["Drain Workers One by One"]
  C --> D["Upgrade kubelet on Workers"]
  D --> E["Uncordon Workers"]
  E --> F["Verify Everything Works"]

Step 1: Upgrade kubeadm on control plane

sudo apt-get update
sudo apt-get install -y --allow-change-held-packages kubeadm=1.30.0-1.1

Step 2: Plan and apply the upgrade

sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.30.0

Step 3: Upgrade kubelet and kubectl

sudo apt-get install -y --allow-change-held-packages kubelet=1.30.0-1.1 kubectl=1.30.0-1.1
sudo systemctl daemon-reload
sudo systemctl restart kubelet

Then repeat the pattern on each worker node: drain it, upgrade kubeadm, run sudo kubeadm upgrade node, upgrade kubelet, and uncordon it.

🔧 Node Maintenance

Taking Care of Your Nodes

Sometimes nodes need a break - hardware fixes, OS updates, or troubleshooting. Here’s how to do it safely!

The Three States of a Node

graph TD
  A["Normal Node"] -->|kubectl cordon| B["Cordoned - No New Pods"]
  B -->|kubectl drain| C["Drained - Empty Node"]
  C -->|Do Maintenance| D["Maintenance Done"]
  D -->|kubectl uncordon| A

🚧 Draining and Cordoning Nodes

Cordoning: “No New Guests Please!”

Think of it like putting up a “No Vacancy” sign at a hotel. Current guests stay, but no new ones come in.

kubectl cordon node-worker-1

Check it:

kubectl get nodes
# You'll see SchedulingDisabled

Draining: “Everybody Out!”

This is like evacuating the hotel. All pods move to other nodes.

kubectl drain node-worker-1 \
  --ignore-daemonsets \
  --delete-emptydir-data

Why --ignore-daemonsets? DaemonSets are special pods that run on every node. They’ll come back after maintenance anyway!

Uncordoning: “Welcome Back!”

After maintenance, open the doors again:

kubectl uncordon node-worker-1

Real Example: Kernel Update

# 1. Cordon the node
kubectl cordon node-worker-1

# 2. Drain all pods
kubectl drain node-worker-1 \
  --ignore-daemonsets \
  --delete-emptydir-data

# 3. SSH to node, do the update
ssh node-worker-1
sudo apt-get update && sudo apt-get upgrade -y
sudo reboot

# 4. After reboot, uncordon
kubectl uncordon node-worker-1

💾 etcd Backup and Restore

What is etcd?

etcd is the brain of your cluster - it remembers everything! All your deployments, secrets, configs - everything lives here.

If etcd dies and you have no backup, you lose your entire cluster!

Creating a Backup

ETCDCTL_API=3 etcdctl snapshot save \
  /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Verify your backup:

ETCDCTL_API=3 etcdctl snapshot status \
  /backup/etcd-snapshot.db --write-out=table

Restoring from Backup

When disaster strikes, here’s how to bring it back:

# 1. Stop etcd by moving its static-pod manifest aside
#    (in kubeadm clusters etcd runs as a static pod, not a systemd service)
sudo mv /etc/kubernetes/manifests/etcd.yaml /tmp/etcd.yaml

# 2. Restore the snapshot into a fresh data directory
ETCDCTL_API=3 etcdctl snapshot restore \
  /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored

# 3. Point etcd at the new directory
#    (edit the hostPath volume in /tmp/etcd.yaml to /var/lib/etcd-restored)

# 4. Restart etcd by putting the manifest back
sudo mv /tmp/etcd.yaml /etc/kubernetes/manifests/etcd.yaml

Backup Schedule Tip

Set up a cron job for automatic backups:

# Every 6 hours, backup etcd
0 */6 * * * /usr/local/bin/backup-etcd.sh
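For that cron line to work, something has to live at /usr/local/bin/backup-etcd.sh. A minimal sketch of what the script might contain — the timestamped filenames and 10-snapshot retention are my assumptions, and it's written to /tmp here so you can review it before installing:

```shell
# Write a candidate backup script to a scratch location for review
cat > /tmp/backup-etcd.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# Timestamped snapshot path (assumes /backup exists)
SNAP="/backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"

ETCDCTL_API=3 etcdctl snapshot save "$SNAP" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Retention: keep only the 10 newest snapshots
ls -1t /backup/etcd-snapshot-*.db | tail -n +11 | xargs -r rm --
EOF

chmod +x /tmp/backup-etcd.sh
bash -n /tmp/backup-etcd.sh && echo "script parses cleanly"
# Then: sudo mv /tmp/backup-etcd.sh /usr/local/bin/backup-etcd.sh
```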

🌐 etcd Cluster Management

Why Multiple etcd Nodes?

One etcd is risky - if it fails, game over! With 3 or 5 etcd members, your cluster survives failures.

The magic number: Always use odd numbers (3, 5, 7)

Why? They vote on decisions. Odd numbers prevent ties!
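The arithmetic behind that: a write needs a majority of members (n/2 + 1 votes), so an n-member cluster survives (n-1)/2 failures. Sketched in bash (the `quorum` and `tolerated` helper names are invented):

```shell
quorum()    { echo $(( $1 / 2 + 1 )); }   # votes needed to commit a write
tolerated() { echo $(( ($1 - 1) / 2 )); } # member failures the cluster survives

for n in 1 3 5 7; do
  echo "members=$n quorum=$(quorum $n) can lose $(tolerated $n)"
done
```

Note that 4 members tolerate the same single failure as 3 do: the extra even member buys nothing, which is why odd sizes are recommended.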

Checking Cluster Health

ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://etcd1:2379,https://etcd2:2379,https://etcd3:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Adding a New etcd Member

# 1. Add the member (pass the same --endpoints and cert flags as in the health check)
ETCDCTL_API=3 etcdctl member add etcd-new \
  --peer-urls=https://192.168.1.104:2380

# 2. Start etcd on the new node with --initial-cluster-state=existing

Removing a Failed Member

# 1. List members to get the ID (again with the --endpoints and cert flags)
ETCDCTL_API=3 etcdctl member list

# 2. Remove by ID
ETCDCTL_API=3 etcdctl member remove abc123def456

graph TD
  A["3-Node etcd Cluster"] --> B{Node Fails?}
  B -->|1 node fails| C["Cluster Still Works!"]
  B -->|2 nodes fail| D["Cluster STOPS - Lost Quorum"]
  C --> E["Replace Failed Node"]
  E --> A

🎯 Quick Reference Commands

| Task | Command |
| --- | --- |
| Initialize cluster | kubeadm init |
| Join cluster | kubeadm join <token> |
| Create join token | kubeadm token create --print-join-command |
| Upgrade cluster | kubeadm upgrade apply v1.30.0 |
| Cordon node | kubectl cordon <node> |
| Drain node | kubectl drain <node> --ignore-daemonsets |
| Uncordon node | kubectl uncordon <node> |
| Backup etcd | etcdctl snapshot save <file> |
| Restore etcd | etcdctl snapshot restore <file> |
| Check etcd health | etcdctl endpoint health |

🌟 You’ve Got This!

Running a Kubernetes cluster is like being a conductor of an orchestra. Each node is a musician, kubeadm is your baton, and etcd is your sheet music.

Remember:

  • 🔧 kubeadm builds and manages your cluster
  • 🚀 init starts it, join grows it
  • ⬆️ Upgrades go one step at a time
  • 🚧 Drain before maintenance, uncordon after
  • 💾 Backup etcd - your cluster’s memory!

Now you’re ready to orchestrate at scale! 🎵
