Cluster Health

🏥 Kubernetes Cluster Health: Be the Doctor for Your Computer City!

The Story: Your Computer City Needs a Health Checkup!

Imagine you run a magical city made of computers. This city has many buildings (called Nodes), and in each building, little workers (called Pods) do important jobs. There’s also a Mayor’s Office (the API Server) that tells everyone what to do.

But what happens when a building gets sick? Or when the Mayor’s Office phone stops working? The whole city could stop!

That’s why YOU need to be the City Doctor — checking that everything is healthy, finding problems early, and fixing them before anyone notices!


🏢 Node Health Monitoring: Are Your Buildings Healthy?

What is a Node?

A Node is like a building in your city. Each building has:

  • Electricity (CPU power)
  • Storage rooms (Memory)
  • Loading docks (Network)

Simple Example: Checking if a Building is OK

kubectl get nodes

This is like walking past each building and asking: “Are you open today?”

You’ll see something like:

NAME       STATUS   ROLES    AGE   VERSION
node-1     Ready    worker   5d    v1.28.0
node-2     Ready    worker   5d    v1.28.0
node-3     NotReady worker   5d    v1.28.0

  • Ready = Building is open and working!
  • NotReady = Building is closed — something is wrong!

Real Life Example

Think of it like checking if a shop is open:

  • You walk by and the lights are ON → Ready
  • The lights are OFF and door is locked → NotReady

🌡️ Node Status and Conditions: What’s Wrong with This Building?

When a building says “I’m not feeling well,” you need to know exactly what hurts. Kubernetes tells you with Conditions.

The 4 Important Health Checks

graph TD
  A["Node Health Check"] --> B["Ready?"]
  A --> C["Memory OK?"]
  A --> D["Disk OK?"]
  A --> E["Network OK?"]
  B -->|True| F["✅ Can do work"]
  C -->|False| G["🧠 MemoryPressure"]
  D -->|False| H["💾 DiskPressure"]
  E -->|False| I["🌐 NetworkUnavailable"]

Condition            What It Means               Like This in Real Life
---------            -------------               ----------------------
Ready                Node can accept work        Shop is open for business
MemoryPressure       Running out of memory       Storage room is too full
DiskPressure         Running out of disk space   File cabinets are overflowing
NetworkUnavailable   Can’t talk to network       Phone lines are cut

How to Check a Node’s Health Details

kubectl describe node node-1

This shows you everything about that building — like reading its full medical report!

Look for the Conditions section:

Conditions:
  Type                 Status
  ----                 ------
  MemoryPressure       False
  DiskPressure         False
  NetworkUnavailable   False
  Ready                True

Good news! All conditions are healthy:

  • MemoryPressure = False (memory is fine!)
  • DiskPressure = False (disk is fine!)
  • NetworkUnavailable = False (network is fine!)
  • Ready = True (everything works!)
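
Want the computer to read the medical report for you? Here is a small sketch, not an official tool. It assumes you feed it one Type=Status pair per line (the comment shows one way to get that from kubectl using a JSONPath template). The function name sick_conditions is made up for this example.

```shell
# Sketch: print every node condition that looks unhealthy.
# Input (stdin): one "Type=Status" pair per line, for example from:
#   kubectl get node node-1 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
# Rule of thumb used here: "Ready" is healthy when True;
# every other condition is healthy when False.
sick_conditions() {
  awk -F= '
    NF != 2 { next }                                   # skip blank or odd lines
    $1 == "Ready" && $2 != "True"  { print $1 " is " $2 }
    $1 != "Ready" && $2 != "False" { print $1 " is " $2 }
  '
}
```

If nothing is printed, the building passed its checkup. A status of Unknown is reported as sick too, which usually means the node has stopped reporting in.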

🏙️ Cluster Health Monitoring: Is the Whole City OK?

Now let’s zoom out! Instead of checking one building, let’s check the entire city.

Quick City Overview

kubectl get nodes -o wide

This shows ALL buildings with extra details like their addresses (internal and external IP), operating system, kernel version, and container runtime.

Counting Healthy vs Sick Buildings

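# -w matches "Ready" only as a whole word, so "NotReady" is not counted as Ready
kubectl get nodes --no-headers | grep -cw Ready
kubectl get nodes --no-headers | grep -c NotReady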

Simple Example:

  • If you have 10 buildings and 2 say “NotReady”…
  • That’s like 2 shops closed on Main Street — you need to investigate!
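
A plain text search for Ready also matches NotReady, so a safer way to count is to look at the STATUS column itself. The sketch below does that with awk. It assumes the input is the output of kubectl get nodes --no-headers, and the function name count_nodes is invented for this example.

```shell
# Sketch: count healthy vs. sick buildings from the STATUS column.
# Input (stdin): output of `kubectl get nodes --no-headers`.
# A cordoned node shows up as "Ready,SchedulingDisabled"; only the part
# before the first comma is compared, so it still counts as Ready here.
count_nodes() {
  awk '
    NF == 0 { next }                       # ignore blank lines
    { split($2, parts, ",") }              # STATUS is the second column
    parts[1] == "Ready" { ready++; next }
    { sick++ }
    END { printf "ready=%d not_ready=%d\n", ready, sick }
  '
}
```

Usage: kubectl get nodes --no-headers | count_nodes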

Using Metrics Server: The City Thermometer 🌡️

Want to see how hard each building is working? This needs the Metrics Server add-on running in your cluster.

kubectl top nodes

Output (simplified; the real table also has CPU% and MEMORY% columns):

NAME     CPU(cores)   MEMORY(bytes)
node-1   500m         1200Mi
node-2   200m         800Mi

This is like checking:

  • How many lights are on in each building (CPU)
  • How full are the storage rooms (Memory)
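
To spot buildings that are working too hard, you could compare the percentage columns against a limit. This is a rough sketch under a few assumptions: the input is the output of kubectl top nodes --no-headers (columns NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%), the 80 percent default is an arbitrary number picked for illustration, and busy_nodes is a made-up name.

```shell
# Sketch: list nodes whose CPU% or MEMORY% is above a limit.
# Input (stdin): output of `kubectl top nodes --no-headers`.
# $1: limit in percent (defaults to 80, an arbitrary example value).
busy_nodes() {
  awk -v limit="${1:-80}" '
    # Adding 0 turns "95%" into the number 95 before comparing.
    NF >= 5 && ($3 + 0 > limit + 0 || $5 + 0 > limit + 0) {
      print $1 " cpu=" $3 " mem=" $5
    }
  '
}
```

Usage: kubectl top nodes --no-headers | busy_nodes 90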

📞 API Server Health Checks: Is the Mayor’s Office Working?

The API Server is like the Mayor’s Office. Every order, every request, every question goes through it. If it stops working, your whole city is stuck!

Simple Test: Can We Call the Mayor?

kubectl cluster-info

If it responds, the Mayor is answering! You’ll see:

Kubernetes control plane is running at
  https://10.0.0.1:6443

Health Endpoint: The Mayor’s Direct Line

The API Server has special “health phones” you can call:

# Is the API server alive?
kubectl get --raw='/livez'

# Is it ready to work?
kubectl get --raw='/readyz'

# Older all-in-one check (deprecated since v1.16; prefer /livez and /readyz)
kubectl get --raw='/healthz'

What they mean:

Endpoint   Question            Good Answer
--------   --------            -----------
/livez     “Are you alive?”    ok
/readyz    “Ready for work?”   ok
/healthz   “Overall health?”   ok
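
If you call these phones often, a tiny helper can dial them for you. The following is only a sketch: it assumes kubectl is installed and already pointed at your cluster, and the name check_endpoints is invented here.

```shell
# Sketch: call each health endpoint and report whether it answered "ok".
# Assumes `kubectl` is on PATH and configured for your cluster.
check_endpoints() {
  rc=0
  for endpoint in /livez /readyz; do
    # A failed call or any answer other than "ok" counts as a failure.
    if answer=$(kubectl get --raw="$endpoint" 2>/dev/null) && [ "$answer" = "ok" ]; then
      echo "$endpoint: ok"
    else
      echo "$endpoint: FAILED"
      rc=1
    fi
  done
  return "$rc"
}
```

The function returns a non-zero status when any endpoint fails, so it can be used in an if statement or a scheduled check.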

Real Example: Checking Everything

kubectl get --raw='/readyz?verbose'

This gives you a detailed report of every part:

[+]ping ok
[+]etcd ok
[+]poststarthook/start ok
readyz check passed

Each [+] is a healthy system. If you see [-], something needs attention!
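
On a real cluster the verbose report can be long, so it may help to print only the checks that failed. Here is a minimal sketch that reads the report from standard input; the function name failed_checks is made up, and the exact wording after each check name is not relied on.

```shell
# Sketch: print only the names of checks marked [-] in a verbose report.
# Input (stdin): output of `kubectl get --raw='/readyz?verbose'`.
failed_checks() {
  # Keep lines starting with "[-]" and print the first word after the marker.
  sed -n 's/^\[-\] *\([^ ]*\).*/\1/p'
}
```

Usage: kubectl get --raw='/readyz?verbose' | failed_checks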


🔍 Quick Troubleshooting Flow

graph TD
  A["🏥 Problem Detected!"] --> B{Can I reach<br>API Server?}
  B -->|No| C["Check API Server<br>/livez /readyz"]
  B -->|Yes| D{Are Nodes Ready?}
  D -->|No| E["Run: kubectl<br>describe node"]
  D -->|Yes| F{Are Pods<br>Running?}
  E --> G["Check Conditions:<br>Memory/Disk/Network"]
  F -->|No| H["Check Pod logs<br>and events"]
  F -->|Yes| I["✅ Cluster is<br>Healthy!"]
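
The flow above could be sketched as a small script that stops at the first problem it finds. Treat it as a starting point, not a finished tool: it assumes kubectl is configured for your cluster, the name triage is invented, and it only looks at each pod's phase, so a pod that is Running but keeps restarting would slip through.

```shell
# Sketch of the troubleshooting flow: API server, then nodes, then pods.
# Prints one line naming the first problem found, or "healthy".
triage() {
  # Step 1: can we reach the API server, and does it say it is ready?
  if ! kubectl get --raw='/readyz' >/dev/null 2>&1; then
    echo "api-server: not reachable or not ready"
    return 1
  fi
  # Step 2: collect nodes whose STATUS is anything other than exactly "Ready".
  sick_nodes=$(kubectl get nodes --no-headers 2>/dev/null |
    awk '$2 != "Ready" { names = names sep $1; sep = " " } END { print names }')
  if [ -n "$sick_nodes" ]; then
    echo "nodes: run kubectl describe node on: $sick_nodes"
    return 1
  fi
  # Step 3: count pods that are neither Running nor finished (Succeeded).
  sick_pods=$(kubectl get pods --all-namespaces --no-headers \
    --field-selector=status.phase!=Running,status.phase!=Succeeded 2>/dev/null |
    wc -l | tr -d ' ')
  if [ "$sick_pods" -gt 0 ]; then
    echo "pods: $sick_pods not running; check their logs and events"
    return 1
  fi
  echo "healthy"
}
```

Checking in this order mirrors the diagram: there is little point asking about pods if the Mayor's Office is not answering.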

🎯 The Doctor’s Checklist

Every good City Doctor checks these things:

  1. 🏢 Node Health: kubectl get nodes
  2. 🔍 Node Details: kubectl describe node <name>
  3. 📊 Resource Usage: kubectl top nodes
  4. 📞 API Server: kubectl get --raw='/healthz'

💡 Remember This!

What to Check     Command                        What You’re Looking For
-------------     -------                        -----------------------
All buildings     kubectl get nodes              All should be “Ready”
Building health   kubectl describe node          Conditions all healthy
City overview     kubectl top nodes              CPU/Memory not maxed
Mayor’s office    kubectl get --raw='/healthz'   Should say “ok”

🌟 You’re Now a Cluster Doctor!

You learned how to:

  • ✅ Check if individual Nodes (buildings) are healthy
  • ✅ Understand Node Conditions (memory, disk, network)
  • ✅ Monitor the whole Cluster (city) at once
  • ✅ Verify the API Server (Mayor’s office) is working

Next time your Kubernetes city feels sick, you know exactly where to look! 🏥🔍
