🔍 Kubernetes Troubleshooting: Debugging Workloads
The Detective Story of Your Cluster
Imagine you’re a detective in a busy city. The city is your Kubernetes cluster. The buildings are your pods. The roads are your services. Sometimes things go wrong—a building loses power, a road gets blocked. Your job? Find the problem and fix it!
🏢 What is “Debugging Workloads”?
Think of your Kubernetes cluster like a giant LEGO city:
- Pods = Individual LEGO buildings
- Services = Roads connecting buildings
- Events = A diary that writes down everything that happens
When something breaks, you need tools to:
- Look inside buildings (pods)
- Check if roads work (services)
- Read the diary (events)
- Send in a helper robot (debug containers)
🧱 1. Debugging Pods
What’s a Pod Problem?
A pod is like a tiny apartment where your app lives. Sometimes:
- The apartment won’t start (CrashLoopBackOff)
- The apartment is stuck waiting (Pending)
- Something inside is broken (Error)
Step 1: Check Pod Status
kubectl get pods
This shows all your apartments and their health:
NAME READY STATUS
my-app-abc 1/1 Running ✅
broken-app 0/1 Error ❌
stuck-app 0/1 Pending ⏳
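On a busy cluster the healthy pods can drown out the broken ones. Here is a minimal sketch that filters them, assuming the default column order of kubectl get pods (STATUS in the third column). The function name unhealthy_pods is made up for this example.

```shell
# Hypothetical helper: print only pods whose STATUS is not Running or Completed.
# Assumes the default column order of `kubectl get pods` (STATUS is column 3).
unhealthy_pods() {
  kubectl get pods --no-headers "$@" |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (needs access to a cluster):
#   unhealthy_pods
#   unhealthy_pods -n my-namespace
```

Extra arguments are passed straight through to kubectl, so the same helper works for any namespace.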
Step 2: Get More Details
kubectl describe pod broken-app
This is like reading the apartment’s full report card—why it failed, what went wrong.
Step 3: Read the Logs
kubectl logs broken-app
Logs are like reading what the app said before it crashed. “Help! I can’t find the database!”
Step 4: Previous Container Logs
If the pod crashed and restarted:
kubectl logs broken-app --previous
This reads the last words before it died.
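The --previous flag returns an error when the container has never restarted. The sketch below tries the previous container first and falls back to the current one. The name crash_logs and the 50-line tail are arbitrary choices for illustration.

```shell
# Hypothetical helper: show the last 50 log lines of the previous container
# instance; if there is none (no restart yet), fall back to the current one.
crash_logs() {
  pod="$1"
  kubectl logs "$pod" --previous --tail=50 2>/dev/null ||
    kubectl logs "$pod" --tail=50
}

# Usage:
#   crash_logs broken-app
```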
🎯 Common Pod Problems
| Status | What It Means | Fix |
|---|---|---|
| Pending | Waiting for resources | Check node capacity |
| CrashLoopBackOff | Keeps crashing | Check logs! |
| ImagePullBackOff | Can’t download image | Check image name/registry |
| OOMKilled | Ran out of memory | Increase memory limits |
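A status like OOMKilled can scroll away once the container restarts. One way to recover it is to read the last termination reason from the pod's status. This is a sketch only, and the function name last_exit_reason is invented here.

```shell
# Hypothetical helper: print why each container in a pod last terminated
# (for example OOMKilled or Error). Prints nothing if no container has
# terminated and restarted yet.
last_exit_reason() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Usage:
#   last_exit_reason broken-app
```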
🛣️ 2. Debugging Services
What’s a Service Problem?
Services are like phone lines connecting your apps. If the phone line is broken, apps can’t talk to each other!
Step 1: Check Service Exists
kubectl get services
NAME TYPE CLUSTER-IP PORT(S)
my-service ClusterIP 10.0.0.5 80/TCP
Step 2: Check Endpoints
This is the magic trick! Endpoints show which pods the service connects to:
kubectl get endpoints my-service
NAME ENDPOINTS
my-service 10.1.2.3:80,10.1.2.4:80 ✅ Good!
my-service <none> ❌ No pods!
If endpoints are empty, the service can’t find any pods!
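The same check can be scripted. The sketch below succeeds only when the service has at least one ready address behind it. The helper name has_endpoints is an assumption for this example.

```shell
# Hypothetical helper: succeed only if the service has at least one
# ready endpoint IP.
has_endpoints() {
  ips=$(kubectl get endpoints "$1" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  [ -n "$ips" ]
}

# Usage:
#   has_endpoints my-service || echo "my-service has no ready pods behind it"
```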
Step 3: Check Labels Match
Services find pods using labels (like name tags).
kubectl describe service my-service
Look for Selector: app=my-app
Then check your pod:
kubectl get pods --show-labels
Do the labels match? If not, that’s your problem!
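Instead of comparing labels by eye, you can ask kubectl how many pods carry the service's selector. A minimal sketch follows; count_matching_pods is a made-up name, and app=my-app is the selector from the example above.

```shell
# Hypothetical helper: count pods that carry the given label selector.
# Pass the selector exactly as the service shows it, e.g. app=my-app.
count_matching_pods() {
  kubectl get pods -l "$1" --no-headers 2>/dev/null | wc -l | tr -d ' '
}

# Usage:
#   count_matching_pods app=my-app
```

A result of 0 means no pod has that label, which explains an empty endpoints list.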
Step 4: Test Connection
From inside a pod, try calling the service:
kubectl exec -it test-pod -- \
curl my-service:80
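If no pod with curl is handy, a throwaway pod can do the test. The sketch below assumes the busybox image can be pulled in your cluster. The pod name svc-test and the function name test_service are arbitrary.

```shell
# Hypothetical helper: start a throwaway busybox pod, fetch the service URL
# with wget (5 second timeout), and delete the pod when the command
# finishes (--rm).
# "svc-test" is an arbitrary pod name chosen for this example.
test_service() {
  kubectl run svc-test --rm -i --restart=Never --image=busybox -- \
    wget -qO- -T 5 "http://$1"
}

# Usage:
#   test_service my-service:80
```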
📖 3. Events and Troubleshooting
What Are Events?
Events are like a diary that Kubernetes writes automatically. Every time something happens—good or bad—it gets written down!
View All Events
kubectl get events --sort-by='.lastTimestamp'
This shows recent events, sorted by time:
REASON MESSAGE
FailedScheduling No nodes available
Pulled Successfully pulled image
Started Started container
BackOff Back-off restarting failed
View Events for One Pod
kubectl describe pod my-pod
Scroll to the bottom—you’ll see the Events section!
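To see the events without the rest of the describe output, you can filter kubectl get events with a field selector. These two helpers are a sketch; the names events_for and warnings are invented for this example.

```shell
# Hypothetical helper: list only the events that refer to one named object,
# sorted by last timestamp.
events_for() {
  kubectl get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by='.lastTimestamp'
}

# Hypothetical helper: show only Warning events in the current namespace.
warnings() {
  kubectl get events --field-selector type=Warning
}

# Usage:
#   events_for my-pod
#   warnings
```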
🎯 Important Event Types
| Reason | What It Means |
|---|---|
| FailedScheduling | Can’t find a home for pod |
| FailedMount | Volume won’t attach |
| BackOff | Container keeps failing |
| Unhealthy | Health check failed |
| Killing | Pod being terminated |
Pro Tip: Watch Events Live
kubectl get events --watch
Like watching a live news feed of your cluster!
🤖 4. Ephemeral Debug Containers
The Problem
Sometimes your app container is super minimal—no shell, no tools, nothing! You can’t even run ls inside it.
The Solution: Debug Containers
Ephemeral debug containers are like sending a helper robot into the apartment. The robot has all the tools you need!
graph TD
  A["Your Pod"] --> B["App Container<br/>No tools ❌"]
  A --> C["Debug Container<br/>All tools ✅"]
  C --> D["Investigate!"]
How It Works
- Your pod is running
- You inject a debug container
- Debug container joins the pod
- It can see everything the app sees!
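- When you’re done, exit the shell and the debug container stops (it stays listed in the pod until the pod is deleted)
Key Feature
The debug container shares:
- ✅ Network (same IP, same ports)
- ✅ Process namespace (can see app processes, when you pass --target or the pod has shareProcessNamespace enabled)
- ✅ Files (kubectl debug does not mount the pod’s volumes by default, but with a shared process namespace you can browse the app’s filesystem under /proc/<pid>/root)
But it’s temporary: Kubernetes never restarts it!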
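Ephemeral containers are recorded in the pod's spec, so you can check which ones have been injected. A minimal sketch follows; ephemeral_containers is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of the ephemeral containers
# recorded in a pod's spec (empty output means none were added).
ephemeral_containers() {
  kubectl get pod "$1" -o jsonpath='{.spec.ephemeralContainers[*].name}'
}

# Usage:
#   ephemeral_containers my-pod
```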
🛠️ 5. kubectl debug Command
Your Swiss Army Knife
The kubectl debug command is your ultimate debugging tool. It can:
- Add debug containers to running pods
- Copy a pod for safe debugging
- Debug nodes directly
Method 1: Debug a Running Pod
kubectl debug my-pod -it \
--image=busybox \
--target=my-container
What this does:
- -it = Interactive terminal
- --image=busybox = Use busybox (has tools!)
- --target=my-container = Share process namespace
Now you can run commands inside!
Method 2: Copy and Debug
Don’t want to touch the original pod? Make a copy!
kubectl debug my-pod -it \
--image=busybox \
--copy-to=my-pod-debug
This creates my-pod-debug—a safe playground!
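The copy is a regular pod, so it keeps running until you delete it. The pair of helpers below is a sketch of a create-and-clean-up routine. The names debug_copy and cleanup_debug_copy and the "-debug" suffix are assumptions that follow the example above. The --share-processes flag is spelled out only to make the process namespace sharing explicit.

```shell
# Hypothetical helper: create a debug copy named <pod>-debug with a
# busybox container that shares the process namespace.
debug_copy() {
  kubectl debug "$1" -it --image=busybox \
    --copy-to="$1-debug" --share-processes
}

# Hypothetical helper: delete the copy once you are finished.
cleanup_debug_copy() {
  kubectl delete pod "$1-debug"
}

# Usage:
#   debug_copy my-pod
#   cleanup_debug_copy my-pod
```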
Method 3: Debug with Different Image
App image is broken? Run it with a working one:
kubectl debug my-pod -it \
--copy-to=my-pod-debug \
--container=my-container \
--image=ubuntu
Method 4: Debug a Node
Need to check the actual server?
kubectl debug node/my-node -it \
--image=ubuntu
You’re now in a special pod running on the node, with the node’s filesystem mounted at /host!
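The node debug pod is not removed when you exit the shell, so it is worth cleaning up afterwards. The sketch below lists leftovers, assuming the "node-debugger-" name prefix that kubectl generates for these pods. The function name node_debug_pods is made up.

```shell
# Hypothetical helper: list leftover node debug pods in the current namespace.
# Assumes kubectl's auto-generated name prefix "node-debugger-".
node_debug_pods() {
  kubectl get pods --no-headers |
    awk '$1 ~ /^node-debugger-/ { print $1 }'
}

# Usage: delete every leftover node debug pod
#   for p in $(node_debug_pods); do kubectl delete pod "$p"; done
```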
🎯 Common Debug Images
| Image | When to Use |
|---|---|
busybox |
Basic commands (ls, cat, wget) |
alpine |
Small but has package manager |
ubuntu |
Full Linux environment |
nicolaka/netshoot |
Network debugging |
🎠 The Complete Detective Workflow
graph TD
  A[Something's Wrong!] --> B{What type?}
  B -->|Pod Issue| C["kubectl get pods"]
  B -->|Service Issue| D["kubectl get endpoints"]
  B -->|Unknown| E["kubectl get events"]
  C --> F["kubectl describe pod"]
  C --> G["kubectl logs"]
  D --> H["Check selectors match"]
  D --> I["Test with curl"]
  E --> J["Find error message"]
  F --> K{Need more?}
  G --> K
  H --> K
  I --> K
  J --> K
  K -->|Yes| L["kubectl debug"]
  K -->|No| M["Fix the issue!"]
🚀 Quick Reference Commands
Pod Debugging
# See all pods
kubectl get pods
# Detailed info
kubectl describe pod <name>
# View logs
kubectl logs <name>
# Previous logs
kubectl logs <name> --previous
# Follow logs live
kubectl logs <name> -f
Service Debugging
# Check endpoints
kubectl get endpoints <service>
# Describe service
kubectl describe service <name>
# Test from another pod
kubectl exec -it <pod> -- curl <service>:<port>
Events
# All events
kubectl get events
# Sorted by time
kubectl get events --sort-by='.lastTimestamp'
# Watch live
kubectl get events --watch
Debug Containers
# Debug running pod
kubectl debug <pod> -it --image=busybox
# Create debug copy
kubectl debug <pod> -it --image=busybox --copy-to=<new-name>
# Debug a node
kubectl debug node/<node> -it --image=ubuntu
🎓 Remember This!
“When in doubt, check these three things:”
- 📋 Pod Status - kubectl describe pod
- 📝 Logs - kubectl logs
- 📰 Events - kubectl get events
If those don’t help, bring in the debug container—your ultimate helper robot! 🤖
🌟 You Did It!
You now know how to:
- ✅ Debug pods like a detective
- ✅ Fix service connection issues
- ✅ Read cluster events
- ✅ Use ephemeral debug containers
- ✅ Master the kubectl debug command
Go forth and debug with confidence! 🎉
