Kubernetes Autoscaling: Teaching Your Cluster to Grow and Shrink Like Magic
The Story of the Smart Restaurant
Imagine you own a restaurant. Some days, only 5 customers come. Other days, 500 people show up!
If you always have 50 waiters working, you’re wasting money on slow days. But if you only have 5 waiters, your customers wait forever on busy days.
What if your restaurant could magically hire more waiters when it gets busy and send them home when it’s quiet?
That’s exactly what Kubernetes Autoscaling does for your applications!
Meet the Three Autoscaling Heroes
Think of Kubernetes autoscaling like a team of three smart helpers:
```mermaid
graph TD
    A["🎯 Your App Needs Help!"] --> B["HPA - Horizontal Pod Autoscaler"]
    A --> C["VPA - Vertical Pod Autoscaler"]
    A --> D["CA - Cluster Autoscaler"]
    B --> E["Adds MORE pods"]
    C --> F["Makes pods BIGGER"]
    D --> G["Adds MORE machines"]
```
| Hero | What It Does | Restaurant Analogy |
|---|---|---|
| HPA | Adds more pods | Hire more waiters |
| VPA | Makes pods stronger | Give waiters bigger trays |
| CA | Adds more machines | Build more restaurant space |
1. Horizontal Pod Autoscaler (HPA)
The “Hire More Waiters” Solution
Simple Explanation: When your app gets busy, HPA creates MORE copies (pods) of your app. When things calm down, it removes the extra copies.
How Does HPA Know When to Act?
HPA watches your pods like a supervisor watching employees:
- “Is the waiter (CPU) working too hard?”
- “Are too many orders (memory) piling up?”
Example: If CPU usage goes above 70%, add more pods!
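One prerequisite: "70% CPU" is measured against each pod's CPU request, so the Deployment that HPA targets must declare resource requests. Here's a minimal sketch of what that looks like (the container name, image, and numbers are placeholders, not recommendations):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:latest      # placeholder image
        resources:
          requests:
            cpu: 200m             # "70% utilization" means 70% of this request
            memory: 128Mi
```

Without requests, a CPU-utilization HPA has nothing to compare against and won't scale.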
Real YAML Example
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
What This Says:
- Keep at least 2 pods running (never less)
- Never go above 10 pods (save money!)
- When average CPU hits 70%, add more pods
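If the defaults feel too jumpy, `autoscaling/v2` also supports an optional `behavior` section that controls how fast HPA scales in each direction. A minimal sketch you could add under `spec:` of the HPA above (the values are illustrative, not recommendations):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react to spikes immediately
  scaleDown:
    stabilizationWindowSeconds: 300   # wait 5 minutes of calm before removing pods
    policies:
    - type: Pods
      value: 1                        # then remove at most 1 pod per minute
      periodSeconds: 60
```

This is how you avoid "flapping", where pods get added and removed every few seconds.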
The Magic Formula
```mermaid
graph LR
    A["Current CPU: 140%"] --> B["Target CPU: 70%"]
    B --> C["Need: 140/70 = 2x pods"]
    C --> D["If 3 pods now → Scale to 6"]
```
Simple Math:
- You have 3 pods, and their average CPU utilization is 140% (double the target)
- The target is 70% per pod
- HPA uses the formula: `desiredReplicas = ceil(currentReplicas × currentUtilization ÷ targetUtilization)`
- So it scales to: `ceil(3 × 140 ÷ 70) = 6 pods`
2. Vertical Pod Autoscaler (VPA)
The “Bigger Tray” Solution
Instead of hiring more waiters, what if each waiter could carry a BIGGER tray?
VPA makes individual pods stronger by giving them more CPU and memory.
When to Use VPA?
- Your app can’t run in multiple copies
- You don’t know how much CPU/memory to request
- Your pods keep getting killed for using too much memory
VPA Modes
| Mode | What Happens | Good For |
|---|---|---|
| Off | Just recommends, no changes | Learning what your app needs |
| Auto | Restarts pods with new limits | Production apps |
| Initial | Sets resources only at start | Batch jobs |
Real YAML Example
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1000m
        memory: 500Mi
```
What This Says:
- Watch “my-app” deployment
- Automatically adjust resources
- CPU can range from 100m to 1000m
- Memory can range from 50Mi to 500Mi
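If you're not ready to let VPA restart pods, the "Off" mode from the table above is a safe first step: VPA only publishes recommendations in its status. A minimal sketch (the VPA name here is a placeholder; it assumes the VPA CRDs are installed in your cluster):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-recommend
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only - never evict or resize pods
```

You can then read the suggested CPU/memory requests with `kubectl describe vpa my-app-vpa-recommend` and apply them yourself.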
VPA’s Three Components
```mermaid
graph TD
    A["VPA System"] --> B["Recommender"]
    A --> C["Updater"]
    A --> D["Admission Controller"]
    B --> E["Watches pods & suggests sizes"]
    C --> F["Evicts pods that need resizing"]
    D --> G["Sets resources on new pods"]
```
3. Cluster Autoscaler (CA)
The “Build More Restaurant Space” Solution
What if you have lots of waiters ready, but no space for them to work?
Cluster Autoscaler adds or removes entire machines (nodes) from your cluster!
When Does CA Add Nodes?
CA watches for pending pods - pods that want to run but can’t find a home:
```mermaid
graph TD
    A["Pod wants to run"] --> B{Enough space on nodes?}
    B -->|Yes| C["Pod runs happily"]
    B -->|No| D["Pod is PENDING"]
    D --> E["CA notices pending pod"]
    E --> F["CA adds new node"]
    F --> C
```
Real-World Example
Your cluster has 3 nodes, each with 4GB memory. A new deployment needs 15GB total.
- 3 nodes × 4GB = 12GB available
- You need 15GB
- 3GB worth of pods are pending
- CA adds 1 more node
- Now pods can run!
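Roughly what that looks like as a manifest: a Deployment whose per-pod requests add up to more than the cluster's free memory will have some replicas stuck in Pending until CA adds a node. This sketch is illustrative only - the names, image, and numbers are made up to match the example above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-hungry
spec:
  replicas: 5                      # 5 x 3Gi = 15Gi requested, but only ~12Gi is free
  selector:
    matchLabels:
      app: memory-hungry
  template:
    metadata:
      labels:
        app: memory-hungry
    spec:
      containers:
      - name: worker
        image: busybox             # placeholder image
        command: ["sleep", "3600"]
        resources:
          requests:
            memory: 3Gi            # the scheduler only places pods that fit; the rest stay Pending
```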
4. Cluster Autoscaler Configuration
Teaching CA How to Behave
CA needs rules to follow. Here’s how you configure it:
Key Configuration Options
```bash
# Common CA flags
--scale-down-enabled=true
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--scan-interval=10s
--max-nodes-total=100
--cores-total=0:320
--memory-total=0:640Gi
```
| Setting | What It Does | Example |
|---|---|---|
| `scan-interval` | How often CA checks | Every 10 seconds |
| `scale-down-delay-after-add` | Wait time after adding node | 10 minutes |
| `scale-down-unneeded-time` | How long node must be idle | 10 minutes |
| `max-nodes-total` | Maximum cluster size | 100 nodes |
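In a real cluster, these flags live on the cluster-autoscaler container itself. Here's a trimmed, hedged sketch of that Deployment (the image tag, cloud provider, and node-group discovery settings are placeholders, and the ServiceAccount/RBAC objects are omitted):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler   # assumes RBAC is set up separately
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0   # example tag - match your cluster version
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws                  # assumption: AWS; use your provider here
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scan-interval=10s
        - --max-nodes-total=100
```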
Node Groups
CA manages node groups (pools of similar machines):
```mermaid
graph TD
    A["Cluster Autoscaler"] --> B["Node Group: Small"]
    A --> C["Node Group: Large"]
    A --> D["Node Group: GPU"]
    B --> E["2-10 nodes, 2CPU/4GB each"]
    C --> F["1-5 nodes, 8CPU/32GB each"]
    D --> G["0-3 nodes, with GPUs"]
```
Priority Expander Example (works on any cloud provider, e.g., AWS)
If you run CA with `--expander=priority`, a ConfigMap tells it which node groups to try first (higher number = higher priority). The ConfigMap must be named `cluster-autoscaler-priority-expander` and live in the same namespace as the CA pod (usually kube-system):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*small.*
    50:
      - .*large.*
```
5. Autoscaler Scale-Down
The “Sending Waiters Home” Logic
When business is slow, you don’t need all those extra waiters. CA figures out when to scale down (remove nodes).
When Does Scale-Down Happen?
A node can be removed when:
- Utilization is low (the pods on the node request less than 50% of its CPU/memory, by default)
- All pods can move to other nodes
- No blocking conditions exist
What Blocks Scale-Down?
```mermaid
graph TD
    A["Can this node be removed?"] --> B{Pod with local storage?}
    B -->|Yes| C["❌ BLOCKED"]
    B -->|No| D{Pod without controller?}
    D -->|Yes| C
    D -->|No| E{PodDisruptionBudget violated?}
    E -->|Yes| C
    E -->|No| F{System pod?}
    F -->|Yes| C
    F -->|No| G["✅ Safe to remove!"]
```
Common Blockers:
| Blocker | Why It Blocks | Solution |
|---|---|---|
| Local storage | Data would be lost | Use persistent volumes |
| No controller | Pod won’t restart elsewhere | Add deployment/replicaset |
| System pods | Cluster needs them | Use cluster-autoscaler.kubernetes.io/safe-to-evict annotation |
| PDB | Would break availability rules | Adjust the PodDisruptionBudget (see the sketch below) |
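For the PDB row, the fix is usually a budget that still leaves room for eviction rather than deleting the PDB entirely. A minimal sketch, assuming your pods carry the label `app: my-app`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # CA may evict pods as long as at least 1 stays running
  selector:
    matchLabels:
      app: my-app
```

With, say, 3 replicas and `minAvailable: 1`, CA can drain a node without ever dropping below one healthy pod.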
Safe-to-Evict Annotation
Tell CA "this pod is okay to evict" by adding this annotation to the pod itself (for a Deployment, put it under `spec.template.metadata.annotations` so every pod gets it):

```yaml
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
```
Scale-Down Timeline
```mermaid
graph LR
    A["Node underutilized"] --> B["Wait 10min"]
    B --> C["Still underutilized?"]
    C -->|Yes| D["Check if safe"]
    D --> E["Drain & remove node"]
    C -->|No| A
```
6. HPA vs VPA: When to Use Which?
The Ultimate Comparison
Think of it like this:
- HPA = More workers
- VPA = Stronger workers
```mermaid
graph TD
    A["Need more capacity?"] --> B{"Can your app run as multiple copies?"}
    B -->|Yes| C["Use HPA!"]
    B -->|No| D["Use VPA!"]
    C --> E{"Need both scaling AND right-sizing?"}
    D --> F["VPA adjusts pod size"]
    E -->|Yes| G["Use HPA + VPA together - carefully!"]
```
Side-by-Side Comparison
| Feature | HPA | VPA |
|---|---|---|
| Scales what? | Number of pods | Size of pods |
| Best for | Stateless apps | Stateful apps, right-sizing |
| Reaction speed | Fast (seconds) | Slower (requires restart) |
| Disruption | None (adds pods) | Restarts pods |
| Works with replicas > 1? | Yes (designed for it) | Yes, but requires restarts |
Can You Use Both Together?
Yes, but carefully!
- HPA should scale on custom metrics (requests/sec)
- VPA handles CPU/memory sizing
- Never let both control the same metric
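For example, here's a hedged sketch of an HPA driven by a per-pod custom metric instead of CPU. It assumes a custom metrics adapter (such as Prometheus Adapter) already exposes a pods metric named `http_requests_per_second`; the metric name and target are placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa-traffic
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # must be served by your metrics adapter
      target:
        type: AverageValue
        averageValue: "100"              # aim for ~100 requests/sec per pod
```

With this split, HPA decides how many pods you need based on traffic, while VPA decides how big each pod should be.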
Decision Flowchart
```mermaid
graph TD
    A["Your Situation"] --> B{"Stateless app with variable traffic?"}
    B -->|Yes| C["HPA is your friend"]
    B -->|No| D{"Don't know right resource size?"}
    D -->|Yes| E["Use VPA in Recommend mode first"]
    D -->|No| F{"App can't scale horizontally?"}
    F -->|Yes| G["VPA is your only option"]
    F -->|No| H["Consider HPA for flexibility"]
```
Real-World Scenarios
Scenario 1: Web Server
- Traffic spikes during sales
- Each pod handles independent requests
- Winner: HPA - add more pods during spikes
Scenario 2: Database
- Can’t run multiple masters
- Memory needs grow with data
- Winner: VPA - increase pod resources
Scenario 3: API Service
- Variable traffic + unknown resource needs
- Winner: HPA + VPA - HPA for traffic, VPA for sizing
Quick Reference Summary
| Autoscaler | What It Scales | When It Helps |
|---|---|---|
| HPA | Pod count | Traffic spikes, parallel workloads |
| VPA | Pod resources | Right-sizing, memory-hungry apps |
| Cluster Autoscaler | Node count | No room for pending pods |
The Complete Picture
```mermaid
graph TD
    A["Traffic Increases"] --> B["HPA adds pods"]
    B --> C{Nodes have space?}
    C -->|No| D["CA adds nodes"]
    C -->|Yes| E["Pods run"]
    D --> E
    F["Traffic Decreases"] --> G["HPA removes pods"]
    G --> H["Nodes underutilized"]
    H --> I["CA removes nodes"]
```
Key Takeaways
- HPA = More pods for more traffic (horizontal growth)
- VPA = Bigger pods for bigger needs (vertical growth)
- CA = More nodes when cluster is full
- Scale-down has safety checks to protect your apps
- Use HPA for stateless, VPA for stateful/unknown sizing
- They can work together - just don’t overlap on the same metrics!
You now understand how Kubernetes automatically adjusts your infrastructure like a self-managing restaurant that always has the right number of waiters, the right tray sizes, and the right amount of space!
