Kubernetes Autoscaling: Teaching Your Cluster to Grow and Shrink Like Magic


The Story of the Smart Restaurant

Imagine you own a restaurant. Some days, only 5 customers come. Other days, 500 people show up!

If you always have 50 waiters working, you’re wasting money on slow days. But if you only have 5 waiters, your customers wait forever on busy days.

What if your restaurant could magically hire more waiters when it gets busy and send them home when it’s quiet?

That’s exactly what Kubernetes Autoscaling does for your applications!


Meet the Three Autoscaling Heroes

Think of Kubernetes autoscaling like a team of three smart helpers:

graph TD
    A["🎯 Your App Needs Help!"] --> B["HPA - Horizontal Pod Autoscaler"]
    A --> C["VPA - Vertical Pod Autoscaler"]
    A --> D["CA - Cluster Autoscaler"]
    B --> E["Adds MORE pods"]
    C --> F["Makes pods BIGGER"]
    D --> G["Adds MORE machines"]

| Hero | What It Does | Restaurant Analogy |
| --- | --- | --- |
| HPA | Adds more pods | Hire more waiters |
| VPA | Makes pods stronger | Give waiters bigger trays |
| CA | Adds more machines | Build more restaurant space |

1. Horizontal Pod Autoscaler (HPA)

The “Hire More Waiters” Solution

Simple Explanation: When your app gets busy, HPA creates MORE copies (pods) of your app. When things calm down, it removes the extra copies.

How Does HPA Know When to Act?

HPA watches your pods like a supervisor watching employees:

  • “Is the waiter (CPU) working too hard?”
  • “Are too many orders (memory) piling up?”

Example: If CPU usage goes above 70%, add more pods!

Real YAML Example

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

What This Says:

  • Keep at least 2 pods running (never less)
  • Never go above 10 pods (save money!)
  • When average CPU hits 70%, add more pods
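
If you'd rather not write YAML, kubectl can generate an equivalent CPU-based HPA in a single command:

kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10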

The Magic Formula

graph LR
    A["Current CPU: 140%"] --> B["Target CPU: 70%"]
    B --> C["Need: 140/70 = 2x pods"]
    C --> D["If 3 pods now → Scale to 6"]

Simple Math:

  • You have 3 pods, each averaging 140% CPU
  • The target is 70% per pod
  • HPA calculates the ratio: 140 ÷ 70 = 2
  • Desired pods: 3 × 2 = 6 pods

This is HPA's actual formula: desiredReplicas = ceil(currentReplicas × currentMetricValue ÷ targetMetricValue).

2. Vertical Pod Autoscaler (VPA)

The “Bigger Tray” Solution

Instead of hiring more waiters, what if each waiter could carry a BIGGER tray?

VPA makes individual pods stronger by giving them more CPU and memory.

When to Use VPA?

  • Your app can’t run in multiple copies
  • You don’t know how much CPU/memory to request
  • Your pods keep getting killed for using too much memory

VPA Modes

| Mode | What Happens | Good For |
| --- | --- | --- |
| Off | Just recommends, no changes | Learning what your app needs |
| Auto | Restarts pods with new limits | Production apps |
| Initial | Sets resources only at start | Batch jobs |
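
A common pattern is to start in recommendation-only mode: swap the update policy in the manifest below to "Off", and VPA will observe and suggest without restarting anything:

updatePolicy:
  updateMode: "Off"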

Real YAML Example

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 1000m
        memory: 500Mi

What This Says:

  • Watch “my-app” deployment
  • Automatically adjust resources
  • CPU can range from 100m to 1000m
  • Memory can range from 50Mi to 500Mi
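
Assuming the VPA components are installed in your cluster, you can read the Recommender's current suggestions for this object at any time. The output includes a Recommendation section with target, lower-bound, and upper-bound values per container:

kubectl describe vpa my-app-vpa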

VPA’s Three Components

graph TD
    A["VPA System"] --> B["Recommender"]
    A --> C["Updater"]
    A --> D["Admission Controller"]
    B --> E["Watches pods & suggests sizes"]
    C --> F["Evicts pods that need resizing"]
    D --> G["Sets resources on new pods"]

3. Cluster Autoscaler (CA)

The “Build More Restaurant Space” Solution

What if you have lots of waiters ready, but no space for them to work?

Cluster Autoscaler adds or removes entire machines (nodes) from your cluster!

When Does CA Add Nodes?

CA watches for pending pods - pods that want to run but can’t find a home:

graph TD
    A["Pod wants to run"] --> B{Enough space on nodes?}
    B -->|Yes| C["Pod runs happily"]
    B -->|No| D["Pod is PENDING"]
    D --> E["CA notices pending pod"]
    E --> F["CA adds new node"]
    F --> C

Real-World Example

Your cluster has 3 nodes, each with 4GB memory. A new deployment needs 15GB total.

  • 3 nodes × 4GB = 12GB available
  • You need 15GB
  • 3GB worth of pods are pending
  • CA adds 1 more node
  • Now pods can run!
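
You can watch this happen yourself. Pods that cannot be scheduled sit in the Pending phase with a FailedScheduling event, which is exactly the signal CA reacts to (the pod name here is a placeholder):

kubectl get pods --field-selector=status.phase=Pending
kubectl describe pod my-pending-pod    # events show "FailedScheduling"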

4. Cluster Autoscaler Configuration

Teaching CA How to Behave

CA needs rules to follow. Here’s how you configure it:

Key Configuration Options

# Common CA flags
--scale-down-enabled=true           # allow CA to remove nodes at all
--scale-down-delay-after-add=10m    # cool-down after a scale-up
--scale-down-unneeded-time=10m      # how long a node must stay idle first
--scan-interval=10s                 # how often CA re-evaluates the cluster
--max-nodes-total=100               # hard cap on cluster size
--cores-total=0:320                 # min:max CPU cores across the cluster
--memory-total=0:640                # min:max memory in GiB (bare number, no Gi suffix)

| Setting | What It Does | Example |
| --- | --- | --- |
| scan-interval | How often CA checks | Every 10 seconds |
| scale-down-delay-after-add | Wait time after adding a node | 10 minutes |
| scale-down-unneeded-time | How long a node must be idle | 10 minutes |
| max-nodes-total | Maximum cluster size | 100 nodes |
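
These flags are passed as arguments to the cluster-autoscaler Deployment itself (typically in kube-system). A minimal sketch, assuming the stock image and an AWS-style auto-discovery tag - both are illustrative, so adapt them to your provider:

containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0   # pick the version matching your cluster
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --scan-interval=10s
  - --scale-down-delay-after-add=10m
  - --scale-down-unneeded-time=10m
  - --max-nodes-total=100
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled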

Node Groups

CA manages node groups (pools of similar machines):

graph TD
    A["Cluster Autoscaler"] --> B["Node Group: Small"]
    A --> C["Node Group: Large"]
    A --> D["Node Group: GPU"]
    B --> E["2-10 nodes, 2CPU/4GB each"]
    C --> F["1-5 nodes, 8CPU/32GB each"]
    D --> G["0-3 nodes, with GPUs"]

Priority Expander Example

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*small.*
    50:
      - .*large.*
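
Two notes: higher numbers mean higher priority (so the .*large.* groups win here), and the ConfigMap is only consulted when CA runs with the priority expander enabled:

--expander=priority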

5. Autoscaler Scale-Down

The “Sending Waiters Home” Logic

When business is slow, you don’t need all those extra waiters. CA figures out when to scale down (remove nodes).

When Does Scale-Down Happen?

A node can be removed when:

  1. Utilization is low (< 50% CPU/memory)
  2. All pods can move to other nodes
  3. No blocking conditions exist
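
The 50% figure is CA's default utilization threshold; it is tunable with a flag (expressed as a fraction of requested resources):

--scale-down-utilization-threshold=0.5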

What Blocks Scale-Down?

graph TD
    A["Can this node be removed?"] --> B{Pod with local storage?}
    B -->|Yes| C["❌ BLOCKED"]
    B -->|No| D{Pod without controller?}
    D -->|Yes| C
    D -->|No| E{PodDisruptionBudget violated?}
    E -->|Yes| C
    E -->|No| F{System pod?}
    F -->|Yes| C
    F -->|No| G["✅ Safe to remove!"]

Common Blockers:

| Blocker | Why It Blocks | Solution |
| --- | --- | --- |
| Local storage | Data would be lost | Use persistent volumes |
| No controller | Pod won't restart elsewhere | Add a Deployment/ReplicaSet |
| System pods | Cluster needs them | Use the cluster-autoscaler.kubernetes.io/safe-to-evict annotation |
| PDB | Would break availability rules | Adjust the PodDisruptionBudget |
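
For the PDB case, it helps to see what one looks like. A minimal sketch, assuming your pods carry an app: my-app label - CA will only drain a node if evicting its pods keeps this budget satisfied:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app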

Safe-to-Evict Annotation

Tell CA “this pod is okay to evict”:

metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

Scale-Down Timeline

graph LR
    A["Node underutilized"] --> B["Wait 10min"]
    B --> C["Still underutilized?"]
    C -->|Yes| D["Check if safe"]
    D --> E["Drain & remove node"]
    C -->|No| A

6. HPA vs VPA: When to Use Which?

The Ultimate Comparison

Think of it like this:

  • HPA = More workers
  • VPA = Stronger workers

graph TD
    A["Need more capacity?"] --> B{Can your app run as multiple copies?}
    B -->|Yes| C["Use HPA!"]
    B -->|No| D["Use VPA!"]
    C --> E{Need both scaling AND right-sizing?}
    D --> F["VPA adjusts pod size"]
    E -->|Yes| G["Use HPA + VPA together - carefully!"]

Side-by-Side Comparison

| Feature | HPA | VPA |
| --- | --- | --- |
| Scales what? | Number of pods | Size of pods |
| Best for | Stateless apps | Stateful apps, right-sizing |
| Reaction speed | Fast (seconds) | Slower (requires restart) |
| Disruption | None (adds pods) | Restarts pods |
| Works with replicas > 1? | Yes (designed for it) | Yes, but requires restarts |

Can You Use Both Together?

Yes, but carefully!

  • Let HPA scale on custom metrics such as requests per second (see the sketch below)
  • Let VPA handle CPU/memory sizing
  • Never let both control the same metric
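
Here is a sketch of the custom-metrics half of that split. It assumes a metrics adapter (such as prometheus-adapter) already exposes a per-pod http_requests_per_second metric - the metric name is an assumption, not a built-in. This snippet replaces the cpu metric in the HPA manifest from earlier:

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"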

Decision Flowchart

graph TD
    A["Your Situation"] --> B{Stateless app with variable traffic?}
    B -->|Yes| C["HPA is your friend"]
    B -->|No| D{Don't know right resource size?}
    D -->|Yes| E["Use VPA in Recommend mode first"]
    D -->|No| F{App can't scale horizontally?}
    F -->|Yes| G["VPA is your only option"]
    F -->|No| H["Consider HPA for flexibility"]

Real-World Scenarios

Scenario 1: Web Server

  • Traffic spikes during sales
  • Each pod handles independent requests
  • Winner: HPA - add more pods during spikes

Scenario 2: Database

  • Can’t run multiple masters
  • Memory needs grow with data
  • Winner: VPA - increase pod resources

Scenario 3: API Service

  • Variable traffic + unknown resource needs
  • Winner: HPA + VPA - HPA for traffic, VPA for sizing

Quick Reference Summary

| Autoscaler | What It Scales | When It Helps |
| --- | --- | --- |
| HPA | Pod count | Traffic spikes, parallel workloads |
| VPA | Pod resources | Right-sizing, memory-hungry apps |
| Cluster Autoscaler | Node count | No room for pending pods |

The Complete Picture

graph TD
    A["Traffic Increases"] --> B["HPA adds pods"]
    B --> C{Nodes have space?}
    C -->|No| D["CA adds nodes"]
    C -->|Yes| E["Pods run"]
    D --> E
    F["Traffic Decreases"] --> G["HPA removes pods"]
    G --> H["Nodes underutilized"]
    H --> I["CA removes nodes"]

Key Takeaways

  1. HPA = More pods for more traffic (horizontal growth)
  2. VPA = Bigger pods for bigger needs (vertical growth)
  3. CA = More nodes when cluster is full
  4. Scale-down has safety checks to protect your apps
  5. Use HPA for stateless, VPA for stateful/unknown sizing
  6. They can work together - just don’t overlap on the same metrics!

You now understand how Kubernetes automatically adjusts your infrastructure like a self-managing restaurant that always has the right number of waiters, the right tray sizes, and the right amount of space!
