What is database sharding?

Sharding divides data across multiple machines in horizontal scaling. Each shard is an independent database holding part of the total data.

What's the difference between vertical and horizontal scaling?

Vertical scaling makes one machine more powerful (bigger). Horizontal scaling adds more machines to share the workload.

What is a hot spot in database sharding?

A hot spot occurs when one shard gets way more traffic than others. It's caused by poor shard key choices like using timestamps alone.

Scaling and Sharding | NoSQL Database Guide

🏗️ Scaling & Sharding: The Pizza Restaurant Story

Imagine you own the world’s most popular pizza restaurant. One day, the line stretches around the block. How do you serve everyone? This story will teach you how databases handle the same challenge!

🍕 The Big Picture: Why Do Databases Need to “Scale”?

Think about your pizza restaurant on day one. One chef, one oven, 10 customers. Easy!

But suddenly, you’re famous. 10,000 customers show up. That single oven can’t cook fast enough. You have two choices:

Buy a BIGGER oven (Vertical Scaling)
Buy MORE ovens (Horizontal Scaling)

Databases face the exact same problem. Let’s explore both paths!

📏 Vertical Scaling: The “Super Oven” Approach

What Is It?

Vertical scaling means making your one machine more powerful.

Simple Example:

Your computer has 8GB RAM → upgrade to 64GB RAM
Your server has 4 CPU cores → upgrade to 32 cores
Your hard drive is 500GB → upgrade to 4TB SSD

It’s like replacing your home oven with a massive industrial oven.

Real Life:

Before: Small server (8GB RAM, 2 cores)
         ↓
After:  Monster server (256GB RAM, 64 cores)

✅ Pros:

Simple! No code changes needed
Everything stays in one place
Easy to manage

❌ Cons:

There’s a ceiling. The biggest server in the world still has limits
Expensive! Doubling power often costs 4x the price
Single point of failure. If it dies, everything dies

graph TD
    A["Small Server"] -->|Add RAM/CPU| B["Big Server"]
    B -->|Add More!| C["GIANT Server"]
    C -->|Hit Ceiling| D[💥 Can't grow anymore!]

🔄 Horizontal Scaling: The “Many Ovens” Approach

What Is It?

Horizontal scaling means adding more machines instead of bigger ones.

Simple Example:

Instead of 1 super-oven…
…you get 10 regular ovens, each cooking different pizzas!

Real Life:

Before: 1 server handling 10,000 requests

After:  Server 1 → handles 2,500 requests
        Server 2 → handles 2,500 requests
        Server 3 → handles 2,500 requests
        Server 4 → handles 2,500 requests

✅ Pros:

No ceiling! Need more power? Add more machines
Cheaper at large scale
Fault tolerant. One machine dies? Others keep working!

❌ Cons:

More complex to manage
Data needs to be split across machines
Requires smart planning

graph TD
    A["Too Much Traffic!"] --> B["Add Server 2"]
    B --> C["Add Server 3"]
    C --> D["Add Server 4"]
    D --> E["∞ Keep Adding!"]

🧩 Sharding: The Art of Splitting Data

What Is It?

Sharding is how we DIVIDE data across multiple machines in horizontal scaling.

Think of it like this: Your pizza restaurant has 3 kitchens now.

Kitchen A: Makes pepperoni pizzas
Kitchen B: Makes veggie pizzas
Kitchen C: Makes specialty pizzas

Each kitchen (shard) handles a slice of all orders!

The Database Version:

BEFORE (one database):
┌─────────────────────────┐
│  All 10 million users   │
└─────────────────────────┘

AFTER (3 shards):
┌─────────┐ ┌─────────┐ ┌─────────┐
│Shard A  │ │Shard B  │ │Shard C  │
│Users    │ │Users    │ │Users    │
│A-H      │ │I-P      │ │Q-Z      │
└─────────┘ └─────────┘ └─────────┘

Each shard is an independent database holding part of the data.

🔑 Shard Key Selection: The Most Important Decision

What Is It?

The shard key is the field you use to decide WHERE data goes.

Think of it like a sorting hat! When new data arrives, the shard key decides which shard gets it.

Example: User Database

Possible Shard Key	What Happens
`user_id`	User 1-1000 → Shard A, 1001-2000 → Shard B
`country`	USA → Shard A, Europe → Shard B, Asia → Shard C
`first_letter`	Names A-H → Shard A, I-P → Shard B, Q-Z → Shard C

⚠️ WARNING: Choose Wisely!

A bad shard key = disaster. A good shard key = smooth sailing.

Good Shard Key Traits:

✅ High cardinality (many unique values)
✅ Evenly distributed data
✅ Commonly used in queries
✅ Doesn’t change often

Bad Shard Key Example: Using country when 80% of users are from USA → Shard A is overloaded!

#️⃣ Hash-Based Sharding: The Random Distributor

What Is It?

Hash-based sharding runs each shard key through a magic math formula (hash function) to decide the shard.

Simple Example:

hash("Alice") → 7 → 7 mod 3 = 1 → Shard B
hash("Bob")   → 2 → 2 mod 3 = 2 → Shard C
hash("Carol") → 9 → 9 mod 3 = 0 → Shard A

How It Works:

graph TD
    A["New Data: user_id = 12345"] --> B["Hash Function"]
    B --> C["Result: 8"]
    C --> D["8 mod 3 = 2"]
    D --> E["Goes to Shard C!"]

✅ Pros:

Super even distribution! Data spreads evenly across shards
Great for random access patterns
Simple to implement

❌ Cons:

Range queries are hard. Finding “all users 1000-2000” means checking ALL shards
Adding shards requires reshuffling data

📊 Range-Based Sharding: The Organized Librarian

What Is It?

Range-based sharding assigns data to shards based on value ranges.

Simple Example:

Shard A: user_id 1 - 10,000
Shard B: user_id 10,001 - 20,000
Shard C: user_id 20,001 - 30,000

It’s like organizing books by page number ranges in different rooms!

How It Works:

graph TD
    A["Query: Get users 5000-7000"] --> B{Which shard?}
    B --> C["All in Shard A!"]
    C --> D["Fast! Only 1 shard needed"]

✅ Pros:

Range queries are FAST! Only hit relevant shards
Easy to understand and implement
Good for time-series data (January → Shard A, February → Shard B)

❌ Cons:

Uneven distribution risk. New data often clusters in one shard
Can create hot spots (more on this soon!)

⚖️ Shard Balancing: Keeping Things Fair

What Is It?

Over time, some shards get more data than others. Shard balancing moves data around to keep things even.

Pizza Restaurant Analogy: Kitchen A has 500 orders, Kitchen B has just 50. You move 200 orders from A to B to balance the workload!

How It Works:

BEFORE (Unbalanced):
┌─────────────┐ ┌───────┐ ┌───────┐
│ Shard A     │ │Shard B│ │Shard C│
│ 5 million   │ │500K   │ │500K   │
│ records     │ │records│ │records│
└─────────────┘ └───────┘ └───────┘

AFTER (Balanced):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard A │ │ Shard B │ │ Shard C │
│ 2M      │ │ 2M      │ │ 2M      │
│ records │ │ records │ │ records │
└─────────┘ └─────────┘ └─────────┘

⚠️ The Challenge:

Balancing while the database is LIVE is like changing car tires while driving!

Solutions:

Automatic rebalancing (database does it)
Manual migration (you schedule it)
Chunk-based (move small chunks at a time)

🔥 Hot Spots: The Nightmare Scenario

What Is It?

A hot spot is when one shard gets WAY more traffic than others.

Think of it like this:

Kitchen A (celebrities) → 10,000 orders
Kitchen B → 100 orders
Kitchen C → 100 orders

Kitchen A is ON FIRE 🔥 while others are bored!

Real-World Examples:

Cause	What Happens
Celebrity user	Taylor Swift’s profile shard crashes
Time-based key	“Today’s date” shard overloaded
Popular product	iPhone launch day shard explodes

How to Prevent Hot Spots:

Choose better shard keys (avoid timestamps alone!)
Compound keys (user_id + timestamp instead of just timestamp)
Salt/prefix keys to spread traffic
Monitor and rebalance proactively

graph TD
    A["Hot Spot Detected!"] --> B{Solutions}
    B --> C["Better Shard Key"]
    B --> D["Add Salting"]
    B --> E["Rebalance Data"]
    B --> F["Add More Shards"]

Hot Spot Prevention Example:

BAD: shard_key = date
(All today's data → one shard!)

GOOD: shard_key = date + random_prefix
(Today's data → spread across shards!)

🎯 Putting It All Together

Let’s see the complete picture:

graph TD
    A["Growing Database"] --> B{Scale How?}
    B -->|Vertical| C["Bigger Server"]
    B -->|Horizontal| D["More Servers"]
    D --> E["Need Sharding!"]
    E --> F{Shard Strategy?}
    F -->|Even Distribution| G["Hash-Based"]
    F -->|Range Queries| H["Range-Based"]
    G --> I["Monitor for Hot Spots"]
    H --> I
    I --> J["Rebalance When Needed"]

💡 Key Takeaways

Concept	One-Liner
Vertical Scaling	Make one machine BIGGER
Horizontal Scaling	Add MORE machines
Sharding	Split data across machines
Shard Key	The “sorting hat” deciding where data lives
Hash-Based	Math formula = even distribution
Range-Based	Value ranges = fast range queries
Shard Balancing	Move data to keep shards equal
Hot Spots	When one shard gets all the traffic

🎉 You Did It!

You now understand how databases handle millions of users without breaking a sweat. Whether it’s Netflix, Amazon, or the next big app—they all use these exact concepts!

Remember: Scaling is like running a restaurant chain. Start small, plan ahead, and always keep things balanced!

🍕 “The best database architecture is like the best pizza: well-distributed, never burned, and scales to feed the world!”

Blimto

Scaling and Sharding

Unable to load concept

Coming Soon...

🏗️ Scaling & Sharding: The Pizza Restaurant Story

🍕 The Big Picture: Why Do Databases Need to “Scale”?

📏 Vertical Scaling: The “Super Oven” Approach

What Is It?

Real Life:

✅ Pros:

❌ Cons:

🔄 Horizontal Scaling: The “Many Ovens” Approach

What Is It?

Real Life:

✅ Pros:

❌ Cons:

🧩 Sharding: The Art of Splitting Data

What Is It?

The Database Version:

🔑 Shard Key Selection: The Most Important Decision

What Is It?

Example: User Database

⚠️ WARNING: Choose Wisely!

#️⃣ Hash-Based Sharding: The Random Distributor

What Is It?

How It Works:

✅ Pros:

❌ Cons:

📊 Range-Based Sharding: The Organized Librarian

What Is It?

How It Works:

✅ Pros:

❌ Cons:

⚖️ Shard Balancing: Keeping Things Fair

What Is It?

How It Works:

⚠️ The Challenge:

🔥 Hot Spots: The Nightmare Scenario

What Is It?

Real-World Examples:

How to Prevent Hot Spots:

Hot Spot Prevention Example:

🎯 Putting It All Together

💡 Key Takeaways

🎉 You Did It!

Story - Premium Content

Stay Tuned!

Story - Premium Content

Interactives - Premium Content

Interactives - Premium Content

Stay Tuned!

Cheatsheet - Premium Content

Cheatsheet - Premium Content

Stay Tuned!

Quiz - Premium Content

Quiz - Premium Content

Stay Tuned!

Flashcards - Premium Content

Flashcards - Premium Content

Stay Tuned!

Sign in Required

Report an Issue