Aggregation Pipeline


🏭 The Data Factory: Understanding MongoDB’s Aggregation Pipeline

Imagine you have a giant toy factory. Toys come in on one end. They go through different stations. Each station does ONE job. At the end, you get exactly what you want!

That’s exactly what an Aggregation Pipeline does with your data!


🤔 What is an Aggregation Pipeline?

Think of it like a water slide with many sections.

Your data (water) flows through. Each section (stage) changes it a little. At the bottom, you get your final result!

graph TD
    A["📦 Raw Data"] --> B["Stage 1: Filter"]
    B --> C["Stage 2: Group"]
    C --> D["Stage 3: Sort"]
    D --> E["✨ Final Result"]

Real Example:

  • You have 1000 toy orders
  • Stage 1: Keep only “robot” toys
  • Stage 2: Group by color
  • Stage 3: Sort by popularity
  • Result: A nice list of robot toys by color!
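To make the “stations” idea concrete, here is a minimal sketch in plain JavaScript (runnable with Node.js, no MongoDB needed). It models a pipeline as an ordered list of functions, where each function takes an array of documents and returns a new array for the next stage. The runPipeline helper and the sample orders are hypothetical and exist only for illustration. MongoDB’s real engine does not work this way internally; the sketch only mirrors the data flow.

```javascript
// Hypothetical sample data: a few toy orders.
const orders = [
  { type: "robot", color: "red" },
  { type: "doll", color: "red" },
  { type: "robot", color: "blue" },
  { type: "robot", color: "red" },
];

// Hypothetical helper: each stage receives the previous stage's output.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const result = runPipeline(orders, [
  // Stage 1: keep only robot toys (like $match)
  (docs) => docs.filter((d) => d.type === "robot"),
  // Stage 2: group by color and count (like $group with $sum: 1)
  (docs) => {
    const counts = new Map();
    for (const d of docs) counts.set(d.color, (counts.get(d.color) ?? 0) + 1);
    return [...counts].map(([color, count]) => ({ _id: color, count }));
  },
  // Stage 3: most popular color first (like $sort: { count: -1 })
  (docs) => [...docs].sort((a, b) => b.count - a.count),
]);

console.log(result); // [ { _id: 'red', count: 2 }, { _id: 'blue', count: 1 } ]
```

Swapping the order of the stages changes the result, which is why stage order matters in a real pipeline too.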

🔧 Pipeline Stages: The Building Blocks

Each stage is like a worker at a station. They take data in, do ONE thing, and pass it along.

The Most Common Stages:

| Stage | What It Does | Real-Life Example |
|---|---|---|
| $match | Filters documents | “Only show me red toys” |
| $group | Groups similar things | “Put all robots together” |
| $project | Picks what to show | “I only want name and price” |
| $sort | Orders results | “Show cheapest first” |
| $lookup | Joins other collections | “Add customer info to orders” |

Important Rule: Data flows from one stage to the next, like a river! Each stage sees only the documents that the previous stage passed along.


🎯 Match and Filter Operations

$match is like a security guard. It only lets certain documents through.

How It Works:

db.toys.aggregate([
  { $match: { color: "red" } }
])

This says: “Only let red toys through!”

More Filter Examples:

// Toys that cost less than $20
{ $match: { price: { $lt: 20 } } }

// Toys made in 2024
{ $match: { year: 2024 } }

// Red robots only
{ $match: {
    color: "red",
    type: "robot"
  }
}
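A rough model of how a $match filter decides whether a document gets through, written in plain JavaScript. The matches function and the sample toys are hypothetical. The function supports only plain equality and the $lt operator used above. MongoDB’s real query language supports many more operators and has special rules for arrays and missing fields.

```javascript
// Hypothetical, simplified matcher: supports { field: value } and { field: { $lt: n } }.
function matches(doc, query) {
  // Every condition in the query must hold (an implicit AND).
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object" && "$lt" in condition) {
      return value < condition.$lt; // "less than" comparison
    }
    return value === condition; // plain equality
  });
}

const toys = [
  { name: "Robo", color: "red", type: "robot", price: 15 },
  { name: "Bolt", color: "blue", type: "robot", price: 25 },
  { name: "Dolly", color: "red", type: "doll", price: 12 },
];

// Like { $match: { color: "red", type: "robot" } }
const redRobots = toys.filter((t) => matches(t, { color: "red", type: "robot" })).map((t) => t.name);
// Like { $match: { price: { $lt: 20 } } }
const cheapToys = toys.filter((t) => matches(t, { price: { $lt: 20 } })).map((t) => t.name);

console.log(redRobots); // [ 'Robo' ]
console.log(cheapToys); // [ 'Robo', 'Dolly' ]
```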

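Pro Tip: Put $match EARLY in your pipeline. It’s like removing trash before sorting: every later stage has fewer documents to process. A $match placed as the very first stage can also use an index on the collection.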


👥 Group Operations

$group is like sorting your toys into boxes.

All the blue toys go in the blue box. All the red toys go in the red box.

The Magic _id Field:

db.toys.aggregate([
  { $group: {
      _id: "$color",
      count: { $sum: 1 }
    }
  }
])

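This says: “Make one box for each color. Count how many in each box.” In $group, _id is the grouping key. "$color" (note the $) means “use each document’s color value”, and { $sum: 1 } adds 1 for every document in the box.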

Result:

{ "_id": "red", "count": 45 }
{ "_id": "blue", "count": 32 }
{ "_id": "green", "count": 28 }

Common Group Calculations:

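| Operator | What It Does | Example |
|---|---|---|
| $sum | Adds numbers ({ $sum: 1 } counts documents) | Total sales |
| $avg | Finds the average | Average price |
| $min | Finds the smallest value | Cheapest item |
| $max | Finds the largest value | Most expensive item |
| $count | Counts documents in each group (MongoDB 5.0+, written { $count: {} }) | Number of orders |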

Bigger Example:

db.orders.aggregate([
  { $group: {
      _id: "$product",
      totalSold: { $sum: "$quantity" },
      avgPrice: { $avg: "$price" },
      minPrice: { $min: "$price" }
    }
  }
])
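The same grouping idea can be modelled in plain JavaScript. There is one bucket per _id value, and each bucket keeps a running total, a count, and a minimum, which is enough to compute $sum, $avg, and $min. The groupOrders function and the sample orders are hypothetical. The sketch ignores details such as how MongoDB treats missing or non-numeric values.

```javascript
// Hypothetical sample orders.
const orders = [
  { product: "robot", quantity: 2, price: 20 },
  { product: "robot", quantity: 1, price: 30 },
  { product: "yo-yo", quantity: 5, price: 3 },
];

// Models { $group: { _id: "$product", totalSold: { $sum: "$quantity" },
//                    avgPrice: { $avg: "$price" }, minPrice: { $min: "$price" } } }
function groupOrders(docs) {
  const buckets = new Map(); // one bucket per product (the group key)
  for (const d of docs) {
    const b = buckets.get(d.product) ?? { totalSold: 0, priceSum: 0, n: 0, minPrice: Infinity };
    b.totalSold += d.quantity;                  // $sum: "$quantity"
    b.priceSum += d.price;                      // running total for $avg
    b.n += 1;                                   // how many prices went into the total
    b.minPrice = Math.min(b.minPrice, d.price); // $min: "$price"
    buckets.set(d.product, b);
  }
  return [...buckets].map(([product, b]) => ({
    _id: product,
    totalSold: b.totalSold,
    avgPrice: b.priceSum / b.n, // $avg = total / count
    minPrice: b.minPrice,
  }));
}

const grouped = groupOrders(orders);
console.log(grouped);
// robot: totalSold 3, avgPrice 25, minPrice 20; yo-yo: totalSold 5, avgPrice 3, minPrice 3
```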

📋 Project Operations

$project is like packing your backpack. You choose what to take!

Showing Fields:

db.toys.aggregate([
  { $project: {
      name: 1,
      price: 1,
      _id: 0
    }
  }
])
  • 1 means “yes, include this”
  • 0 means “no, hide this”
  • Apart from _id, you can’t mix 1s and 0s in the same $project
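A plain-JavaScript sketch of an inclusion-style $project. It keeps only the fields marked 1 and drops _id only when the spec explicitly sets it to 0, because $project keeps _id by default. The projectFields helper and the sample toy are hypothetical, and the sketch handles top-level fields only.

```javascript
// Hypothetical helper modelling an inclusion projection such as { name: 1, price: 1, _id: 0 }.
function projectFields(doc, spec) {
  const out = {};
  // _id is kept unless the spec explicitly excludes it.
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const toy = { _id: 7, name: "Robo", price: 15, color: "red" };

console.log(projectFields(toy, { name: 1, price: 1, _id: 0 })); // { name: 'Robo', price: 15 }
console.log(projectFields(toy, { name: 1 }));                   // { _id: 7, name: 'Robo' }
```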

Creating New Fields:

db.toys.aggregate([
  { $project: {
      name: 1,
      salePrice: {
        $multiply: ["$price", 0.8]
      }
    }
  }
])

This creates a new salePrice field that’s 80% of the original price!

Renaming Fields:

{ $project: {
    toyName: "$name",
    cost: "$price"
  }
}

Now name becomes toyName and price becomes cost: the output documents contain toyName and cost (plus _id) instead of name and price.
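Computed and renamed fields follow one rule. A string that starts with $ means “read this field from the current document”, and an expression such as $multiply is evaluated once per document. This plain-JavaScript sketch models only those two forms. The evaluate and project helpers are hypothetical, and the real $project supports many more expression operators.

```javascript
// Hypothetical mini-evaluator: handles "$field" paths, { $multiply: [...] }, and literals.
function evaluate(doc, expr) {
  if (typeof expr === "string" && expr.startsWith("$")) return doc[expr.slice(1)]; // "$price" -> value
  if (expr !== null && typeof expr === "object" && "$multiply" in expr) {
    return expr.$multiply.map((e) => evaluate(doc, e)).reduce((a, b) => a * b, 1);
  }
  return expr; // a literal such as 0.8
}

// Hypothetical projection: 1 copies the field, anything else is evaluated as an expression.
function project(doc, spec) {
  const out = { _id: doc._id }; // _id is kept by default
  for (const [field, expr] of Object.entries(spec)) {
    out[field] = expr === 1 ? doc[field] : evaluate(doc, expr);
  }
  return out;
}

const toy = { _id: 1, name: "Robo", price: 10 };

const sale = project(toy, { name: 1, salePrice: { $multiply: ["$price", 0.8] } });
const renamed = project(toy, { toyName: "$name", cost: "$price" });

console.log(sale);
console.log(renamed); // { _id: 1, toyName: 'Robo', cost: 10 }
```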


📊 Sort Operations

$sort puts things in order. Like lining up by height!

Basic Sorting:

db.toys.aggregate([
  { $sort: { price: 1 } }
])
  • 1 = Ascending (smallest to biggest, A to Z)
  • -1 = Descending (biggest to smallest, Z to A)

Multiple Sort Fields:

{ $sort: {
    category: 1,
    price: -1
  }
}

This says: “First sort by category A-Z. Within each category, show expensive ones first.”
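A multi-field sort compares documents on the first field and moves to the next field only when the first values are equal. This plain-JavaScript sketch builds such a comparator from a $sort-style spec. The comparatorFor helper and the sample toys are hypothetical. It only handles plain strings and numbers, while MongoDB also defines an ordering across different value types.

```javascript
// Hypothetical helper: turns { category: 1, price: -1 } into an Array.prototype.sort comparator.
function comparatorFor(spec) {
  const keys = Object.entries(spec); // key order in the spec = sort priority
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction; // 1: a first; -1: b first
      if (a[field] > b[field]) return direction;
      // equal on this field: fall through to the next field
    }
    return 0;
  };
}

const toys = [
  { name: "Kite", category: "outdoor", price: 12 },
  { name: "Robo", category: "electronic", price: 15 },
  { name: "Ball", category: "outdoor", price: 30 },
  { name: "Drone", category: "electronic", price: 80 },
];

const sorted = [...toys].sort(comparatorFor({ category: 1, price: -1 }));
console.log(sorted.map((t) => t.name)); // [ 'Drone', 'Robo', 'Ball', 'Kite' ]
```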

Pro Tip:

graph TD
    A["Your Data"] --> B{$match first!}
    B --> C["Fewer documents"]
    C --> D["$sort is faster"]
    D --> E["Happy Results! 🎉"]

Sort AFTER filtering. Sorting 100 items is faster than sorting 10,000!


🔗 Lookup Operations

$lookup is like making a phone call to get more info.

You have an order. You want customer details. They’re in a different collection!

How Lookup Works:

db.orders.aggregate([
  { $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerInfo"
    }
  }
])

Breaking It Down:

  • from: The other collection to look in
  • localField: The field in YOUR document
  • foreignField: The matching field in OTHER collection
  • as: Name of the new array field that holds the matches (an empty array if nothing matches)

Visual Example:

graph LR
    A["Order Document"] -->|customerId: 123| B["🔍 Lookup"]
    C["Customers Collection"] -->|_id: 123| B
    B --> D["Order + Customer Info!"]

Real Result:

Before Lookup:

{ "orderId": 1, "customerId": 123 }

After Lookup:

{
  "orderId": 1,
  "customerId": 123,
  "customerInfo": [{
    "_id": 123,
    "name": "Alice",
    "email": "alice@email.com"
  }]
}
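$lookup behaves like a left outer join. Every input document is kept, and the as field always holds an array, which is empty when nothing matches. Here is a plain-JavaScript sketch that assumes scalar localField values and uses a simple linear scan. The lookup function and the sample collections are hypothetical and are not MongoDB’s implementation.

```javascript
// Hypothetical sample collections.
const orders = [
  { orderId: 1, customerId: 123 },
  { orderId: 2, customerId: 999 }, // no customer has this _id
];
const customers = [{ _id: 123, name: "Alice", email: "alice@email.com" }];

// Models { $lookup: { from, localField, foreignField, as } } for scalar localField values.
function lookup(docs, from, localField, foreignField, as) {
  return docs.map((doc) => ({
    ...doc,
    // Every matching document goes into the array; no match gives [].
    [as]: from.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const joined = lookup(orders, customers, "customerId", "_id", "customerInfo");
console.log(JSON.stringify(joined, null, 2));
```

The second order keeps its place in the output with customerInfo: [], so a join never silently drops documents.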

🎭 Putting It All Together

Let’s build a complete pipeline!

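Mission: Find the top 3 most ordered products of 2024, along with the names of the customers who ordered them.

db.orders.aggregate([
  // Stage 1: Only 2024 orders (filter early!)
  { $match: {
      year: 2024
    }
  },

  // Stage 2: Group by product, counting orders
  // and collecting each customer ID once
  { $group: {
      _id: "$product",
      totalOrders: { $sum: 1 },
      customerIds: { $addToSet: "$customerId" }
    }
  },

  // Stage 3: Sort by most orders
  { $sort: { totalOrders: -1 } },

  // Stage 4: Keep only the top 3
  { $limit: 3 },

  // Stage 5: Add customer info for just those 3 products
  // (localField is an array here; $lookup matches any _id in it)
  { $lookup: {
      from: "customers",
      localField: "customerIds",
      foreignField: "_id",
      as: "customers"
    }
  },

  // Stage 6: Clean up output
  { $project: {
      product: "$_id",
      totalOrders: 1,
      customerNames: "$customers.name",  // array of customer names
      _id: 0
    }
  }
])

Running $lookup after $limit means customer details are fetched for only 3 products, not for every 2024 order.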
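The mission can also be modelled end to end in plain JavaScript: filter one year’s orders, group them by product, keep the top 3, and only then attach customer names. The topProducts function, the sample data, and the year field are hypothetical. A Set stands in for $addToSet. MongoDB does not guarantee element order in $addToSet arrays or in $lookup results, so the name order here is only an artifact of the sketch.

```javascript
// Hypothetical sample data.
const orders = [
  { product: "robot", customerId: 1, year: 2024 },
  { product: "robot", customerId: 2, year: 2024 },
  { product: "robot", customerId: 1, year: 2024 },
  { product: "kite", customerId: 2, year: 2024 },
  { product: "yo-yo", customerId: 3, year: 2023 },
];
const customers = [
  { _id: 1, name: "Alice" },
  { _id: 2, name: "Bob" },
  { _id: 3, name: "Cara" },
];

function topProducts(orders, customers, year, n) {
  // $match: keep one year's orders
  const matched = orders.filter((o) => o.year === year);

  // $group: count orders and collect unique customer IDs per product
  const groups = new Map();
  for (const o of matched) {
    const g = groups.get(o.product) ?? { totalOrders: 0, customerIds: new Set() };
    g.totalOrders += 1;
    g.customerIds.add(o.customerId); // like $addToSet: duplicates are ignored
    groups.set(o.product, g);
  }

  return [...groups]
    // $sort by totalOrders (descending), then $limit
    .sort(([, a], [, b]) => b.totalOrders - a.totalOrders)
    .slice(0, n)
    // $lookup + $project: swap IDs for names, only for the surviving products
    .map(([product, g]) => ({
      product,
      totalOrders: g.totalOrders,
      customerNames: customers.filter((c) => g.customerIds.has(c._id)).map((c) => c.name),
    }));
}

const top = topProducts(orders, customers, 2024, 3);
console.log(top);
```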

🧠 Quick Memory Tricks

| Stage | Remember It As |
|---|---|
| $match | 🚪 Door Guard - who gets in? |
| $group | 📦 Box Sorter - similar things together |
| $project | 🎒 Backpack - what to carry? |
| $sort | 📏 Line Up - in what order? |
| $lookup | 📞 Phone Call - get more info |

🎯 Key Takeaways

  1. Pipeline = Assembly Line: Data flows through stages
  2. Each Stage = One Job: Keep it simple
  3. Order Matters: Filter early, sort late
  4. $match First: Less data = faster pipeline
  5. $lookup = Join: Connect different collections

🚀 You’ve Got This!

The Aggregation Pipeline is like being a data chef.

You have ingredients (raw data). You chop ($match), mix ($group), arrange ($sort), and plate ($project).

The result? A beautiful dish of exactly the data you need!

Now go build some pipelines! 🎉
