Collection Organization

📦 NoSQL Collection Organization

Your Data’s Home: Collection Design, Capped Collections & Views


🏠 The Big Picture: What Are Collections?

Imagine your bedroom. You have different containers to organize your stuff:

  • A toy box for toys
  • A bookshelf for books
  • A drawer for clothes

In NoSQL databases (like MongoDB), collections are like these containers. Each collection holds documents (your data items) that belong together.

But here’s the magical part: unlike a toy box with a fixed shape, collections in NoSQL are flexible. Documents in the same collection do not need identical fields, so you can put differently shaped toys in the same box!


📐 Collection Design

What Is Collection Design?

Collection design is deciding how to organize your data containers. It’s like deciding:

  • Should toys and books go in the same box?
  • Or should they have separate homes?

The Golden Rule: Group Related Things Together

graph TD
  A["Your Data"] --> B["Users Collection"]
  A --> C["Products Collection"]
  A --> D["Orders Collection"]
  B --> B1["User 1"]
  B --> B2["User 2"]
  C --> C1["Product A"]
  C --> C2["Product B"]

Two Main Approaches

1. Embedding (Putting Things Inside)

Like putting a letter INSIDE an envelope:

{
  "name": "Sarah",
  "address": {
    "street": "123 Main St",
    "city": "Boston"
  }
}

When to embed:

  • Data is always accessed together
  • Child data is unique to parent
  • Small amounts of related data

2. Referencing (Pointing to Another Place)

Like writing “See toy box #5” instead of putting the toy here:

{
  "name": "Sarah",
  "address_id": "addr_12345"
}

When to reference:

  • Data is shared across documents
  • Data changes frequently
  • Large amounts of related data
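
To make the trade-off concrete, here is a minimal sketch in plain JavaScript (no database involved). The `addresses` map and both helper functions are hypothetical stand-ins, not MongoDB APIs; they only show that an embedded field is read directly while a reference needs a second lookup.

```javascript
// Hypothetical in-memory stand-in for a separate "addresses" collection.
const addresses = new Map([
  ["addr_12345", { street: "123 Main St", city: "Boston" }],
]);

const embeddedUser = {
  name: "Sarah",
  address: { street: "123 Main St", city: "Boston" },
};

const referencingUser = { name: "Sarah", address_id: "addr_12345" };

// Embedded: the city is already inside the document, so one read is enough.
function cityOfEmbedded(user) {
  return user.address.city;
}

// Referenced: a second lookup is needed to follow address_id.
function cityOfReferenced(user, addressStore) {
  const address = addressStore.get(user.address_id);
  return address ? address.city : null; // null when the reference dangles
}

console.log(cityOfEmbedded(embeddedUser));                 // Boston
console.log(cityOfReferenced(referencingUser, addresses)); // Boston
```

The extra lookup is the price of referencing; in exchange, one address can be shared by many users and changed in a single place.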

Design Questions to Ask

| Question | If YES → | If NO → |
|---|---|---|
| Do I always need this data together? | Embed | Reference |
| Does this data belong to ONE parent? | Embed | Reference |
| Is the data small and rarely changed? | Embed | Reference |

Real Example: Blog Posts

Option A - Embedded Comments:

{
  "title": "My First Post",
  "comments": [
    {"user": "Tom", "text": "Great!"},
    {"user": "Amy", "text": "Love it!"}
  ]
}

Option B - Referenced Comments:

{
  "title": "My First Post",
  "comment_ids": ["c1", "c2", "c3"]
}

Which is better? It depends!

  • Few comments that you always show? → Embed
  • Thousands of comments with pagination? → Reference
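
For the "thousands of comments" case, one common variant (an assumption here, not something the examples above prescribe) is to keep the reference on the comment side: each comment stores a `post_id`, so the post document never grows. The sketch below uses a plain JavaScript array as a hypothetical comments collection and a made-up `commentsPage` helper to show how pagination might look.

```javascript
// Hypothetical comments "collection": each comment points back at its post.
const comments = [];
for (let i = 1; i <= 25; i++) {
  comments.push({ _id: `c${i}`, post_id: "post_1", text: `Comment ${i}` });
}

// Return one page of comments for a post (page numbers start at 1).
function commentsPage(allComments, postId, page, pageSize) {
  const start = (page - 1) * pageSize;
  return allComments
    .filter((c) => c.post_id === postId)
    .slice(start, start + pageSize);
}

const page3 = commentsPage(comments, "post_1", 3, 10);
console.log(page3.map((c) => c._id)); // [ 'c21', 'c22', 'c23', 'c24', 'c25' ]
```

Only the requested page is handed to the caller, which is the point of referencing here: the post itself stays small no matter how many comments pile up.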

🎪 Capped Collections

What Are Capped Collections?

Imagine a circular toy train track. The train goes round and round. When you add a new train car at the front, the oldest one at the back gets pushed off!

Capped collections work exactly like this:

  • Fixed size (you set a byte limit such as 5 MB, and optionally a document limit such as 1000 documents)
  • Oldest documents automatically deleted when full
  • New documents always go at the end
  • Fast inserts and fast reads in insertion order, because documents are kept in the order they arrive

graph LR
  subgraph Capped Collection
    A["Newest"] --> B["New"] --> C["Old"] --> D["Oldest"]
  end
  E["New Doc"] -.-> A
  D -.-> F["Auto Deleted"]

Creating a Capped Collection

db.createCollection("logs", {
  capped: true,
  size: 5242880,    // 5 MB max
  max: 5000         // 5000 documents max
})

Two limits work together:

  • size: Maximum bytes (required)
  • max: Maximum document count (optional)

Whichever limit is hit first triggers deletion!
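
The interplay of the two limits can be modelled in a few lines of plain JavaScript. This is only a rough sketch: `CappedBuffer` and `approxBytes` are hypothetical names, and the size estimate uses the JSON text length, whereas MongoDB measures BSON and manages storage internally.

```javascript
// Rough size estimate: UTF-8 bytes of the JSON form (MongoDB measures BSON instead).
function approxBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical in-memory model of a capped collection.
class CappedBuffer {
  constructor({ size, max = Infinity }) {
    this.size = size; // byte limit
    this.max = max;   // optional document-count limit
    this.docs = [];
    this.bytes = 0;
  }

  insert(doc) {
    this.docs.push(doc);
    this.bytes += approxBytes(doc);
    // Evict from the front (oldest) until both limits are satisfied.
    while (this.docs.length > this.max || this.bytes > this.size) {
      const evicted = this.docs.shift();
      this.bytes -= approxBytes(evicted);
    }
  }
}

const logs = new CappedBuffer({ size: 1024, max: 3 });
for (let i = 1; i <= 5; i++) logs.insert({ n: i });
console.log(logs.docs.map((d) => d.n)); // [ 3, 4, 5 ]
```

Here the document-count limit is reached long before the byte limit, so `max` drives the eviction; with larger documents the byte limit would trigger first.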

Perfect Use Cases

| Use Case | Why Capped? |
|---|---|
| 🔔 Recent notifications | Only show last 100 |
| 📊 Server logs | Keep last hour of logs |
| 💬 Chat messages | Recent history only |
| 📈 Sensor readings | Rolling window of data |

Capped Collection Rules

Can DO:

  • Insert new documents ✅
  • Read documents ✅
  • Update documents (if size stays same) ✅

Cannot DO:

  • Delete individual documents (older MongoDB versions reject this; check the documentation for your version) ❌
  • Update to make document bigger ❌
  • Shard the collection ❌

Example: Chat Room Messages

// Create capped collection for chat
db.createCollection("chat_room_1", {
  capped: true,
  size: 1048576,  // 1 MB
  max: 500        // Last 500 messages
})

// Insert a message
db.chat_room_1.insertOne({
  user: "Alex",
  message: "Hello everyone!",
  timestamp: new Date()
})

// Get messages in insertion order (fast: no index needed)
db.chat_room_1.find().sort({$natural: 1})

👁️ Views

What Are Views?

Think of views as magic windows that show you a specific part of your data—filtered, transformed, or combined—without copying anything!

It’s like having a window in your room that ONLY shows red toys, even though your toy box has all colors.

graph TD
  A["Original Collection"] --> B["View: Active Users"]
  A --> C["View: Recent Orders"]
  A --> D["View: VIP Customers"]
  B --> B1["Shows only<br>active users"]
  C --> C1["Shows orders<br>from last 7 days"]
  D --> D1["Shows users<br>spending > $1000"]

Why Use Views?

| Benefit | Explanation |
|---|---|
| 🔒 Security | Show only safe fields to certain users |
| 🧹 Simplicity | Pre-filter complex queries |
| 💾 No duplication | Views don’t copy data |
| 🔄 Always fresh | Views show current data |

Creating a View

Scenario: You have a users collection but want a view showing only active users.

db.createView(
  "active_users",     // View name
  "users",            // Source collection
  [                   // Pipeline (filters)
    { $match: { status: "active" } },
    { $project: {
        name: 1,
        email: 1,
        _id: 0
      }
    }
  ]
)

Now active_users acts like a collection:

db.active_users.find()
// Returns only active users!
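
Why a view is "always fresh" is easier to see in a toy model. The sketch below is plain JavaScript with a hypothetical `createView` helper (unrelated to the real `db.createView`): the view keeps only a recipe and re-runs it against the live source on every read.

```javascript
// Hypothetical in-memory model: a "view" stores a recipe, not data.
function createView(source, { match, fields }) {
  return {
    find() {
      // Re-run the recipe against the live source on every read.
      return source
        .filter((doc) => Object.entries(match).every(([k, v]) => doc[k] === v))
        .map((doc) => Object.fromEntries(fields.map((f) => [f, doc[f]])));
    },
  };
}

const users = [
  { _id: 1, name: "Sarah", email: "sarah@example.com", status: "active" },
  { _id: 2, name: "Tom", email: "tom@example.com", status: "inactive" },
];

const activeUsers = createView(users, {
  match: { status: "active" },
  fields: ["name", "email"],
});

console.log(activeUsers.find().length); // 1

// A later write to the source shows up in the view without any refresh.
users.push({ _id: 3, name: "Amy", email: "amy@example.com", status: "active" });
console.log(activeUsers.find().map((u) => u.name)); // [ 'Sarah', 'Amy' ]
```

Nothing is copied into the view, which is also why the hidden fields (`_id`, `status`) never leave the source collection.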

View vs Collection

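| Feature | Collection | View |
|---|---|---|
| Stores data | ✅ Yes | ❌ No |
| Uses disk space | ✅ Yes | ❌ No (only the definition is stored) |
| Can insert/update | ✅ Yes | ❌ No |
| Stays in sync with source data | Manual update | ✅ Automatic |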

Real Examples

Example 1: Orders This Week

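db.createView(
  "recent_orders",
  "orders",
  [
    // $$NOW is evaluated on each read (MongoDB 4.2+), so the window keeps
    // moving. A plain new Date(...) here would be computed once, when the
    // view is created, and then stay fixed.
    { $match: {
        $expr: {
          $gte: [
            "$orderDate",
            { $subtract: ["$$NOW", 7*24*60*60*1000] }
          ]
        }
      }
    }
  ]
)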
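
With any time-based view it is worth checking when the cutoff is computed: a JavaScript expression such as `new Date(...)` typed into the shell is evaluated when the statement runs, not each time the view is read. The plain JavaScript sketch below (hypothetical helper names, timestamps as milliseconds, no database) contrasts a cutoff frozen at definition time with one recomputed on every read.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Cutoff captured once, when the filter is defined.
function frozenWindow(definedAt) {
  const cutoff = definedAt - WEEK_MS;
  return (order, _now) => order.orderDate >= cutoff;
}

// Cutoff recomputed from the clock on every read.
function rollingWindow() {
  return (order, now) => order.orderDate >= now - WEEK_MS;
}

const day = 24 * 60 * 60 * 1000;
const t0 = Date.UTC(2024, 0, 15);      // when the filter is defined
const order = { orderDate: t0 - day }; // placed one day earlier

const frozen = frozenWindow(t0);
const rolling = rollingWindow();

// Read 30 days later: the order is now 31 days old.
const later = t0 + 30 * day;
console.log(frozen(order, later));  // true  (stale cutoff still admits it)
console.log(rolling(order, later)); // false (outside the last 7 days)
```

The frozen version keeps returning an order that is a month old, which is usually not what "orders this week" is meant to show.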

Example 2: Product Summary (Join)

db.createView(
  "product_details",
  "products",
  [
    { $lookup: {
        from: "categories",
        localField: "category_id",
        foreignField: "_id",
        as: "category"
      }
    },
    { $unwind: "$category" },
    { $project: {
        name: 1,
        price: 1,
        categoryName: "$category.name"
      }
    }
  ]
)
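
To see what this pipeline produces, here is a rough plain JavaScript imitation of the three stages on made-up sample data. `productDetails` is a hypothetical helper, and it assumes each `category_id` matches at most one category.

```javascript
// Hypothetical in-memory data standing in for two collections.
const categories = [
  { _id: "cat1", name: "Toys" },
  { _id: "cat2", name: "Books" },
];
const products = [
  { _id: "p1", name: "Robot", price: 25, category_id: "cat1" },
  { _id: "p2", name: "Atlas", price: 40, category_id: "cat2" },
  { _id: "p3", name: "Mystery Box", price: 5, category_id: "missing" },
];

function productDetails(productDocs, categoryDocs) {
  const byId = new Map(categoryDocs.map((c) => [c._id, c]));
  return productDocs.flatMap((p) => {
    const category = byId.get(p.category_id); // the $lookup step
    // Like $unwind on an empty array: no match means no output row.
    if (!category) return [];
    // The $project step: keep _id (the default), name, price, categoryName.
    return [{ _id: p._id, name: p.name, price: p.price, categoryName: category.name }];
  });
}

console.log(productDetails(products, categories));
```

Note that "Mystery Box" disappears from the result: with the default `$unwind` behaviour, a product whose category is missing produces no row at all.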

Modifying Views

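You cannot edit the data in a view, but you can change its definition, either by dropping and recreating it or by updating it in place with collMod: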

// Drop and recreate
db.active_users.drop()
db.createView("active_users", "users", [...])

// Or use collMod
db.runCommand({
  collMod: "active_users",
  viewOn: "users",
  pipeline: [...]
})

🎯 Quick Decision Guide

graph TD
  A["What do you need?"] --> B{Store actual data?}
  B -->|Yes| C{Fixed size buffer?}
  B -->|No| D["Use a VIEW"]
  C -->|Yes| E["Use CAPPED COLLECTION"]
  C -->|No| F["Use REGULAR COLLECTION"]
  F --> G{Related data?}
  G -->|Always together| H["EMBED"]
  G -->|Shared/Large| I["REFERENCE"]
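
The same decision path can be written as a small function. This is just a sketch of the flowchart in plain JavaScript; `chooseTool` and its option names are made up for illustration.

```javascript
// Hypothetical helper mirroring the flowchart; the option names are made up.
function chooseTool({ storesData, fixedSizeBuffer, relatedData }) {
  if (!storesData) return "view";
  if (fixedSizeBuffer) return "capped collection";
  // Regular collection: decide how related data is attached.
  if (relatedData === "always-together") return "regular collection, embed";
  if (relatedData === "shared-or-large") return "regular collection, reference";
  return "regular collection";
}

console.log(chooseTool({ storesData: false }));                       // view
console.log(chooseTool({ storesData: true, fixedSizeBuffer: true })); // capped collection
console.log(
  chooseTool({ storesData: true, fixedSizeBuffer: false, relatedData: "shared-or-large" })
); // regular collection, reference
```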

🌟 Summary: Your Data Organization Toolbox

| Tool | Best For | Remember |
|---|---|---|
| Collection Design | Organizing all your data | Embed small, reference large |
| Capped Collections | Rolling/recent data | Auto-deletes oldest |
| Views | Filtered windows | No storage, always fresh |

You did it! Now you know how to:

  • Design collections that make sense
  • Use capped collections for logs and recent data
  • Create views for filtered, secure access

Your data has a happy, organized home! 🏠✨
