📦 NoSQL Collection Organization
Your Data’s Home: Collection Design, Capped Collections & Views
🏠 The Big Picture: What Are Collections?
Imagine your bedroom. You have different containers to organize your stuff:
- A toy box for toys
- A bookshelf for books
- A drawer for clothes
In NoSQL databases (like MongoDB), collections are like these containers. Each collection holds documents (your data items) that belong together.
But here’s the magical part: Unlike a toy box with a fixed shape, collections in NoSQL are flexible—you can put different shaped toys in the same box!
📐 Collection Design
What Is Collection Design?
Collection design is deciding how to organize your data containers. It’s like deciding:
- Should toys and books go in the same box?
- Or should they have separate homes?
The Golden Rule: Group Related Things Together
graph TD
A["Your Data"] --> B["Users Collection"]
A --> C["Products Collection"]
A --> D["Orders Collection"]
B --> B1["User 1"]
B --> B2["User 2"]
C --> C1["Product A"]
C --> C2["Product B"]
Two Main Approaches
1. Embedding (Putting Things Inside)
Like putting a letter INSIDE an envelope:
{
"name": "Sarah",
"address": {
"street": "123 Main St",
"city": "Boston"
}
}
When to embed:
- Data is always accessed together
- Child data belongs to exactly one parent
- Small amounts of related data
2. Referencing (Pointing to Another Place)
Like writing “See toy box #5” instead of putting the toy here:
{
"name": "Sarah",
"address_id": "addr_12345"
}
When to reference:
- Data is shared across documents
- Data changes frequently
- Large amounts of related data
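To make the trade-off concrete, here is a minimal sketch in plain JavaScript that uses in-memory `Map` objects as stand-ins for collections (no database involved). The names `usersEmbedded`, `usersReferenced`, `addresses`, and `loadUserWithAddress` are hypothetical and exist only for this illustration.

```javascript
// In-memory stand-ins for collections; a real app would query the database.
const usersEmbedded = new Map([
  ["u1", { name: "Sarah", address: { street: "123 Main St", city: "Boston" } }],
]);

const usersReferenced = new Map([
  ["u1", { name: "Sarah", address_id: "addr_12345" }],
]);
const addresses = new Map([
  ["addr_12345", { street: "123 Main St", city: "Boston" }],
]);

// Embedded: one lookup returns the user and the address together.
const embeddedUser = usersEmbedded.get("u1");

// Referenced: a second lookup is needed to resolve address_id.
function loadUserWithAddress(userId) {
  const user = usersReferenced.get(userId);
  if (!user) return null;
  const address = addresses.get(user.address_id) ?? null;
  return { name: user.name, address };
}

const referencedUser = loadUserWithAddress("u1");
console.log(embeddedUser.address.city);   // Boston
console.log(referencedUser.address.city); // Boston
```

The referenced version costs an extra lookup per read, but changing the address touches only one document even if many users share it.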
Design Questions to Ask
| Question | If YES → | If NO → |
|---|---|---|
| Do I always need this data together? | Embed | Reference |
| Does this data belong to ONE parent? | Embed | Reference |
| Is the data small and rarely updated? | Embed | Reference |
Real Example: Blog Posts
Option A - Embedded Comments:
{
"title": "My First Post",
"comments": [
{"user": "Tom", "text": "Great!"},
{"user": "Amy", "text": "Love it!"}
]
}
Option B - Referenced Comments:
{
"title": "My First Post",
"comment_ids": ["c1", "c2", "c3"]
}
Which is better? It depends!
- Few comments that you always show? → Embed
- Thousands of comments with pagination? → Reference (an embedded array would keep growing, and a single MongoDB document is limited to 16 MB)
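The pagination case can be sketched as follows. This is an in-memory illustration, assuming each comment stores the `post_id` of its parent (a common alternative to keeping a `comment_ids` array on the post). The names `comments`, `created_at`, and `getCommentPage` are hypothetical.

```javascript
// Hypothetical comments "collection": each comment points back to its post.
const comments = Array.from({ length: 25 }, (_, i) => ({
  _id: `c${i + 1}`,
  post_id: "p1",
  text: `Comment ${i + 1}`,
  created_at: i, // stand-in for a timestamp
}));

// Return one page of comments for a post, oldest first.
// Roughly what find({ post_id }).sort({ created_at: 1 }).skip(...).limit(...)
// would do on the server.
function getCommentPage(postId, page, pageSize) {
  return comments
    .filter((c) => c.post_id === postId)
    .sort((a, b) => a.created_at - b.created_at)
    .slice((page - 1) * pageSize, page * pageSize);
}

const page3 = getCommentPage("p1", 3, 10);
console.log(page3.map((c) => c._id)); // c21 through c25
```

Only the requested page is handed back, so the post document itself stays small no matter how many comments pile up.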
🎪 Capped Collections
What Are Capped Collections?
Imagine a circular toy train track. The train goes round and round. When you add a new train car at one end, the oldest car at the other end gets pushed off!
Capped collections work exactly like this:
- Fixed size (you decide: a byte limit like “only 5MB”, plus an optional document limit like “only 1000 documents”)
- Oldest documents automatically deleted when full
- New documents always go at the end
- SUPER fast for inserting and reading in order!
graph LR
subgraph Capped Collection
A["Newest"] --> B["New"] --> C["Old"] --> D["Oldest"]
end
E["New Doc"] -.-> A
D -.-> F["Auto Deleted"]
Creating a Capped Collection
db.createCollection("logs", {
capped: true,
size: 5242880, // 5 MB max
max: 5000 // 5000 documents max
})
Two limits work together:
- size: Maximum bytes (required)
- max: Maximum document count (optional)
Whichever limit is hit first triggers deletion!
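The eviction behavior can be imitated in a few lines of plain JavaScript. This is only a toy model, not how the server stores data: `CappedBuffer` is a hypothetical class, and document size is approximated with the length of the JSON text rather than the real BSON size.

```javascript
// Toy model of a capped collection: evicts the oldest documents
// as soon as either the byte limit or the document limit is exceeded.
class CappedBuffer {
  constructor({ size, max = Infinity }) {
    this.size = size; // byte budget (approximated, see approxBytes)
    this.max = max;   // optional document-count limit
    this.docs = [];
    this.bytes = 0;
  }

  // Assumption: JSON text length stands in for the stored size.
  static approxBytes(doc) {
    return JSON.stringify(doc).length;
  }

  insert(doc) {
    const docBytes = CappedBuffer.approxBytes(doc);
    if (docBytes > this.size) throw new Error("document larger than the cap");
    this.docs.push(doc);
    this.bytes += docBytes;
    // Drop from the front (oldest) until both limits hold again.
    while (this.docs.length > this.max || this.bytes > this.size) {
      this.bytes -= CappedBuffer.approxBytes(this.docs.shift());
    }
  }
}

const logs = new CappedBuffer({ size: 1000, max: 3 });
for (let i = 1; i <= 5; i++) logs.insert({ n: i });
console.log(logs.docs); // the three newest: n = 3, 4, 5
```

Here the document limit is reached long before the byte limit, so `max` is what triggers eviction. With a small `size` and no `max`, the byte budget would do the same job.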
Perfect Use Cases
| Use Case | Why Capped? |
|---|---|
| 🔔 Recent notifications | Only show last 100 |
| 📊 Server logs | Keep only the most recent entries |
| 💬 Chat messages | Recent history only |
| 📈 Sensor readings | Rolling window of data |
Capped Collection Rules
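Can DO:
- Insert new documents ✅
- Read documents ✅
- Update documents (if the document size stays the same) ✅
Cannot DO:
- Delete individual documents ❌
- Update a document so that it grows ❌
- Shard the collection ❌
The exact restrictions depend on the MongoDB version, so check the capped collection documentation for your release.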
Example: Chat Room Messages
// Create capped collection for chat
db.createCollection("chat_room_1", {
capped: true,
size: 1048576, // 1 MB
max: 500 // Last 500 messages
})
// Insert a message
db.chat_room_1.insertOne({
user: "Alex",
message: "Hello everyone!",
timestamp: new Date()
})
// Get messages in order (super fast!)
db.chat_room_1.find().sort({$natural: 1})
👁️ Views
What Are Views?
Think of views as magic windows that show you a specific part of your data—filtered, transformed, or combined—without copying anything!
It’s like having a window in your room that ONLY shows red toys, even though your toy box has all colors.
graph TD
A["Original Collection"] --> B["View: Active Users"]
A --> C["View: Recent Orders"]
A --> D["View: VIP Customers"]
B --> B1["Shows only<br>active users"]
C --> C1["Shows orders<br>from last 7 days"]
D --> D1["Shows users<br>spending > $1000"]
Why Use Views?
| Benefit | Explanation |
|---|---|
| 🔒 Security | Show only safe fields to certain users |
| 🧹 Simplicity | Pre-filter complex queries |
| 💾 No duplication | Views don’t copy data |
| 🔄 Always fresh | Views show current data |
Creating a View
Scenario: You have a users collection but want a view showing only active users.
db.createView(
"active_users", // View name
"users", // Source collection
[ // Pipeline (filters)
{ $match: { status: "active" } },
{ $project: {
name: 1,
email: 1,
_id: 0
}
}
]
)
Now active_users acts like a read-only collection:
db.active_users.find()
// Returns only active users!
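One way to picture what a view does is a function that re-runs its pipeline against the source on every read. The sketch below is plain JavaScript with deliberately tiny, hypothetical helpers: `runPipeline` understands only equality `$match` and inclusion `$project`, and the local `createView` function merely mirrors the shell method's name. It is a mental model, not the server's aggregation engine.

```javascript
// Source "collection" (in-memory stand-in).
const users = [
  { _id: 1, name: "Sarah", email: "sarah@example.com", status: "active" },
  { _id: 2, name: "Tom", email: "tom@example.com", status: "inactive" },
];

// Minimal pipeline runner: supports equality $match and inclusion $project.
// Simplification: only fields marked 1 are kept, so _id must be listed
// explicitly to survive (the real server keeps _id unless it is set to 0).
function runPipeline(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    if (stage.$match) {
      const conditions = Object.entries(stage.$match);
      return current.filter((doc) => conditions.every(([k, v]) => doc[k] === v));
    }
    if (stage.$project) {
      const keep = Object.keys(stage.$project).filter((k) => stage.$project[k] === 1);
      return current.map((doc) => Object.fromEntries(keep.map((k) => [k, doc[k]])));
    }
    throw new Error("unsupported stage in this sketch");
  }, docs);
}

// A "view" stores only the source and the pipeline, never the results.
function createView(source, pipeline) {
  return { find: () => runPipeline(source, pipeline) };
}

const activeUsers = createView(users, [
  { $match: { status: "active" } },
  { $project: { name: 1, email: 1, _id: 0 } },
]);

console.log(activeUsers.find().length); // 1
users[1].status = "active";             // change the source...
console.log(activeUsers.find().length); // 2 (the view reflects it immediately)
```

Because nothing is cached, the view costs no extra storage, but every read pays for running the pipeline again.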
View vs Collection
| Feature | Collection | View |
|---|---|---|
| Stores data | ✅ Yes | ❌ No |
| Uses disk space | ✅ Yes | ❌ No |
| Can insert/update | ✅ Yes | ❌ No |
| Reflects source changes | It is the source | ✅ Automatically, on every read |
Real Examples
Example 1: Orders This Week
db.createView(
"recent_orders",
"orders",
[
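{ $match: {
// $$NOW (MongoDB 4.2+) is resolved at query time, so the window keeps moving;
// new Date(Date.now() - ...) would be frozen at the moment the view is created
$expr: {
$gte: ["$orderDate", { $subtract: ["$$NOW", 7*24*60*60*1000] }]
}
}
}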
]
)
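One detail matters for time-window views like this: a JavaScript expression such as `new Date(Date.now() - ...)` is evaluated once by the shell, when `createView` runs, and the resulting fixed date is what gets stored in the view definition. The `$$NOW` aggregation variable (MongoDB 4.2+) is instead resolved each time the view is queried. The plain JavaScript sketch below mimics the two behaviors with hypothetical helper names (`fixedWindow`, `rollingWindow`) and a simulated clock; it doesn't talk to a database.

```javascript
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Cutoff computed once, at definition time (like a literal date in the pipeline).
function fixedWindow(orders, createdAt) {
  const cutoff = createdAt - WEEK_MS;
  return () => orders.filter((o) => o.orderDate >= cutoff);
}

// Cutoff recomputed on every read (like $$NOW in the pipeline).
function rollingWindow(orders, now) {
  return () => orders.filter((o) => o.orderDate >= now() - WEEK_MS);
}

// Simulated clock in milliseconds so the example is deterministic.
let clock = 100 * WEEK_MS;
const orders = [{ id: "o1", orderDate: clock - 1000 }];

const fixed = fixedWindow(orders, clock);
const rolling = rollingWindow(orders, () => clock);

clock += 2 * WEEK_MS; // two weeks later, o1 is no longer recent

console.log(fixed().length);   // 1: still matches the stale cutoff
console.log(rolling().length); // 0: correctly dropped from the window
```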
Example 2: Product Summary (Join)
db.createView(
"product_details",
"products",
[
{ $lookup: {
from: "categories",
localField: "category_id",
foreignField: "_id",
as: "category"
}
},
{ $unwind: "$category" },
{ $project: {
name: 1,
price: 1,
categoryName: "$category.name"
}
}
]
)
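To see what shape of documents this pipeline produces, here is a plain JavaScript imitation of the three stages on in-memory arrays. The sample data and the `productDetails` function are made up for illustration; the real `$lookup` and `$unwind` run on the server.

```javascript
const categories = [
  { _id: "cat1", name: "Toys" },
  { _id: "cat2", name: "Books" },
];
const products = [
  { _id: "p1", name: "Robot", price: 25, category_id: "cat1" },
  { _id: "p2", name: "Atlas", price: 40, category_id: "cat2" },
  { _id: "p3", name: "Mystery Box", price: 5, category_id: "cat9" }, // no match
];

function productDetails() {
  return products
    // $lookup: attach an array of matching categories as "category".
    .map((p) => ({
      ...p,
      category: categories.filter((c) => c._id === p.category_id),
    }))
    // $unwind: one output document per array element; by default,
    // documents whose array is empty are dropped.
    .flatMap((p) => p.category.map((c) => ({ ...p, category: c })))
    // $project: keep _id (included by default), name, price, categoryName.
    .map((p) => ({
      _id: p._id,
      name: p.name,
      price: p.price,
      categoryName: p.category.name,
    }));
}

console.log(productDetails());
// p1 -> categoryName "Toys", p2 -> "Books"; p3 is missing from the result
```

If products without a matching category should still appear, `$unwind` accepts a `preserveNullAndEmptyArrays: true` option in its long form.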
Modifying Views
You can’t edit a view’s definition in place, but you can replace it:
// Drop and recreate
db.active_users.drop()
db.createView("active_users", "users", [...])
// Or use collMod
db.runCommand({
collMod: "active_users",
viewOn: "users",
pipeline: [...]
})
🎯 Quick Decision Guide
graph TD
A["What do you need?"] --> B{Store actual data?}
B -->|Yes| C{Fixed size buffer?}
B -->|No| D["Use a VIEW"]
C -->|Yes| E["Use CAPPED COLLECTION"]
C -->|No| F["Use REGULAR COLLECTION"]
F --> G{Related data?}
G -->|Always together| H["EMBED"]
G -->|Shared/Large| I["REFERENCE"]
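The same decision tree can be written as a small function. This is just one reading of the diagram; `chooseStorage` and its option names are hypothetical.

```javascript
// Mirrors the decision guide: view vs capped vs regular collection,
// then embed vs reference for related data in a regular collection.
function chooseStorage({ storesData, fixedSizeBuffer, relatedData }) {
  if (!storesData) return "view";
  if (fixedSizeBuffer) return "capped collection";
  if (relatedData === "always together") return "regular collection, embed";
  if (relatedData === "shared or large") return "regular collection, reference";
  return "regular collection";
}

console.log(chooseStorage({ storesData: false }));
// view
console.log(chooseStorage({ storesData: true, fixedSizeBuffer: true }));
// capped collection
console.log(
  chooseStorage({ storesData: true, fixedSizeBuffer: false, relatedData: "shared or large" })
);
// regular collection, reference
```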
🌟 Summary: Your Data Organization Toolbox
| Tool | Best For | Remember |
|---|---|---|
| Collection Design | Organizing all your data | Embed small, reference large |
| Capped Collections | Rolling/recent data | Auto-deletes oldest |
| Views | Filtered windows | No storage, always fresh |
You did it! Now you know how to:
- Design collections that make sense
- Use capped collections for logs and recent data
- Create views for filtered, secure access
Your data has a happy, organized home! 🏠✨
