# Neural Network Layers: nn.Module Mastery
## The LEGO Factory Analogy

Imagine you're the owner of a magical LEGO factory. Each room in your factory does one special job: some rooms paint bricks, others snap them together, and some check if everything looks right. In PyTorch, nn.Module is like the blueprint for building these rooms. Every piece of your neural network is a room (a module), and together they create amazing things!
## What is nn.Module?
Think of nn.Module as the parent class for everything in your neural network. Just like how all dogs are animals, all neural network pieces are Modules.
```python
import torch.nn as nn

# Every layer inherits from nn.Module
class MyRoom(nn.Module):
    def __init__(self):
        super().__init__()
```
Why does this matter?
- PyTorch can find all your learnable weights automatically
- You get save/load for free
- Training mode switches work everywhere
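
To make these benefits concrete, here is a minimal sketch. The `TinyRoom` module and its single `nn.Linear` layer are illustrative assumptions, not something from the examples above:

```python
import torch
import torch.nn as nn

class TinyRoom(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)  # one learnable layer

room = TinyRoom()

# 1. PyTorch finds the learnable weights automatically
print([name for name, _ in room.named_parameters()])  # ['fc.weight', 'fc.bias']

# 2. Save/load comes for free via the state dict
torch.save(room.state_dict(), 'tiny_room.pth')
room.load_state_dict(torch.load('tiny_room.pth'))

# 3. Mode switches propagate to every submodule
room.train()  # room.training is True, and so is room.fc.training
room.eval()   # both flip to False
```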
## Creating Custom Modules

Let's build our first factory room! A custom module is like designing your own LEGO brick.
```python
import torch
import torch.nn as nn

class PaintRoom(nn.Module):
    def __init__(self, in_colors, out_colors):
        super().__init__()
        # This is our painting machine
        self.painter = nn.Linear(in_colors, out_colors)

    def forward(self, brick):
        # Paint the brick and return it
        return self.painter(brick)
```
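
Here is a quick usage sketch; the sizes 10 and 3 and the batch of 8 are arbitrary assumptions for illustration:

```python
room = PaintRoom(in_colors=10, out_colors=3)
brick = torch.randn(8, 10)   # a batch of 8 "bricks" with 10 features each
painted = room(brick)        # calls forward() under the hood
print(painted.shape)         # torch.Size([8, 3])
```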
The Recipe:

- Inherit from `nn.Module`
- Call `super().__init__()` first
- Define your layers in `__init__`
- Write the `forward` method
## The Forward Method
The forward method is the conveyor belt of your factory room. When a brick comes in, what happens to it?
```python
class MagicRoom(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 5)

    def forward(self, x):
        # Step 1: First machine
        x = self.layer1(x)
        # Step 2: Add some magic (ReLU)
        x = torch.relu(x)
        # Step 3: Second machine
        x = self.layer2(x)
        return x
```
Pro Tip: Never call `forward()` directly! Use `model(input)` instead. PyTorch does extra magic behind the scenes (like running registered hooks).
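
For example, using the MagicRoom defined above (the batch size of 4 is an arbitrary assumption):

```python
model = MagicRoom()
x = torch.randn(4, 10)

out = model(x)            # preferred: __call__ runs hooks, then forward()
# out = model.forward(x)  # works, but silently skips PyTorch's hook machinery
```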
```mermaid
graph TD
    A[Input Brick] --> B[layer1]
    B --> C[ReLU Magic]
    C --> D[layer2]
    D --> E[Output Brick]
```
## Parameters and Buffers
Your factory has two types of important things:
### Parameters (Learnable Weights)
These are the knobs that PyTorch adjusts during training.
```python
class SmartRoom(nn.Module):
    def __init__(self):
        super().__init__()
        # This creates a learnable parameter
        self.weight = nn.Parameter(
            torch.randn(3, 3)
        )
```
### Buffers (Fixed Values)
These are values you want to save but NOT train.
```python
class SmartRoom(nn.Module):
    def __init__(self):
        super().__init__()
        # This is saved but not trained
        self.register_buffer(
            'my_constant',
            torch.tensor([1.0, 2.0, 3.0])
        )
```
Quick Comparison:
| Type | Learnable? | Saved? | Example |
|---|---|---|---|
| Parameter | ✅ Yes | ✅ Yes | Weights |
| Buffer | ❌ No | ✅ Yes | Running mean |
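
To see the difference in code, here is a sketch that combines the two SmartRoom snippets above into one module and inspects it:

```python
class SmartRoom(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3, 3))  # learnable
        self.register_buffer(
            'my_constant', torch.tensor([1.0, 2.0, 3.0])  # saved, not trained
        )

room = SmartRoom()

# Only the parameter is reported to the optimizer...
print([name for name, _ in room.named_parameters()])  # ['weight']

# ...but both appear in the saved state
print(list(room.state_dict().keys()))  # ['weight', 'my_constant']
```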
## Module State Management
Your factory needs to remember things! State management is like having a save file for your game.
### Saving Your Factory
```python
# Save everything
torch.save(model.state_dict(), 'factory.pth')
```
### Loading Your Factory
```python
# Load it back
model.load_state_dict(
    torch.load('factory.pth')
)
```
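
If the checkpoint was saved on a GPU and you are loading it on a CPU-only machine, you can pass `map_location` to `torch.load`. This is an optional extra, not part of the example above:

```python
# Load a GPU-saved checkpoint onto the CPU
state = torch.load('factory.pth', map_location='cpu')
model.load_state_dict(state)
```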
### Peeking Inside
```python
# See all parameters
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")

# See all modules
for name, module in model.named_modules():
    print(f"{name}: {type(module)}")
```
## Train and Eval Modes

Your factory has two modes, like a robot with a "learning" switch and a "working" switch.
### Training Mode
```python
model.train()  # Learning mode ON
```
- Dropout is active (randomly drops neurons)
- BatchNorm uses batch statistics
### Evaluation Mode
```python
model.eval()  # Working mode ON
```
- Dropout is disabled (all neurons work)
- BatchNorm uses saved statistics
```python
# Always do this for testing!
model.eval()
with torch.no_grad():
    output = model(test_data)
```
⚠️ Warning: Forgetting `model.eval()` before testing is a common bug that causes weird results!
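
To see the difference concretely, here is a small sketch with a standalone `nn.Dropout` layer (the probability 0.5 and the input of ones are arbitrary choices for illustration):

```python
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the entries zeroed, survivors scaled to 2.0

drop.eval()
print(drop(x))  # identity: all ones, nothing is dropped
```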
## Sequential and Containers
What if you want to connect many rooms in a line? Use Sequential!
### nn.Sequential - The Assembly Line
```python
# Quick way to stack layers
model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 5)
)

# Input flows through in order
x = torch.randn(8, 10)
output = model(x)
```
```mermaid
graph TD
    A[Input] --> B[Linear 10→20]
    B --> C[ReLU]
    C --> D[Linear 20→5]
    D --> E[Output]
```
### Named Sequential
```python
from collections import OrderedDict

# Give names to your layers
# (nn.Sequential needs an OrderedDict to accept named layers)
model = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(10, 20)),
    ('activation', nn.ReLU()),
    ('output', nn.Linear(20, 5))
]))

# Access by name
print(model.hidden)
```
## ModuleList - The Flexible Stack
When you need a list of layers but want PyTorch to track them:
```python
class FlexibleFactory(nn.Module):
    def __init__(self, num_rooms):
        super().__init__()
        # ModuleList tracks all layers
        self.rooms = nn.ModuleList([
            nn.Linear(10, 10)
            for _ in range(num_rooms)
        ])

    def forward(self, x):
        for room in self.rooms:
            x = room(x)
        return x
```
❌ Don't use regular Python lists! PyTorch won't find those parameters.
```python
# WRONG - Parameters are invisible!
self.layers = [nn.Linear(10, 10)]

# RIGHT - Parameters are tracked!
self.layers = nn.ModuleList([nn.Linear(10, 10)])
```
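
You can verify this yourself. The two toy classes below (`BadFactory` and `GoodFactory` are made-up names for illustration) differ only in how the list is stored:

```python
class BadFactory(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(10, 10)]  # plain list: not registered

class GoodFactory(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(10, 10)])  # registered properly

print(len(list(BadFactory().parameters())))   # 0 - an optimizer would see nothing
print(len(list(GoodFactory().parameters())))  # 2 - the weight and bias
```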
## ModuleDict - The Named Collection
When you want to access layers by name instead of position:
```python
class SmartFactory(nn.Module):
    def __init__(self):
        super().__init__()
        self.rooms = nn.ModuleDict({
            'paint': nn.Linear(10, 20),
            'polish': nn.Linear(20, 20),
            'ship': nn.Linear(20, 5)
        })

    def forward(self, x, room_name):
        # Use any room by name!
        return self.rooms[room_name](x)
```
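
A quick usage sketch (the batch size of 2 is an arbitrary assumption; 10 matches the 'paint' layer's input size):

```python
factory = SmartFactory()
x = torch.randn(2, 10)
painted = factory(x, 'paint')   # routes the input through the 'paint' layer
print(painted.shape)            # torch.Size([2, 20])
```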
When to use which?
| Container | Use When... |
|---|---|
| Sequential | Layers flow in order |
| ModuleList | Need index access |
| ModuleDict | Need name access |
## Putting It All Together

Here's a complete factory that uses everything we learned:
```python
class UltimateFactory(nn.Module):
    def __init__(self):
        super().__init__()
        # Sequential for main flow
        self.main_line = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 128)
        )
        # ModuleList for repeating blocks
        self.extra_rooms = nn.ModuleList([
            nn.Linear(128, 128)
            for _ in range(3)
        ])
        # ModuleDict for named outputs
        self.outputs = nn.ModuleDict({
            'classify': nn.Linear(128, 10),
            'detect': nn.Linear(128, 4)
        })
        # Buffer for tracking how many forward passes we've run
        self.register_buffer(
            'forward_count',
            torch.tensor(0)
        )

    def forward(self, x, task='classify'):
        self.forward_count += 1  # updated, saved, but never trained
        x = self.main_line(x)
        for room in self.extra_rooms:
            x = torch.relu(room(x))
        return self.outputs[task](x)
```
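
A usage sketch (the batch size of 32 is an arbitrary assumption; 784 matches the input size of `main_line`):

```python
factory = UltimateFactory()
batch = torch.randn(32, 784)

logits = factory(batch, task='classify')   # shape: (32, 10)
boxes = factory(batch, task='detect')      # shape: (32, 4)
print(factory.forward_count)               # tensor(2) - two forward passes so far
```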
## Key Takeaways

- `nn.Module` is the foundation of all neural network components
- Always call `super().__init__()` in your custom modules
- The `forward` method defines data flow
- Parameters learn, Buffers don't
- Use `train()` and `eval()` to switch modes
- Sequential = ordered layers, ModuleList = indexed layers, ModuleDict = named layers
## Remember This!

```
Every neural network layer is a Module
├── Custom modules inherit from nn.Module
├── forward() defines what happens to data
├── Parameters are learned, Buffers are saved
├── train() vs eval() changes behavior
└── Containers organize your modules
    ├── Sequential → In order
    ├── ModuleList → By index
    └── ModuleDict → By name
```
You're now ready to build any neural network architecture!