CNN Architectures


CNN Architectures: Building Blocks of Vision AI

The LEGO Analogy: Think of CNN architectures like building with LEGO blocks. Each block type (convolution) has a special job. Some blocks are thin and light (depthwise), some reach far (dilated), some grow bigger (transposed). The magic happens when you stack them smartly!


What Are CNN Architectures?

Imagine you’re building a robot that can see and recognize things—cats, cars, faces. CNN architectures are like different recipes for building that robot’s eyes and brain.

Simple Example:

  • A basic CNN is like a simple camera that just takes pictures
  • Advanced architectures are like smart cameras that can zoom, focus, and understand what they see!

1. Depthwise Convolution

The Story

Imagine you have a coloring book with three pages (Red, Green, Blue). Instead of one big crayon coloring all pages at once, you use three separate small crayons—one for each page.

What Is It?

Depthwise convolution processes each color channel separately instead of mixing them all together.

Normal Convolution:
[R,G,B] → ONE big filter → Output

Depthwise Convolution:
R → tiny filter → R output
G → tiny filter → G output
B → tiny filter → B output
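
If you want to see the difference in code, here is a minimal PyTorch sketch (PyTorch is just an assumed choice for illustration). The groups=in_channels argument is what turns an ordinary convolution into a depthwise one:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image

# Regular convolution: every output channel mixes all 3 input channels
regular = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1)

# Depthwise convolution: groups=in_channels gives each channel its own small filter
depthwise = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3,
                      padding=1, groups=3)

print(sum(p.numel() for p in regular.parameters()))    # 84 weights + biases
print(sum(p.numel() for p in depthwise.parameters()))  # 30 weights + biases
print(depthwise(x).shape)                               # torch.Size([1, 3, 32, 32])
```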

Why Use It?

Feature  | Regular | Depthwise
Speed    | Slow    | 9x faster!
Memory   | Heavy   | Light
Accuracy | Good    | Good

Real Example

MobileNet uses depthwise separable convolutions (a depthwise convolution followed by a 1x1 pointwise convolution). That’s how your phone can identify objects in photos instantly without draining your battery!

graph TD A["Input Image"] --> B["Red Channel"] A --> C["Green Channel"] A --> D["Blue Channel"] B --> E["Filter 1"] C --> F["Filter 2"] D --> G["Filter 3"] E --> H["Combine"] F --> H G --> H H --> I["Output"]

2. Dilated Convolution

The Story

Imagine looking through a fence. Normal vision sees only what’s right in front. But what if your eyes could skip gaps in the fence and see farther without moving?

What Is It?

Dilated convolution adds gaps (holes) between filter pixels. This lets the network see a wider area without using more computing power.

Normal 3x3 filter:
[X X X]
[X X X]
[X X X]

Dilated 3x3 (rate=2):
[X . X . X]
[. . . . .]
[X . X . X]
[. . . . .]
[X . X . X]
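
Here is a tiny PyTorch sketch (assumed for illustration): the dilation argument adds the gaps, so the filter covers a wider area with the same nine weights:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 16, 16)

normal  = nn.Conv2d(1, 1, kernel_size=3, padding=1)              # sees a 3x3 patch
dilated = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2)  # covers a 5x5 area

# Both filters use the same 9 weights and keep the 16x16 spatial size,
# but the dilated one looks at a wider neighborhood per output pixel.
print(normal.weight.shape, dilated.weight.shape)  # torch.Size([1, 1, 3, 3]) twice
print(normal(x).shape, dilated(x).shape)          # both torch.Size([1, 1, 16, 16])
```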

Why Use It?

  • See the big picture without losing detail
  • Great for segmentation (coloring each pixel in an image)
  • Used in self-driving cars to see roads and objects

Real Example

When your phone blurs the background in portrait mode, dilated convolutions help it understand what’s “far” and what’s “close”!


3. Transposed Convolution

The Story

Regular convolution is like shrinking a big photo to a thumbnail. Transposed convolution does the opposite—it grows a small image into a bigger one!

What Is It?

Often called “deconvolution” (technically a misnomer), it upsamples (enlarges) feature maps. Think of it as the “zoom in” button.

graph TD A["Small 4x4 Image"] --> B["Transposed Conv"] B --> C["Bigger 8x8 Image"] C --> D["Even Bigger 16x16"]

Why Use It?

Use Case         | How It Helps
Image Generation | Creates new images from noise
Segmentation     | Restores full-size masks
Super Resolution | Makes blurry images sharp

Real Example

AI art generators use transposed convolution to turn tiny random noise into beautiful 1024x1024 images!


4. CNN Architecture Evolution

The Story

CNN architectures evolved like smartphones—each generation learned from the last and got smarter!

graph TD A["1998: LeNet"] --> B["2012: AlexNet"] B --> C["2014: VGGNet"] C --> D["2014: GoogLeNet"] D --> E["2015: ResNet"] E --> F["2017: MobileNet"] F --> G["2019: EfficientNet"]

The Timeline

Year | Architecture | Big Idea
1998 | LeNet        | First CNN! Read digits
2012 | AlexNet      | Deep + GPU = Magic
2014 | VGG          | Deeper is better
2014 | GoogLeNet    | Multiple filter sizes
2015 | ResNet       | Skip connections
2017 | MobileNet    | Efficient for phones
2019 | EfficientNet | Best accuracy/speed

Real Example

AlexNet won ImageNet 2012 by a huge margin. This single event started the deep learning revolution we live in today!


5. ResNet and Residual Blocks

The Story

Imagine climbing a very tall ladder. Each step (layer) makes you tired. What if you could teleport (skip) some steps while still remembering where you came from?

The Problem

Deeper networks should learn better, right? Wrong! Beyond roughly 20 layers, plain stacked networks actually perform worse, even on the training data. This is the “degradation problem.”

The Solution: Skip Connections

ResNet adds shortcuts that let information skip layers:

Input ──→ [Conv] ──→ [Conv] ──→ + ──→ Output
   │                            ↑
   └────────────────────────────┘
         (Skip Connection)

Why It’s Magic

Without skip: Learns F(x)
With skip:    Learns F(x) + x

The network only needs to learn
the DIFFERENCE (residual), not
everything from scratch!
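
Here is what a basic residual block can look like as a PyTorch sketch (assumed for illustration; real ResNet blocks also use batch normalization):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs plus a skip connection: output = relu(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))  # F(x): the residual
        return self.relu(out + x)                   # skip connection adds x back

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```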

Real Example

ResNet-152 has 152 layers and won ImageNet 2015. Without skip connections, training this would be impossible!


6. Bottleneck Architecture

The Story

Imagine a water pipe. If you make it narrow in the middle (like a bottle’s neck), less water flows, but you save material. CNNs do the same with information!

What Is It?

A bottleneck squeezes channels down, processes them, then expands back:

graph TD A["256 channels"] --> B["1x1 Conv: Squeeze"] B --> C["64 channels"] C --> D["3x3 Conv: Process"] D --> E["64 channels"] E --> F["1x1 Conv: Expand"] F --> G["256 channels"]

The Math Savings

Method                       | Computations
Direct 3x3 on 256 channels   | 589,824
Bottleneck (1x1 → 3x3 → 1x1) | 69,632
Savings                      | ~8.5x faster!
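
A hedged PyTorch sketch of a bottleneck block (channel sizes are illustrative; real ResNet-50 blocks also use batch normalization):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 squeeze -> 3x3 process -> 1x1 expand, with a skip connection."""
    def __init__(self, channels=256, squeezed=64):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, squeezed, kernel_size=1)             # 256 -> 64
        self.process = nn.Conv2d(squeezed, squeezed, kernel_size=3, padding=1)  # 64 -> 64
        self.expand  = nn.Conv2d(squeezed, channels, kernel_size=1)             # 64 -> 256
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.squeeze(x))
        out = self.relu(self.process(out))
        return self.relu(self.expand(out) + x)  # skip connection, as in ResNet

x = torch.randn(1, 256, 14, 14)
print(Bottleneck()(x).shape)  # torch.Size([1, 256, 14, 14])

# Weight counts (ignoring biases) match the table above:
#   direct 3x3:  3*3*256*256                 = 589,824
#   bottleneck:  256*64 + 3*3*64*64 + 64*256 =  69,632
```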

Real Example

ResNet-50 uses bottleneck blocks. This is why your phone can run image recognition in real-time!


7. Squeeze and Excitation (SE)

The Story

Not all TV channels are equally interesting. SE blocks let the network pick favorites—it boosts important channels and mutes boring ones!

How It Works

  1. Squeeze: Summarize each channel into one number
  2. Excite: Learn which channels matter most
  3. Scale: Multiply channels by their importance
graph TD A["Feature Map"] --> B["Global Avg Pool"] B --> C["Squeeze: 1 number/channel"] C --> D["FC Layer: Reduce"] D --> E["FC Layer: Expand"] E --> F["Sigmoid: 0-1 weights"] F --> G["Scale Original Features"] A --> G G --> H["Output"]

Real Example

SENet won ImageNet 2017! By adding SE blocks to any network, accuracy improves by ~1% with tiny extra cost.

The Analogy

Think of a music equalizer:

  • Bass channels get boosted for action movies
  • Treble channels get boosted for dialogue
  • SE blocks do this automatically for image features!

8. Image Classification

The Story

Image classification is the original superhero power of CNNs. Show it a picture, and it tells you what’s in it!

How It Works

graph TD A["Input Image"] --> B["Conv Layers"] B --> C["Extract Features"] C --> D["Flatten"] D --> E["Fully Connected"] E --> F["Softmax"] F --> G["Cat: 95%"] F --> H["Dog: 4%"] F --> I["Bird: 1%"]

The Pipeline

Step           | What Happens
1. Input       | 224x224 RGB image
2. Conv Layers | Find edges, textures, shapes
3. Pooling     | Shrink & summarize
4. Flatten     | Make 1D vector
5. Dense       | Make final decision
6. Softmax     | Convert to probabilities
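
Putting the pipeline together, here is a toy PyTorch classifier (assumed for illustration, with three made-up classes cat/dog/bird rather than the full 1000 ImageNet categories):

```python
import torch
import torch.nn as nn

# A tiny classifier that follows the pipeline above
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # find edges and textures
    nn.MaxPool2d(2),                                         # shrink & summarize
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                                 # pool each channel to 1x1
    nn.Flatten(),                                            # make a 1D vector
    nn.Linear(32, 3),                                        # final decision (3 class logits)
)

image = torch.randn(1, 3, 224, 224)                 # a 224x224 RGB input
probs = torch.softmax(model(image), dim=1)          # convert logits to probabilities
print(probs)                                        # three numbers that sum to 1
```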

Real Example

ImageNet Challenge uses 1000 categories:

  • Dog breeds (120 types!)
  • Cars, planes, boats
  • Foods, plants, animals

Modern CNNs achieve >90% accuracy—better than most humans!

Why Architecture Matters

Architecture    | ImageNet Accuracy | Parameters
AlexNet         | 63%               | 60M
VGG-16          | 74%               | 138M
ResNet-50       | 79%               | 25M
EfficientNet-B7 | 84%               | 66M

Notice: ResNet-50 beats VGG with 5x fewer parameters! That’s the power of smart architecture.


Quick Summary

Architecture    | Key Idea          | Best For
Depthwise Conv  | Separate channels | Mobile apps
Dilated Conv    | Gaps in filter    | Segmentation
Transposed Conv | Upsample images   | Generation
ResNet          | Skip connections  | Very deep nets
Bottleneck      | Squeeze-expand    | Efficiency
SE Blocks       | Channel attention | Accuracy boost

The Big Picture

graph TD A["Simple CNN"] --> B["Go Deeper?"] B --> C{Problem: Degradation} C --> D["ResNet: Skip Connections"] D --> E{Problem: Too Slow} E --> F["Bottleneck + Depthwise"] F --> G{Problem: What Matters?} G --> H["SE: Channel Attention"] H --> I["Modern Efficient CNNs"]

You now understand how CNNs evolved from simple filters to smart, efficient architectures that power everything from your phone’s camera to self-driving cars!


Remember: Each architecture piece solves a specific problem. Like LEGO, the magic is in how you combine them!
