📊 Matplotlib Histograms: The Story of Data Buckets

🎬 The Big Picture

Imagine you’re sorting a giant pile of colorful LEGO bricks by color. You grab one bucket for red, one for blue, one for yellow… and you toss each brick into the right bucket. When you’re done, you can SEE which colors you have the most of just by looking at how FULL each bucket is!

That’s exactly what a histogram does with numbers!

A histogram takes your data, sorts it into “buckets” (called bins), and shows you how many items landed in each bucket. It’s like magic glasses that let you SEE your data’s story!


🪣 Histogram Basics

What Is a Histogram?

Think of a histogram like a game where you’re:

  1. Drawing lines on the ground (these are your bins)
  2. Tossing balls (your data) into the spaces between lines
  3. Stacking them up to see which space got the most balls

import matplotlib.pyplot as plt
import numpy as np

# Let's say these are test scores
scores = [65, 72, 78, 82, 85, 88, 90, 92, 95, 98]

# Create the histogram!
plt.hist(scores)
plt.xlabel('Scores')
plt.ylabel('How Many Students')
plt.title('Test Scores Distribution')
plt.show()

The Magic Function: plt.hist()

The hist() function needs just ONE thing: your data!

plt.hist(data)

That’s it! Matplotlib does the rest:

  • Figures out the range of your numbers
  • Creates 10 buckets (bins) by default
  • Counts how many values go in each bucket
  • Draws the bars for you!
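To peek at what Matplotlib worked out, you can capture what hist() hands back. This is a minimal sketch that reuses the test scores from the first example; the variable names (counts, edges, patches) are illustrative choices, and it assumes the default bin setting has not been changed.

```python
import matplotlib.pyplot as plt

# Same test scores as in the first example
scores = [65, 72, 78, 82, 85, 88, 90, 92, 95, 98]

fig, ax = plt.subplots()

# hist() returns the bin counts, the bin edges, and the drawn bars
counts, edges, patches = ax.hist(scores)

print("Number of bins:", len(counts))               # the default bin count
print("Range covered:", edges[0], "to", edges[-1])  # smallest to largest score
print("Values counted:", int(counts.sum()))         # every score lands in a bin
```

Add plt.show() at the end if you also want to see the plot.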

Understanding Bins

Bins = Buckets for your numbers

Imagine you have ages: 5, 7, 12, 15, 22, 25, 28, 35, 42, 55

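With 3 bins covering 0 to 60, you get:

  • Bucket 1 (0-20): 4 people 📦📦📦📦
  • Bucket 2 (20-40): 4 people 📦📦📦📦
  • Bucket 3 (40-60): 2 people 📦📦

Heads up: with bins=3, matplotlib stretches the buckets from the smallest value to the largest (here 5 to 55), so the real edges land at about 5, 21.7, 38.3, and 55. The counts still come out as 4, 4, and 2.

With 6 bins, you see MORE detail!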

ages = [5, 7, 12, 15, 22, 25, 28, 35, 42, 55]

# Try different bin counts
plt.hist(ages, bins=3)
plt.title('Ages with 3 Bins')
plt.show()
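To check the bucket counts without drawing anything, np.histogram does the same counting that plt.hist relies on. A small sketch: the explicit edge list [0, 20, 40, 60] is an assumption picked to match round-number buckets, and the variable names are illustrative.

```python
import numpy as np

ages = [5, 7, 12, 15, 22, 25, 28, 35, 42, 55]

# bins=3 splits the range from min(ages) to max(ages) into 3 equal parts
counts_3, edges_3 = np.histogram(ages, bins=3)
print(counts_3, edges_3)

# Explicit edges give the round-number buckets 0-20, 20-40, 40-60
counts_fixed, edges_fixed = np.histogram(ages, bins=[0, 20, 40, 60])
print(counts_fixed, edges_fixed)

# More bins, more detail
counts_6, edges_6 = np.histogram(ages, bins=6)
print(counts_6, edges_6)
```

Passing the same bins value to plt.hist should draw bars with these heights.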

🎨 Histogram Customization

Now let’s make our histograms BEAUTIFUL! Like decorating your room 🎪

Changing Colors

data = np.random.randn(1000)

# Pick your favorite color!
plt.hist(data, color='coral')
plt.show()

Popular colors: 'coral', 'skyblue', 'lightgreen', 'gold', 'violet'

Adding Edges (Outlines)

Without edges, bars can blur together. Add outlines!

plt.hist(data,
         color='lightblue',
         edgecolor='navy')
plt.show()

Controlling Bins

You can tell matplotlib EXACTLY how many bins you want:

# I want exactly 20 bins!
plt.hist(data, bins=20)

# Or specify the exact edges
plt.hist(data, bins=[-3, -2, -1, 0, 1, 2, 3])
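One thing to watch with exact edges: values outside the first and last edge are simply not counted. The sketch below uses a small hand-made dataset (an assumption, chosen so the result is easy to follow) and np.histogram, which does the binning behind plt.hist. Each bin includes its left edge but not its right edge, except the last bin, which includes both.

```python
import numpy as np

# Hand-made values: two of them sit outside the -3 to 3 range
data = np.array([-3.5, -2.4, -0.3, 0.0, 0.8, 1.0, 2.9, 3.0, 4.2])
edges = [-3, -2, -1, 0, 1, 2, 3]

counts, _ = np.histogram(data, bins=edges)

print(counts)                             # one count per bucket
print(len(data) - int(counts.sum()))      # how many values were left out
```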

Transparency (Alpha)

Make bars see-through! This helps when comparing multiple datasets.

plt.hist(data, alpha=0.7)  # 0 = invisible, 1 = solid

All Together Now!

data = np.random.randn(1000)

plt.hist(data,
         bins=25,
         color='mediumseagreen',
         edgecolor='darkgreen',
         alpha=0.8,
         linewidth=1.2)

plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('My Beautiful Histogram!')
plt.show()

📊 Multiple Histograms

What if you want to compare TWO groups? Like comparing test scores from Class A vs Class B?

Overlapping

class_a = np.random.normal(75, 10, 100)
class_b = np.random.normal(80, 8, 100)

plt.hist(class_a, label='Class A', alpha=0.5)
plt.hist(class_b, label='Class B', alpha=0.5)
plt.legend()
plt.show()

🎯 Pro Tip: Use alpha (transparency) so you can see where they overlap!

Stacked Histograms

Stack one on top of the other:

plt.hist([class_a, class_b],
         stacked=True,
         label=['Class A', 'Class B'],
         color=['skyblue', 'salmon'])
plt.legend()
plt.show()
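If you want the numbers behind a stacked plot, you can capture what hist() returns. In current Matplotlib versions the counts returned with stacked=True appear to be running totals, so the last row is the top of each stacked bar. Treat this as a sketch: the seeded generator and the name heights are assumptions added to make the example repeatable.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
class_a = rng.normal(75, 10, 100)
class_b = rng.normal(80, 8, 100)

fig, ax = plt.subplots()
heights, edges, _ = ax.hist([class_a, class_b],
                            stacked=True,
                            label=['Class A', 'Class B'],
                            color=['skyblue', 'salmon'])
ax.legend()

heights = np.asarray(heights)
print(heights[0])  # Class A counts per bin
print(heights[1])  # Class A + Class B: the top of each stacked bar
```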

Same Bins for Fair Comparison

When comparing, use the SAME bins for both:

# Define the bin edges once (20 edges make 19 bins)
# Scores below 50 or above 100 are left out of the plot
my_bins = np.linspace(50, 100, 20)

plt.hist(class_a, bins=my_bins, alpha=0.6)
plt.hist(class_b, bins=my_bins, alpha=0.6)
plt.show()

This makes the comparison FAIR! 🎯
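If you don't want to guess the range by hand, one option is to build the edges from both groups combined. This is a sketch, not the only way to do it: np.histogram_bin_edges is a NumPy helper, while the seeded generator and the names both and shared_edges are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
class_a = rng.normal(75, 10, 100)
class_b = rng.normal(80, 8, 100)

# Build one set of 20 bins that spans both classes
both = np.concatenate([class_a, class_b])
shared_edges = np.histogram_bin_edges(both, bins=20)

counts_a, _ = np.histogram(class_a, bins=shared_edges)
counts_b, _ = np.histogram(class_b, bins=shared_edges)

print(len(shared_edges))               # number of edges (one more than bins)
print(counts_a.sum(), counts_b.sum())  # nobody falls outside the bins
```

You can then pass bins=shared_edges to both plt.hist calls.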

Multiple Histograms in Subplots

Sometimes overlapping is messy. Put them side by side!

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.hist(class_a, color='coral')
ax1.set_title('Class A Scores')

ax2.hist(class_b, color='teal')
ax2.set_title('Class B Scores')

plt.tight_layout()
plt.show()
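With separate panels, each plot picks its own axis limits, which can make the groups look more alike (or more different) than they are. A possible fix, sketched here with a seeded generator as an assumption, is to ask plt.subplots for shared axes:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
class_a = rng.normal(75, 10, 100)
class_b = rng.normal(80, 8, 100)

# sharex/sharey make both panels use the same axis limits
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4),
                               sharex=True, sharey=True)

ax1.hist(class_a, color='coral')
ax1.set_title('Class A Scores')

ax2.hist(class_b, color='teal')
ax2.set_title('Class B Scores')

fig.tight_layout()
```

Add plt.show() at the end to display the figure.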

🗺️ 2D Histogram (Heatmap Style!)

Now here’s where it gets COOL! 🌟

Regular histograms work with ONE list of numbers. But what if you have TWO related measurements?

Example: Height AND Weight of people

A 2D Histogram creates a GRID and colors each cell based on how many data points fall there. It’s like a treasure map showing where your data clusters!

Creating a 2D Histogram

# Height and weight data
height = np.random.normal(170, 10, 1000)
weight = np.random.normal(70, 15, 1000)

plt.hist2d(height, weight, bins=20)
plt.colorbar(label='Count')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.title('Height vs Weight Distribution')
plt.show()

Understanding the Colors

With the default colormap ('viridis'):

  • Bright yellow = LOTS of data points there! 🔥
  • Dark purple = Few data points 🧊

The colorbar on the side tells you what each color means!
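To see the numbers behind the colors, you can capture what hist2d() returns. A minimal sketch, assuming a seeded generator for repeatable data; the names counts, xedges, yedges, and image are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
height = rng.normal(170, 10, 1000)
weight = rng.normal(70, 15, 1000)

fig, ax = plt.subplots()

# hist2d() returns the grid of counts, both sets of edges, and the image
counts, xedges, yedges, image = ax.hist2d(height, weight, bins=20)
fig.colorbar(image, ax=ax, label='Count')

print(counts.shape)       # one count per grid cell
print(int(counts.sum()))  # all points land somewhere in the grid
print(int(counts.max()))  # the busiest cell in the grid
```

By default, the busiest cell gets the color at the top of the colorbar.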

Customizing 2D Histograms

plt.hist2d(height, weight,
           bins=30,
           cmap='plasma')  # Try: viridis, hot, cool
plt.colorbar()
plt.show()

Fun colormaps to try:

  • 'viridis' - Purple to yellow (default, beautiful!)
  • 'hot' - Black to red to yellow to white
  • 'cool' - Cyan to magenta
  • 'plasma' - Purple to orange to yellow

Different Bin Counts for Each Axis

You can have different detail levels for X and Y:

plt.hist2d(height, weight, bins=[20, 30])
# 20 bins for height, 30 bins for weight
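To confirm which number goes with which axis, np.histogram2d (the function plt.hist2d uses for counting) returns the grid directly. A short sketch with a seeded generator as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
height = rng.normal(170, 10, 1000)
weight = rng.normal(70, 15, 1000)

counts, xedges, yedges = np.histogram2d(height, weight, bins=[20, 30])

print(counts.shape)                      # first axis is height, second is weight
print(len(xedges) - 1, len(yedges) - 1)  # bins along x, bins along y
```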

🧭 Quick Mental Map

Your Data → How many variables?

  • One variable → Regular Histogram (plt.hist)
      • Customize! bins, color, alpha
      • Compare Multiple: overlay or subplots
  • Two variables → 2D Histogram (plt.hist2d)
      • Customize! bins, cmap, colorbar

🎯 The Golden Rules

  1. Start Simple: plt.hist(data) - see what happens first!
  2. Adjust Bins: Too few = lose detail. Too many = too noisy.
  3. Use Alpha: When comparing multiple histograms
  4. Same Bins: For fair comparisons between groups
  5. 2D for Pairs: When you have two related measurements

🚀 You Did It!

You now know how to:

  • ✅ Create basic histograms with plt.hist()
  • ✅ Customize colors, edges, bins, and transparency
  • ✅ Compare multiple datasets with overlapping or stacked histograms
  • ✅ Create 2D histograms to see patterns in paired data

Remember: Histograms are your data’s autobiography. They show the story of where your numbers like to hang out! 📖✨
