📊 Statistical Plots: Understanding Your Data’s Story
The Birthday Party Analogy 🎂
Imagine you’re throwing birthday parties for all the kids in your school. Some kids are 6 years old, some are 8, some are 10. How do you figure out what age group is most common? How spread out are the ages? That’s exactly what statistical distribution plots help us see!
Think of these plots as magic windows that show you the shape of your data—where most numbers live, how spread out they are, and if there are any strange ones hiding in the corners.
1. 📦 Box Plot Basics: The Five-Number Summary
What’s a Box Plot?
A box plot is like a treasure map of your data. It shows you 5 special landmarks:
- Minimum - The smallest kid at the party
- Q1 (First Quartile) - Where 25% of kids are shorter
- Median - The kid right in the middle
- Q3 (Third Quartile) - Where 75% of kids are shorter
- Maximum - The tallest kid at the party
🎯 Simple Example
import matplotlib.pyplot as plt
import numpy as np
# Test scores from your class
scores = [65, 70, 72, 75, 78, 80,
          82, 85, 88, 90, 95]
plt.boxplot(scores)
plt.ylabel('Test Scores')
plt.title('Class Test Scores')
plt.show()
What You’ll See
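graph TD
    A["Maximum: 95"] --> B["Q3: 86.5"]
    B --> C["Median: 80"]
    C --> D["Q1: 73.5"]
    D --> E["Minimum: 65"]
Matplotlib finds the quartiles by interpolating between neighboring scores, so the box edges sit at 73.5 and 86.5 rather than on actual scores.
- The Box = Middle 50% of your data (Q1 to Q3)
- The Line in the Box = Median (middle value)
- The Whiskers = Lines stretching to the smallest and largest values that are not outliers (here, the min and max)
- The Dots = Outliers (unusual values more than 1.5 × the box height, Q3 - Q1, away from the box)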
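Quartiles can be worked out in more than one way. Taking the median of each half of the scores gives 72 and 88, while interpolating between neighboring scores (the convention NumPy's default percentile uses, and so the one Matplotlib draws) gives 73.5 and 86.5. Here is a minimal sketch of the second approach using only Python's standard library; the name `five_numbers` is just for illustration.

```python
import statistics

# Test scores from the class example
scores = [65, 70, 72, 75, 78, 80, 82, 85, 88, 90, 95]

# method="inclusive" interpolates linearly between neighboring values
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

five_numbers = {
    "minimum": min(scores),
    "q1": q1,
    "median": median,
    "q3": q3,
    "maximum": max(scores),
}

for name, value in five_numbers.items():
    print(f"{name:>8}: {value}")
```

Running it reports a Q1 of 73.5 and a Q3 of 86.5, which is where the edges of the box appear in the plot.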
🌟 Real Life Example
Imagine sorting your toy cars by size. The box plot tells you:
- Most cars are medium-sized (the box)
- A few are tiny or huge (the whiskers)
- That monster truck? It’s an outlier!
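How does a box plot decide that the monster truck is an outlier? A common rule, and Matplotlib's default, is to flag anything more than 1.5 × (Q3 - Q1) beyond the box. The sketch below applies that rule with the standard library; the car lengths are made-up numbers for illustration.

```python
import statistics

# Hypothetical toy car lengths in cm; the 30 cm monster truck is the odd one out
car_lengths = [6, 7, 7, 8, 8, 8, 9, 9, 10, 30]

q1, _, q3 = statistics.quantiles(car_lengths, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Anything outside these fences is drawn as a dot
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in car_lengths if x < lower_fence or x > upper_fence]

# Whiskers stop at the most extreme values still inside the fences
whisker_low = min(x for x in car_lengths if x >= lower_fence)
whisker_high = max(x for x in car_lengths if x <= upper_fence)

print("outliers:", outliers)
print("whiskers:", whisker_low, "to", whisker_high)
```

The whiskers stop at 6 and 10, and only the 30 cm truck is left outside as a dot.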
2. 🎨 Box Plot Customization: Make It Your Own!
Adding Colors and Style
Plain boxes are boring! Let’s make them beautiful.
import matplotlib.pyplot as plt
import numpy as np
# Three classes' test scores
class_a = [65, 70, 75, 80, 85, 90]
class_b = [55, 60, 65, 70, 75, 80]
class_c = [75, 80, 85, 90, 95, 100]
data = [class_a, class_b, class_c]
# Create colorful box plot
bp = plt.boxplot(data, patch_artist=True)
# Add colors to each box
colors = ['lightblue', 'lightgreen', 'pink']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
plt.xticks([1, 2, 3], ['Class A', 'Class B', 'Class C'])
plt.ylabel('Scores')
plt.title('Comparing Classes')
plt.show()
🎯 Key Customization Options
| Parameter | What It Does | Notes |
|---|---|---|
| patch_artist=True | Allows filling boxes | Required for colors |
| notch=True | Adds confidence notch | Shows median uncertainty |
| vert=False | Horizontal box plot | Good for long labels |
| widths=0.5 | Box width | 0.5 = medium width |
| showfliers=False | Hide outliers | Clean look |
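Several of these options can be combined in one call. The sketch below also uses `showmeans`, `boxprops`, `medianprops`, `whiskerprops`, and `flierprops`, which are Matplotlib `boxplot` keyword arguments that the table above does not cover; treat the specific colors and line styles as arbitrary choices.

```python
import matplotlib.pyplot as plt

class_a = [65, 70, 75, 80, 85, 90]
class_b = [55, 60, 65, 70, 75, 80]
class_c = [75, 80, 85, 90, 95, 100]

fig, ax = plt.subplots()
bp = ax.boxplot(
    [class_a, class_b, class_c],
    patch_artist=True,   # needed so the boxes can be filled
    widths=0.5,
    showmeans=True,      # add a marker for each mean
    boxprops=dict(facecolor="lightblue", edgecolor="navy"),
    medianprops=dict(color="black", linewidth=2),
    whiskerprops=dict(linestyle="--"),
    flierprops=dict(marker="o", markerfacecolor="red"),
)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["Class A", "Class B", "Class C"])
ax.set_ylabel("Scores")
ax.set_title("Styled Box Plot")
```

Add `plt.show()` at the end to display the figure. Passing style dictionaries up front keeps all the styling in one place, while looping over `bp['boxes']` is handy when each box needs its own color.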
Making It Horizontal
plt.boxplot(scores, vert=False)
plt.xlabel('Test Scores')
plt.title('Horizontal Box Plot')
plt.show()
Adding Notches (Confidence Intervals)
plt.boxplot(data, notch=True)
plt.title('Notched Box Plots')
plt.show()
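Each notch marks an approximate 95% confidence interval around the median. If the notches of two boxes don’t overlap, the medians are probably different! With only a few data points per group, the notches can be wider than the box itself, so treat them as a rough guide.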
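To see where the notches come from, here is a rough sketch that assumes Matplotlib's default (non-bootstrap) notch formula, median ± 1.57 × IQR / √n. The helper names `notch_interval` and `notches_overlap` are made up for this example.

```python
import math
import statistics

def notch_interval(values):
    """Approximate notch range: median ± 1.57 * IQR / sqrt(n)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True when two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

classes = {
    "Class A": [65, 70, 75, 80, 85, 90],
    "Class B": [55, 60, 65, 70, 75, 80],
    "Class C": [75, 80, 85, 90, 95, 100],
}
notches = {name: notch_interval(v) for name, v in classes.items()}

for name, (low, high) in notches.items():
    print(f"{name}: {low:.1f} to {high:.1f}")

print("A vs B overlap:", notches_overlap(notches["Class A"], notches["Class B"]))
print("B vs C overlap:", notches_overlap(notches["Class B"], notches["Class C"]))
```

With these scores, the notches for Class A and Class B overlap, while those for Class B and Class C do not, so only the B versus C difference looks convincing. Because each class has just six scores, every notch is wider than its box, which Matplotlib draws as a folded-back shape.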
3. 🎻 Violin Plot: The Shape Teller
What’s a Violin Plot?
A violin plot is like a box plot that ate a mirror! It can show the same landmarks (like the median) PLUS the shape of your data, mirrored on both sides.
Think of it like this:
- Box plot = “Here are 5 important numbers”
- Violin plot = “Here’s the WHOLE picture”
🎯 Why It Looks Like a Violin
The wider parts show where more data points live. The skinny parts show where fewer data points are.
import matplotlib.pyplot as plt
import numpy as np
# Heights of kids in two grades
grade_3 = np.random.normal(120, 5, 100)
grade_5 = np.random.normal(140, 8, 100)
data = [grade_3, grade_5]
plt.violinplot(data)
plt.xticks([1, 2], ['Grade 3', 'Grade 5'])
plt.ylabel('Height (cm)')
plt.title('Student Heights by Grade')
plt.show()
The Shape Tells a Story
graph TD
    A["Wide at Top"] -->|Many tall kids| B["Skinny Middle"]
    B -->|Fewer medium kids| C["Wide at Bottom"]
    C -->|Many short kids| D["Bimodal Distribution!"]
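Where does the width come from? Matplotlib's `violinplot` draws a Gaussian kernel density estimate: it places a small bell curve on every data point and adds them up. The sketch below is a simplified standard-library version of that idea, using Scott's rule for the bump width; the function name and the mixed-class heights are assumptions made for illustration.

```python
import math
import random
import statistics

def gaussian_kde(samples):
    """Return a density function built from one Gaussian bump per sample."""
    n = len(samples)
    # Scott's rule of thumb for the bump width
    bandwidth = statistics.stdev(samples) * n ** (-1 / 5)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical mixed class: a short group near 110 cm and a tall group near 140 cm
heights = [random.gauss(110, 3) for _ in range(100)]
heights += [random.gauss(140, 3) for _ in range(100)]

density = gaussian_kde(heights)
for h in (140, 125, 110):
    # Draw the violin's width at this height as a row of '#' characters
    print(f"{h} cm  {'#' * round(density(h) * 1000)}")
```

The rows for 140 cm and 110 cm come out much longer than the row for 125 cm: wide at the top, skinny in the middle, wide at the bottom.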
Customizing Violin Plots
# Show the median, mean, and extremes inside the violin
vp = plt.violinplot(data,
                    showmedians=True,
                    showextrema=True,
                    showmeans=True)
# Change colors
for body in vp['bodies']:
    body.set_facecolor('purple')
    body.set_alpha(0.7)
plt.title('Customized Violin Plot')
plt.show()
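If you also want quartile lines inside each violin, reasonably recent Matplotlib versions accept a `quantiles` argument, with one list of quantiles per violin. This is a hedged sketch: the seeded generator and the dotted line style are choices made here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # seeded so the figure is repeatable
grade_3 = rng.normal(120, 5, 100)
grade_5 = rng.normal(140, 8, 100)

fig, ax = plt.subplots()
vp = ax.violinplot(
    [grade_3, grade_5],
    showmedians=True,
    quantiles=[[0.25, 0.75], [0.25, 0.75]],  # Q1 and Q3 for each violin
)
vp["cquantiles"].set_linestyle(":")  # dotted lines for the quartiles
ax.set_xticks([1, 2])
ax.set_xticklabels(["Grade 3", "Grade 5"])
ax.set_ylabel("Height (cm)")
ax.set_title("Violin Plot with Quartiles")
```

Add `plt.show()` to display it. The solid line in each violin is the median, and the dotted lines above and below it mark Q3 and Q1.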
🎻 vs 📦 When to Use Which?
| Scenario | Best Choice |
|---|---|
| Quick summary | Box Plot |
| See data shape | Violin Plot |
| Many groups | Box Plot |
| Small dataset | Box Plot |
| Large dataset | Violin Plot |
| Bimodal data | Violin Plot |
4. 📏 Error Bars: Showing Uncertainty
What Are Error Bars?
Error bars are like “I’m not 100% sure” markers. They show how confident we are about our measurements.
Imagine measuring your height:
- You measure: 120 cm
- But the ruler might be off by ±2 cm
- Error bars show this: 120 cm ± 2 cm
🎯 Simple Error Bar Example
import matplotlib.pyplot as plt
import numpy as np
# Average test scores with uncertainty
subjects = ['Math', 'Science', 'English']
averages = [75, 82, 78]
errors = [5, 3, 4] # How uncertain we are
plt.bar(subjects, averages, yerr=errors,
        capsize=5, color='skyblue')
plt.ylabel('Average Score')
plt.title('Test Scores with Error Bars')
plt.show()
Types of Error Values
graph LR
    A["Error Types"] --> B["Standard Deviation"]
    A --> C["Standard Error"]
    A --> D["Confidence Interval"]
    B --> E["How spread out data is"]
    C --> F["Uncertainty in mean"]
    D --> G["Range we're 95% sure about"]
Error Bars on Line Plots
# Temperatures over a week with uncertainty
days = [1, 2, 3, 4, 5, 6, 7]
temps = [20, 22, 19, 24, 26, 23, 21]
temp_errors = [2, 1.5, 2, 3, 2, 1, 2.5]
plt.errorbar(days, temps, yerr=temp_errors,
             fmt='o-', capsize=5,
             color='red', ecolor='gray')
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.title('Weekly Temperatures')
plt.show()
Asymmetric Error Bars
Sometimes errors aren’t the same up and down!
# Different errors above and below
values = [10, 20, 30]
lower_err = [2, 3, 2] # Error below
upper_err = [4, 2, 5] # Error above
plt.bar([1, 2, 3], values,
        yerr=[lower_err, upper_err],
        capsize=5)
plt.title('Asymmetric Error Bars')
plt.show()
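Where might different up and down errors come from? One common case is skewed data, where you plot the median and let the bars reach down to Q1 and up to Q3. The sketch below builds the two error lists that way; the waiting times are invented for illustration.

```python
import statistics

# Hypothetical waiting times (minutes) at two checkout lines; both are skewed
samples = {
    "Line 1": [1, 3, 4, 6, 15],
    "Line 2": [2, 5, 6, 10, 30],
}

medians, lower_err, upper_err = [], [], []
for values in samples.values():
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    medians.append(median)
    lower_err.append(median - q1)  # distance down to Q1
    upper_err.append(q3 - median)  # distance up to Q3

yerr = [lower_err, upper_err]  # first row = below, second row = above
print("medians:", medians)
print("yerr:   ", yerr)
```

The result can go straight into a call such as `plt.bar(list(samples), medians, yerr=yerr, capsize=5)`. Note that the values are distances from the bar top, not the positions of the bar ends.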
🎨 Error Bar Styling
| Parameter | What It Does |
|---|---|
| yerr | Vertical error size |
| xerr | Horizontal error size |
| capsize | Width of error cap |
| ecolor | Error bar color |
| elinewidth | Error line thickness |
| fmt | Marker + line style |
🎯 Putting It All Together
Compare All Three!
import matplotlib.pyplot as plt
import numpy as np
# Same data, three views
data = np.random.normal(50, 10, 100)
fig, axes = plt.subplots(1, 3,
                         figsize=(10, 4))
# Box Plot
axes[0].boxplot(data)
axes[0].set_title('Box Plot')
# Violin Plot
axes[1].violinplot(data)
axes[1].set_title('Violin Plot')
# Bar with Error
mean = np.mean(data)
std = np.std(data)
axes[2].bar(['Data'], [mean],
            yerr=[std], capsize=10)
axes[2].set_title('Mean + Error')
plt.tight_layout()
plt.show()
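One detail about the error bar in the third panel: `np.std` divides by n by default (the population formula). For a sample, many people prefer `ddof=1`, which divides by n - 1, and the standard error is smaller still. A small sketch of the difference, using a seeded generator as an assumption so the numbers are repeatable:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the numbers are repeatable
data = rng.normal(50, 10, 100)

population_std = np.std(data)          # default ddof=0: divides by n
sample_std = np.std(data, ddof=1)      # divides by n - 1
std_error = sample_std / np.sqrt(len(data))

print(f"population std = {population_std:.2f}")
print(f"sample std     = {sample_std:.2f}")
print(f"standard error = {std_error:.2f}")
```

With 100 points the two standard deviations are nearly identical, but the standard error is ten times smaller, so the choice of `yerr` changes the picture a lot.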
🌟 Remember This!
| Plot Type | Shows You | Use When |
|---|---|---|
| Box Plot | 5-number summary | Quick comparison |
| Violin Plot | Data shape + spread | Understanding distribution |
| Error Bars | Uncertainty | Showing measurement accuracy |
The Golden Rule: Pick the plot that tells YOUR story best!
🚀 Quick Code Reference
# Box Plot
plt.boxplot(data, patch_artist=True)
# Violin Plot
plt.violinplot(data, showmedians=True)
# Error Bars (bar chart)
plt.bar(x, y, yerr=errors, capsize=5)
# Error Bars (line plot)
plt.errorbar(x, y, yerr=errors, fmt='o-')
You’ve got this! Now go visualize some data! 📊✨
