Statistical Distributions

📊 Statistical Plots: Understanding Your Data’s Story

The Birthday Party Analogy 🎂

Imagine you’re throwing birthday parties for all the kids in your school. Some kids are 6 years old, some are 8, some are 10. How do you figure out what age group is most common? How spread out are the ages? That’s exactly what statistical distribution plots help us see!

Think of these plots as magic windows that show you the shape of your data—where most numbers live, how spread out they are, and if there are any strange ones hiding in the corners.


1. 📦 Box Plot Basics: The Five-Number Summary

What’s a Box Plot?

A box plot is like a treasure map of your data. It shows you 5 special landmarks:

  1. Minimum - The smallest kid at the party
  2. Q1 (First Quartile) - Where 25% of kids are shorter
  3. Median - The kid right in the middle
  4. Q3 (Third Quartile) - Where 75% of kids are shorter
  5. Maximum - The tallest kid at the party
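You can also compute the five landmarks yourself. Below is a minimal NumPy sketch that uses the same test scores as the example that follows. It relies on np.percentile's default linear interpolation, which is how Matplotlib's boxplot places Q1 and Q3. Textbook methods that split the data into halves can give slightly different quartiles.

```python
import numpy as np

# Same test scores as the box plot example
scores = [65, 70, 72, 75, 78, 80, 82, 85, 88, 90, 95]

minimum = np.min(scores)
# Linear interpolation between sorted scores (NumPy's default method)
q1, median, q3 = np.percentile(scores, [25, 50, 75])
maximum = np.max(scores)

print(minimum, q1, median, q3, maximum)  # 65 73.5 80.0 86.5 95
```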

🎯 Simple Example

import matplotlib.pyplot as plt
import numpy as np

# Test scores from your class
scores = [65, 70, 72, 75, 78, 80,
          82, 85, 88, 90, 95]

plt.boxplot(scores)
plt.ylabel('Test Scores')
plt.title('Class Test Scores')
plt.show()

What You’ll See

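  • Maximum: 95
  • Q3: 86.5
  • Median: 80
  • Q1: 73.5
  • Minimum: 65

(Matplotlib finds the quartiles by interpolating between scores. If you find them by hand as the middle of each half, you get Q1 = 72 and Q3 = 88. Both methods are common.)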

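  • The Box = Middle 50% of your data (Q1 to Q3)
  • The Line in the Box = Median (middle value)
  • The Whiskers = Lines stretching out to the smallest and largest values that aren't outliers (in Matplotlib, by default, at most 1.5 × the box's height beyond the box)
  • The Dots = Outliers (unusual values beyond the whiskers)

In this example no score is that unusual, so the whiskers reach all the way to the minimum (65) and maximum (95).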
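Where exactly do the whiskers stop? Matplotlib's default (whis=1.5) follows Tukey's rule. It draws "fences" 1.5 × IQR (the box's height) beyond each end of the box, stretches each whisker to the most extreme score still inside its fence, and marks anything beyond as an outlier. The sketch below applies that rule by hand to a hypothetical version of the test scores with one extra low score of 30.

```python
import numpy as np

# Hypothetical scores: the class scores plus one unusually low score
scores = [30, 65, 70, 72, 75, 78, 80, 82, 85, 88, 90, 95]

q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1  # the height of the box

# Fences sit 1.5 IQR beyond each end of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme scores still inside the fences
inside = [s for s in scores if lower_fence <= s <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything outside the fences is drawn as a dot (an outlier)
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(whisker_low, whisker_high, outliers)  # 65 95 [30]
```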

🌟 Real Life Example

Imagine sorting your toy cars by size. The box plot tells you:

  • Most cars are medium-sized (the box)
  • A few are tiny or huge (the whiskers)
  • That monster truck? It’s an outlier!

2. 🎨 Box Plot Customization: Make It Your Own!

Adding Colors and Style

Plain boxes are boring! Let’s make them beautiful.

import matplotlib.pyplot as plt
import numpy as np

# Three classes' test scores
class_a = [65, 70, 75, 80, 85, 90]
class_b = [55, 60, 65, 70, 75, 80]
class_c = [75, 80, 85, 90, 95, 100]

data = [class_a, class_b, class_c]

# Create colorful box plot
bp = plt.boxplot(data, patch_artist=True)

# Add colors to each box
colors = ['lightblue', 'lightgreen', 'pink']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

plt.xticks([1, 2, 3], ['Class A', 'Class B', 'Class C'])
plt.ylabel('Scores')
plt.title('Comparing Classes')
plt.show()
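plt.boxplot returns a dictionary of the artists it drew, keyed by part: 'boxes', 'medians', 'whiskers', 'caps', 'fliers', and 'means'. The loop above uses 'boxes'; the same pattern restyles any other part. A short sketch (the particular colors and line styles are just examples):

```python
import matplotlib.pyplot as plt

# Same three classes as above
class_a = [65, 70, 75, 80, 85, 90]
class_b = [55, 60, 65, 70, 75, 80]
class_c = [75, 80, 85, 90, 95, 100]

bp = plt.boxplot([class_a, class_b, class_c], patch_artist=True)

# Thicker black median lines
for median in bp['medians']:
    median.set_color('black')
    median.set_linewidth(2)

# Dashed whiskers (each box has two: one below, one above)
for whisker in bp['whiskers']:
    whisker.set_linestyle('--')

plt.title('Styled Medians and Whiskers')
plt.show()
```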

🎯 Key Customization Options

| Parameter | What It Does | Notes |
|---|---|---|
| patch_artist=True | Draws boxes as fillable shapes | Required for colors |
| notch=True | Adds a notch around the median | Shows median uncertainty |
| vert=False | Horizontal box plot | Good for long labels |
| widths=0.5 | Box width, in x-axis units | Boxes sit 1 unit apart, so 0.5 fills half the gap |
| showfliers=False | Hides outlier dots | Clean look |

Making It Horizontal

plt.boxplot(scores, vert=False)
plt.xlabel('Test Scores')
plt.title('Horizontal Box Plot')
plt.show()

Adding Notches (Confidence Intervals)

plt.boxplot(data, notch=True)
plt.title('Notched Box Plots')
plt.show()

Each notch marks an approximate 95% confidence interval around the median. If two notches don't overlap, that is strong evidence the medians really differ. If they overlap, the difference might just be chance!
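When it isn't bootstrapping, Matplotlib sizes each notch with a Gaussian approximation: median ± 1.57 × IQR / √n. The sketch below recomputes those intervals for the three classes from the customization example, so you can check overlap by number instead of by eye. notch_interval and notches_overlap are hypothetical helper names. With only six scores per class the intervals are wide, even wider than the boxes, which can make the notches look folded over.

```python
import numpy as np

classes = {
    'Class A': [65, 70, 75, 80, 85, 90],
    'Class B': [55, 60, 65, 70, 75, 80],
    'Class C': [75, 80, 85, 90, 95, 100],
}

def notch_interval(values):
    """Approximate 95% interval for the median: median +/- 1.57 * IQR / sqrt(n)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(len(values))
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True if the notch intervals of two groups share any range."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

for name, values in classes.items():
    lo, hi = notch_interval(values)
    print(f'{name}: {lo:.1f} to {hi:.1f}')

print(notches_overlap(classes['Class B'], classes['Class C']))  # False
```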


3. 🎻 Violin Plot: The Shape Teller

What’s a Violin Plot?

A violin plot is like a box plot that ate a mirror! It shows you the spread of your data PLUS its shape, drawn as a smooth curve mirrored on both sides.

Think of it like this:

  • Box plot = “Here are 5 important numbers”
  • Violin plot = “Here’s the WHOLE picture”

🎯 Why It Looks Like a Violin

The wider parts show where more data points live. The skinny parts show where fewer data points are.

import matplotlib.pyplot as plt
import numpy as np

# Heights of kids in two grades
grade_3 = np.random.normal(120, 5, 100)
grade_5 = np.random.normal(140, 8, 100)

data = [grade_3, grade_5]

plt.violinplot(data)
plt.xticks([1, 2], ['Grade 3', 'Grade 5'])
plt.ylabel('Height (cm)')
plt.title('Student Heights by Grade')
plt.show()
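Under the hood, Matplotlib draws each violin's outline from a kernel density estimate (KDE): it places a small Gaussian bump on every data point, adds the bumps up, and scales the result so the widest point matches the violin's width. Here is a NumPy-only sketch of that idea. kde_density is a hypothetical helper, and the fixed 3 cm bandwidth is an assumption for illustration; violinplot picks a bandwidth automatically (Scott's rule by default, adjustable through bw_method).

```python
import numpy as np

def kde_density(data, grid, bandwidth):
    """Hypothetical helper: average of one Gaussian bump per data point."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] = distance from grid value i to data point j, in bandwidth units
    z = (grid[:, None] - data[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)  # averaging keeps the total area equal to 1

rng = np.random.default_rng(0)
heights = rng.normal(120, 5, 100)  # seeded stand-in for the Grade 3 heights

grid = np.linspace(100, 140, 81)   # 100 to 140 cm in 0.5 cm steps
density = kde_density(heights, grid, bandwidth=3.0)

# The violin is widest where the density is highest
print(f'Widest near {grid[np.argmax(density)]:.1f} cm')
```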

The Shape Tells a Story

Picture a violin that is:

  • Wide at the top: many tall kids
  • Skinny in the middle: fewer medium-height kids
  • Wide at the bottom: many short kids

Two bulges mean two peaks, a bimodal distribution! A box plot would squash this into a single box.
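To see why this matters, the sketch below builds a hypothetical mixed group (half the kids around 110 cm, half around 150 cm) and draws it both ways. The box plot puts the median in the gap where almost nobody is, while the violin shows two separate bulges.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical mixed group: two clusters of heights
heights = np.concatenate([rng.normal(110, 4, 200),
                          rng.normal(150, 4, 200)])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(8, 4))
ax_box.boxplot(heights)
ax_box.set_title('Box Plot')
ax_violin.violinplot(heights, showmedians=True)
ax_violin.set_title('Violin Plot')
plt.tight_layout()
plt.show()

# How many kids are actually close to the median?
median = np.median(heights)
near_median = np.sum(np.abs(heights - median) < 5)
near_peak = np.sum(np.abs(heights - 110) < 5)
print(f'Median {median:.1f} cm: {near_median} kids within 5 cm, '
      f'versus {near_peak} within 5 cm of 110 cm')
```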

Customizing Violin Plots

# Show median, mean, min/max, and quartile (25% and 75%) lines
vp = plt.violinplot(data,
                    showmedians=True,
                    showextrema=True,
                    showmeans=True,
                    quantiles=[[0.25, 0.75]] * len(data))

# Change colors
for body in vp['bodies']:
    body.set_facecolor('purple')
    body.set_alpha(0.7)

plt.title('Customized Violin Plot')
plt.show()
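Like boxplot, violinplot returns a dictionary of artists. Besides 'bodies', it holds line collections such as 'cmedians', 'cmeans', 'cmins', 'cmaxes', and 'cbars' for whichever show* options are on, plus 'cquantiles' when you pass quantiles. A sketch that draws quartile lines and restyles them (the seeded heights are hypothetical stand-ins for the grade data):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
grade_3 = rng.normal(120, 5, 100)
grade_5 = rng.normal(140, 8, 100)

# One list of quantiles per violin: Q1 and Q3
vp = plt.violinplot([grade_3, grade_5],
                    showmedians=True,
                    quantiles=[[0.25, 0.75], [0.25, 0.75]])

vp['cmedians'].set_color('black')    # solid black median lines
vp['cquantiles'].set_linestyle(':')  # dotted quartile lines

plt.xticks([1, 2], ['Grade 3', 'Grade 5'])
plt.title('Violins with Quartile Lines')
plt.show()
```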

🎻 vs 📦 When to Use Which?

| Scenario | Best Choice |
|---|---|
| Quick summary | Box Plot |
| See data shape | Violin Plot |
| Many groups | Box Plot |
| Small dataset | Box Plot |
| Large dataset | Violin Plot |
| Bimodal data | Violin Plot |

4. 📏 Error Bars: Showing Uncertainty

What Are Error Bars?

Error bars are like “I’m not 100% sure” markers. They show how confident we are about our measurements.

Imagine measuring your height:

  • You measure: 120 cm
  • But the ruler might be off by ±2 cm
  • Error bars show this: 120 cm ± 2 cm

🎯 Simple Error Bar Example

import matplotlib.pyplot as plt
import numpy as np

# Average test scores with uncertainty
subjects = ['Math', 'Science', 'English']
averages = [75, 82, 78]
errors = [5, 3, 4]  # How uncertain we are

plt.bar(subjects, averages, yerr=errors,
        capsize=5, color='skyblue')
plt.ylabel('Average Score')
plt.title('Test Scores with Error Bars')
plt.show()

Types of Error Values

Error bars can stand for different things:

  • Standard Deviation (SD): how spread out the data is
  • Standard Error (SE): how uncertain the mean is (SD ÷ √n)
  • Confidence Interval (CI): a range that likely contains the true mean (95% is the usual choice)
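All three come from the same few numbers. The sketch below computes them for a hypothetical set of five quiz scores. It uses 1.96, the normal-approximation multiplier for 95%; for a sample this small, a t-distribution value (about 2.78 for n = 5) gives a more accurate, wider interval.

```python
import numpy as np

# Hypothetical quiz scores from five students
scores = np.array([10, 12, 14, 16, 18])

n = len(scores)
mean = scores.mean()
sd = scores.std(ddof=1)   # sample standard deviation: spread of the scores
se = sd / np.sqrt(n)      # standard error: uncertainty in the mean
ci_half = 1.96 * se       # approximate 95% CI half-width

print(f'mean = {mean:.2f}')                                     # mean = 14.00
print(f'SD   = {sd:.2f}')                                       # SD   = 3.16
print(f'SE   = {se:.2f}')                                       # SE   = 1.41
print(f'95% CI = {mean - ci_half:.2f} to {mean + ci_half:.2f}')  # 95% CI = 11.23 to 16.77
```

SD answers "how different are the students?", while SE and CI answer "how sure are we about the average?", so say in the caption which one your error bars show.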

Error Bars on Line Plots

# Temperatures over a week with uncertainty
days = [1, 2, 3, 4, 5, 6, 7]
temps = [20, 22, 19, 24, 26, 23, 21]
temp_errors = [2, 1.5, 2, 3, 2, 1, 2.5]

plt.errorbar(days, temps, yerr=temp_errors,
             fmt='o-', capsize=5,
             color='red', ecolor='gray')
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.title('Weekly Temperatures')
plt.show()

Asymmetric Error Bars

Sometimes errors aren’t the same up and down!

# Different errors above and below
values = [10, 20, 30]
lower_err = [2, 3, 2]  # Error below
upper_err = [4, 2, 5]  # Error above

plt.bar([1, 2, 3], values,
        yerr=[lower_err, upper_err],
        capsize=5)
plt.title('Asymmetric Error Bars')
plt.show()
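One common source of asymmetric error bars is skewed data summarized by its median: the distance down to Q1 and the distance up to Q3 are often different. A sketch with hypothetical wait times in minutes:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical wait times: mostly short, a few long ones (right-skewed)
waits = [2, 3, 3, 4, 4, 5, 6, 8, 12, 20]

q1, median, q3 = np.percentile(waits, [25, 50, 75])
lower_err = median - q1   # distance from the median down to Q1
upper_err = q3 - median   # distance from the median up to Q3

# yerr with shape (2, N): first row below, second row above
plt.bar(['Wait time'], [median],
        yerr=[[lower_err], [upper_err]], capsize=5)
plt.ylabel('Minutes')
plt.title('Median with Quartile Error Bars')
plt.show()

print(lower_err, upper_err)  # 1.25 3.0
```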

🎨 Error Bar Styling

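| Parameter | What It Does |
|---|---|
| yerr | Vertical error size |
| xerr | Horizontal error size |
| capsize | Length of the error-bar caps, in points |
| ecolor | Error bar color |
| elinewidth | Error line thickness (in plt.bar(), pass it inside error_kw) |
| fmt | Marker + line style (plt.errorbar() only) |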
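These options don't all belong to the same function. plt.errorbar() accepts every one of them directly, while plt.bar() takes yerr, xerr, ecolor, and capsize but has no fmt, and other error-bar settings such as elinewidth go in its error_kw dictionary. A side-by-side sketch using hypothetical temperatures:

```python
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
temps = [20, 22, 19, 24, 26]
temp_errors = [2, 1.5, 2, 3, 2]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 4))

# errorbar(): all styling options are direct keyword arguments
line = ax_line.errorbar(days, temps, yerr=temp_errors,
                        fmt='s--', capsize=4,
                        ecolor='gray', elinewidth=2)
ax_line.set_title('errorbar()')

# bar(): extra error-bar styling goes through error_kw
bars = ax_bar.bar(days, temps, yerr=temp_errors, capsize=4,
                  error_kw={'elinewidth': 2, 'ecolor': 'gray'})
ax_bar.set_title('bar() with error_kw')

plt.tight_layout()
plt.show()
```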

🎯 Putting It All Together

Compare All Three!

import matplotlib.pyplot as plt
import numpy as np

# Same data, three views
data = np.random.normal(50, 10, 100)

fig, axes = plt.subplots(1, 3,
                          figsize=(10, 4))

# Box Plot
axes[0].boxplot(data)
axes[0].set_title('Box Plot')

# Violin Plot
axes[1].violinplot(data)
axes[1].set_title('Violin Plot')

# Bar with Error
mean = np.mean(data)
std = np.std(data)
axes[2].bar(['Data'], [mean],
            yerr=[std], capsize=10)
axes[2].set_title('Mean + Error')

plt.tight_layout()
plt.show()

🌟 Remember This!

| Plot Type | Shows You | Use When |
|---|---|---|
| Box Plot | 5-number summary | Quick comparison |
| Violin Plot | Data shape + spread | Understanding the distribution |
| Error Bars | Uncertainty | Showing measurement accuracy |

The Golden Rule: Pick the plot that tells YOUR story best!


🚀 Quick Code Reference

import matplotlib.pyplot as plt
# data, x, y, and errors stand for your own values

# Box Plot
plt.boxplot(data, patch_artist=True)

# Violin Plot
plt.violinplot(data, showmedians=True)

# Error Bars (bar chart)
plt.bar(x, y, yerr=errors, capsize=5)

# Error Bars (line plot)
plt.errorbar(x, y, yerr=errors, fmt='o-')

You’ve got this! Now go visualize some data! 📊✨
