Aggregations and Statistics

🎯 NumPy: The Number Detective’s Toolkit

The Story of the Number Detective

Imagine you’re a detective with a giant box of colorful marbles. Each marble has a number written on it. Your job? Find secrets hidden in those numbers!

NumPy is your magnifying glass—it helps you discover things like:

  • Which marble has the biggest number?
  • What’s the average of all marbles?
  • How do numbers add up as you go?

Let’s become number detectives together! 🔍


🧮 Aggregation Functions: Squishing Many Into One

What’s aggregation? It’s like squishing a whole bag of gummy bears into one super gummy bear that tells you something about ALL of them.

The Big Three: Sum, Mean, Product

import numpy as np

marbles = np.array([2, 4, 6, 8])

# Sum: Add all marbles together
total = np.sum(marbles)  # 20

# Mean: The "fair share" number
average = np.mean(marbles)  # 5.0

# Product: Multiply all together
multiplied = np.prod(marbles)  # 384

Think of it this way:

  • Sum = How many candies if you dump all bags together?
  • Mean = If everyone gets equal candies, how many each?
  • Product = What if each number was a multiplier in a game?
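A quick way to check the "fair share" idea is to build the mean by hand. This is only a small sketch: the names manual_mean, method_total, and method_product are made up for illustration, while marbles is the same array as above.

```python
import numpy as np

marbles = np.array([2, 4, 6, 8])

# The mean is just the sum shared out over the number of marbles
manual_mean = np.sum(marbles) / marbles.size  # 20 / 4 = 5.0

# The same aggregations are also available as array methods
method_total = marbles.sum()     # 20
method_product = marbles.prod()  # 384
```

Both spellings, np.sum(marbles) and marbles.sum(), give the same answer, so pick whichever reads better to you.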

🏆 Min, Max & Arg Functions: Finding Champions

Finding the Biggest and Smallest

scores = np.array([85, 92, 78, 96, 88])

# Who got the highest score?
best = np.max(scores)  # 96

# Who got the lowest?
lowest = np.min(scores)  # 78

But WHERE Are They? (Arg Functions!)

Sometimes you don’t just want the value—you want the position!

# WHERE is the highest score?
best_position = np.argmax(scores)  # 3

# WHERE is the lowest score?
worst_position = np.argmin(scores)  # 2

Real-life example: In a race, the winning time is the smallest one, so min() tells you the winning time and argmin() tells you which runner won!
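A position is most useful when you use it to look something up. The sketch below assumes a second array of student names (invented here for illustration) that lines up with scores position by position.

```python
import numpy as np

# Hypothetical names, lined up with the scores by position
students = np.array(["Ava", "Ben", "Cal", "Dee", "Eli"])
scores = np.array([85, 92, 78, 96, 88])

# Use the position from argmax/argmin as an index into the names
top_student = students[np.argmax(scores)]     # "Dee" (position 3)
bottom_student = students[np.argmin(scores)]  # "Cal" (position 2)
```

If two entries tie for biggest or smallest, argmax() and argmin() report the first position they find.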


📈 Cumulative Operations: Running Totals

Imagine you’re counting coins as you find them, one by one. That’s cumulative!

Cumulative Sum

coins = np.array([5, 3, 7, 2])

running_total = np.cumsum(coins)
# [5, 8, 15, 17]
# 5 → 5+3=8 → 8+7=15 → 15+2=17

Cumulative Product

multipliers = np.array([2, 3, 2])

running_product = np.cumprod(multipliers)
# [2, 6, 12]
# 2 → 2×3=6 → 6×2=12

Like a snowball rolling downhill—it keeps getting bigger!
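Two small checks can make running totals feel less magical. In this sketch (final_total and recovered are illustrative names), the last running total matches the plain sum, and np.diff undoes the cumsum.

```python
import numpy as np

coins = np.array([5, 3, 7, 2])
running_total = np.cumsum(coins)  # [5, 8, 15, 17]

# The final running total is the same as the plain sum
final_total = running_total[-1]  # 17

# Differences between neighbours give back the original coins;
# prepend=0 supplies the "nothing found yet" starting point
recovered = np.diff(running_total, prepend=0)  # [5, 3, 7, 2]
```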


🎚️ The Axis Parameter: Rows vs Columns

When your numbers live in a grid (2D array), you can squish them in different directions!

grid = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

Picture the three directions:

  • axis=0 → Squish DOWN the columns → Sum: [5, 7, 9]
  • axis=1 → Squish RIGHT along the rows → Sum: [6, 15]
  • No axis → Squish ALL → Sum: 21

# Sum down columns (axis=0)
np.sum(grid, axis=0)  # [5, 7, 9]

# Sum across rows (axis=1)
np.sum(grid, axis=1)  # [6, 15]

# Sum everything (no axis)
np.sum(grid)  # 21

Memory trick:

  • axis=0 → Vertical crush (columns survive)
  • axis=1 → Horizontal crush (rows survive)
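Another way to remember it: the axis you name is the one that disappears from the shape. Here is a minimal sketch using the same grid; the variable names are just for illustration.

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(grid.shape)                  # (2, 3): 2 rows, 3 columns
col_sums = np.sum(grid, axis=0)    # axis 0 (length 2) disappears
row_sums = np.sum(grid, axis=1)    # axis 1 (length 3) disappears
print(col_sums.shape)              # (3,)
print(row_sums.shape)              # (2,)

# Other aggregations take axis the same way
col_max = np.max(grid, axis=0)     # [4, 5, 6]
row_means = np.mean(grid, axis=1)  # [2.0, 5.0]
```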

📐 Keepdims Parameter: Keeping Shape

When you squish numbers along an axis, that axis disappears and your array loses a dimension. But sometimes you need the result to keep the same number of dimensions!

grid = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# Without keepdims: shape becomes (2,)
row_sums = np.sum(grid, axis=1)
# [6, 15] - just a flat list

# With keepdims: shape is (2, 1), still 2D like the grid
row_sums_kept = np.sum(grid, axis=1,
                        keepdims=True)
# [[6],
#  [15]] - still looks like a column!

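Why does this matter? When you want to do math between the result and the original grid, the shapes need to line up. A (2, 1) result lines up with the (2, 3) grid, one value per row. A flat (2,) result does not.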
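Here is one way this plays out, sketched with the same grid: turning each row into fractions of its own total. The names fractions, row_sums_flat, and flat_worked are illustrative only.

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# Shape (2, 1) lines up with shape (2, 3): one sum per row
row_sums_kept = np.sum(grid, axis=1, keepdims=True)
fractions = grid / row_sums_kept  # each row now adds up to 1.0

# Shape (2,) does not line up with (2, 3), so NumPy refuses
row_sums_flat = np.sum(grid, axis=1)
try:
    grid / row_sums_flat
    flat_worked = True
except ValueError:
    flat_worked = False  # this branch runs
```

The division works with keepdims=True because NumPy stretches the single column of sums across all three columns of the grid.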


📊 Statistical Functions: Understanding Your Data

Standard Deviation & Variance

These tell you how spread out your numbers are.

test_scores = np.array([70, 75, 80, 85, 90])

# Variance: Average of squared differences from the mean
variance = np.var(test_scores)  # 50.0

# Std Dev: Square root of variance
std_dev = np.std(test_scores)  # 7.07

Analogy: Imagine kids standing in a line.

  • Low std dev = Everyone’s about the same height
  • High std dev = Some are very short, some very tall
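To see where these numbers come from, the calculation can be done step by step. This is a sketch with made-up variable names. It also shows the ddof parameter, which switches np.var (and np.std) from dividing by n to dividing by n - ddof.

```python
import numpy as np

test_scores = np.array([70, 75, 80, 85, 90])

# Step by step: distance from the mean, squared, then averaged
differences = test_scores - np.mean(test_scores)  # [-10, -5, 0, 5, 10]
manual_variance = np.mean(differences ** 2)       # 50.0
manual_std = np.sqrt(manual_variance)             # about 7.07

# ddof=1 divides by (n - 1) instead of n (the "sample" version)
sample_variance = np.var(test_scores, ddof=1)     # 62.5
```

By default NumPy divides by n. Some other tools divide by n - 1, so results can differ slightly unless you set ddof to match.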

Other Helpful Stats

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

np.median(data)  # 3.5 (middle value)
np.ptp(data)     # 8 (peak-to-peak range)

📏 Percentiles & Quantiles: Dividing the Data

Percentiles

“What value is bigger than X% of the data?”

scores = np.array([10, 20, 30, 40, 50,
                   60, 70, 80, 90, 100])

# 50th percentile = median
np.percentile(scores, 50)  # 55.0

# 25th percentile = first quarter
np.percentile(scores, 25)  # 32.5

# 90th percentile = where the top 10% begins
np.percentile(scores, 90)  # 91.0

Quantiles

Same idea, but using 0-1 instead of 0-100:

# 0.5 quantile = 50th percentile
np.quantile(scores, 0.5)  # 55.0

# 0.25 quantile = 25th percentile
np.quantile(scores, 0.25)  # 32.5

Real-world use: “You scored in the 95th percentile!” means you did better than 95% of people.
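You can also ask for several percentiles in one go. The sketch below reuses the scores array; the names q1, q3, iqr, and same are chosen here for illustration.

```python
import numpy as np

scores = np.array([10, 20, 30, 40, 50,
                   60, 70, 80, 90, 100])

# Pass a list to get several percentiles in one call
q1, median, q3 = np.percentile(scores, [25, 50, 75])
# q1 = 32.5, median = 55.0, q3 = 77.5

# Interquartile range: the width of the middle half of the data
iqr = q3 - q1  # 45.0

# quantile(q) matches percentile(100 * q)
same = np.isclose(np.quantile(scores, 0.9),
                  np.percentile(scores, 90))  # True
```

Notice that 32.5 and 77.5 are not in the array. By default NumPy slides between the two nearest values when a percentile lands between them.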


⚖️ Weighted Average: Not All Numbers Are Equal

Sometimes certain values matter more than others!

Regular Average vs Weighted Average

grades = np.array([90, 80, 70])

# Regular average: all equal
np.mean(grades)  # 80.0

# Weighted: some count more!
weights = np.array([3, 2, 1])
# 90 counts 3x, 80 counts 2x, 70 counts 1x

np.average(grades, weights=weights)  # 83.33

Real-life example:

  • Homework = 20% of grade
  • Tests = 50% of grade
  • Final = 30% of grade
scores = np.array([85, 78, 92])
weights = np.array([0.2, 0.5, 0.3])

final_grade = np.average(scores,
                          weights=weights)
# 83.6  (85×0.2 + 78×0.5 + 92×0.3 = 17 + 39 + 27.6)
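Under the hood, a weighted average is "multiply, add up, divide by the total weight". Here is a minimal sketch with the grades example from earlier; manual and equal are illustrative names.

```python
import numpy as np

grades = np.array([90, 80, 70])
weights = np.array([3, 2, 1])

# Multiply each grade by its weight, add up, divide by total weight
manual = np.sum(grades * weights) / np.sum(weights)  # 500 / 6, about 83.33

# Weights that are all equal give the plain mean back
equal = np.average(grades, weights=np.array([1, 1, 1]))  # 80.0
```

Because of the division by the total weight, the weights do not need to add up to 1. Counts like 3, 2, 1 work just as well as fractions like 0.2, 0.5, 0.3.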

🎮 Quick Reference Card

Function          What It Does           Example
np.sum()          Add all                [1,2,3] → 6
np.mean()         Average                [2,4,6] → 4.0
np.max()          Biggest                [5,9,3] → 9
np.argmax()       Position of biggest    [5,9,3] → 1
np.cumsum()       Running total          [1,2,3] → [1,3,6]
np.std()          Spread                 Measures variation
np.percentile()   Ranking cutoff         Value above X% of the data
np.average()      Weighted mean          With importance

🚀 You’re Now a Number Detective!

You’ve learned to:

  • Squish numbers with aggregations
  • Find champions with min/max
  • Track running totals with cumsum
  • Control direction with axis
  • Keep shapes with keepdims
  • Measure spread with statistics
  • Rank with percentiles
  • Weigh with averages

Data scientists reach for these tools all the time. Now you know them too!

Next step: Practice! Try these on your own arrays and see what secrets you can discover. 🔍✨
