Statistical Inference: Hypothesis Tests 🔬
The Detective Story of Data
Imagine you’re a detective. Someone tells you, “This cookie jar has exactly 50 red candies and 50 blue candies.” But you can’t count them all—you can only grab a handful. Based on that handful, can you figure out if they’re telling the truth?
That’s exactly what hypothesis testing does! It’s like being a data detective, using clues (a sample) to decide whether a claim about the whole world (the population) holds up against the evidence.
🎯 The Big Picture: What Are We Doing?
Every hypothesis test follows the same story:
- Someone makes a claim (the Null Hypothesis - H₀)
- You suspect something different (Alternative Hypothesis - H₁)
- You gather evidence (collect data)
- You decide: Is the evidence strong enough to reject the claim?
Think of it like a courtroom:
- H₀ (Null): “The defendant is innocent” (the default assumption)
- H₁ (Alternative): “The defendant is guilty”
- Evidence: The data you collect
- Verdict: Reject H₀ or fail to reject H₀
🧪 Z-Test for Mean
When to Use It
When you know the population’s standard deviation (σ) and want to test if the average of something matches a claimed value.
The Story
A candy factory claims their chocolate bars weigh exactly 100 grams on average. You’re suspicious! You weigh 36 random bars and find they average 98 grams. Is the factory lying?
The Formula
z = (x̄ - μ₀) / (σ / √n)
Where:
x̄ = sample mean (98g)
μ₀ = claimed mean (100g)
σ = population std dev (say, 6g)
n = sample size (36)
Example Calculation
z = (98 - 100) / (6 / √36)
z = -2 / (6 / 6)
z = -2 / 1
z = -2
A z-score of -2 is pretty far from zero! With a threshold of z = ±1.96 (a 5% significance level, two-tailed), |z| = 2 lands just past the cutoff, so we’d reject the factory’s claim. The bars really do seem lighter!
Simple Rule
- |z| > 1.96 → Reject H₀ at the 5% significance level (95% confidence)
- |z| > 2.58 → Reject H₀ at the 1% significance level (99% confidence)
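Here is a minimal sketch of the chocolate bar calculation using only Python's standard library. The function name `z_test_mean` is made up for illustration, and the two-sided p-value is an extra that the worked example above does not compute.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, claimed_mean, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - claimed_mean) / standard_error
    # Two-sided p-value: chance of a z at least this extreme if H0 is true
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Chocolate bars: x̄ = 98, μ₀ = 100, σ = 6, n = 36
z, p = z_test_mean(98, 100, 6, 36)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = -2.00, p = 0.0455
```

A p-value just under 0.05 tells the same story as |z| being just over 1.96.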
📊 t-Test for Mean
When to Use It
When you don’t know the population’s standard deviation. This is way more common in real life!
The Story
A new energy drink claims to improve reaction time to 200 milliseconds on average. You test 25 gamers and find their average is 215 ms with a standard deviation of 30 ms. Does the drink actually work?
The Formula
t = (x̄ - μ₀) / (s / √n)
Where:
x̄ = sample mean (215 ms)
μ₀ = claimed mean (200 ms)
s = sample std dev (30 ms)
n = sample size (25)
Example Calculation
t = (215 - 200) / (30 / √25)
t = 15 / (30 / 5)
t = 15 / 6
t = 2.5
With 24 degrees of freedom (n - 1), the two-tailed 5% critical value is about 2.064. Our t-value of 2.5 is beyond it, so there’s significant evidence the drink doesn’t hit that 200 ms target!
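The energy drink example can be sketched the same way. Python's standard library has no t-distribution, so this sketch assumes a cutoff of 2.064 (the two-tailed 5% value for 24 degrees of freedom from a standard t-table) instead of computing a p-value. The name `t_statistic` is illustrative.

```python
from math import sqrt


def t_statistic(sample_mean, claimed_mean, sample_sd, n):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    standard_error = sample_sd / sqrt(n)
    t = (sample_mean - claimed_mean) / standard_error
    return t, n - 1


# Gamers: x̄ = 215 ms, μ₀ = 200 ms, s = 30 ms, n = 25
t, df = t_statistic(215, 200, 30, 25)

# Assumed cutoff: two-tailed, alpha = 0.05, 24 df (looked up in a t-table)
T_CRITICAL = 2.064
reject = abs(t) > T_CRITICAL
print(f"t = {t:.2f}, df = {df}, reject H0: {reject}")  # t = 2.50, df = 24, reject H0: True
```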
Z-Test vs t-Test: The Key Difference
| Feature | Z-Test | t-Test |
|---|---|---|
| Population σ | Known | Unknown |
| Sample size | Any (usually large) | Any (especially small) |
| Distribution | Normal | t-distribution |
🎯 Z-Test for Proportion
When to Use It
When you’re testing if a percentage matches a claimed value.
The Story
A company says 70% of customers love their new app. You survey 200 people and only 120 (60%) say they love it. Is the company exaggerating?
The Formula
z = (p̂ - p₀) / √(p₀(1-p₀)/n)
Where:
p̂ = sample proportion (0.60)
p₀ = claimed proportion (0.70)
n = sample size (200)
Example Calculation
z = (0.60 - 0.70) / √(0.70 × 0.30 / 200)
z = -0.10 / √(0.21 / 200)
z = -0.10 / √0.00105
z = -0.10 / 0.0324
z = -3.09
A z-score of -3.09 is way beyond ±1.96! Strong evidence the company is exaggerating: the true share of customers who love the app looks lower than 70%.
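A short sketch of the app survey calculation, again with the standard library only. The function name `z_test_proportion` is an assumption for illustration. Note that the standard error uses the claimed proportion p₀, not the sample proportion, exactly as in the formula above.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, claimed_p):
    """Return (z, two-sided p-value) for a one-sample proportion z-test."""
    p_hat = successes / n
    # Standard error is computed under H0, so it uses the claimed proportion
    standard_error = sqrt(claimed_p * (1 - claimed_p) / n)
    z = (p_hat - claimed_p) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# App survey: 120 of 200 love it, company claims 70%
z, p = z_test_proportion(120, 200, 0.70)
print(f"z = {z:.2f}, p = {p:.4f}")
```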
👫 Paired t-Test
When to Use It
When you measure the same subjects twice (before and after).
The Story
A school wants to know if a new teaching method helps. They test 5 students before and after the new method. Did scores improve?
The Magic
Instead of comparing two separate groups, you look at the difference for each person:
d = After - Before (for each student)
Then you do a regular t-test on those differences!
Example Data
| Student | Before | After | Difference (d) |
|---|---|---|---|
| A | 70 | 78 | +8 |
| B | 65 | 70 | +5 |
| C | 80 | 85 | +5 |
| D | 75 | 80 | +5 |
| E | 60 | 72 | +12 |
The Formula
t = d̄ / (s_d / √n)
Where:
d̄ = mean of differences (7)
s_d = sample std dev of differences (3.08)
n = number of pairs (5)
Example Calculation
t = 7 / (3.08 / √5)
t = 7 / (3.08 / 2.236)
t = 7 / 1.378
t = 5.08
A t-value of 5.08 with 4 degrees of freedom? That’s huge, well past the two-tailed 5% critical value of about 2.776! For these five students, scores improved significantly after the new method.
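Here is a minimal sketch that reruns the paired test straight from the five rows in the example table, using only the standard library. The variable names are illustrative, and the 2.776 cutoff is an assumed lookup (two-tailed 5% value for 4 degrees of freedom from a standard t-table). Working from the raw differences, the sample standard deviation comes out near 3.08 and t near 5.08.

```python
from math import sqrt
from statistics import mean, stdev

before = [70, 65, 80, 75, 60]
after = [78, 70, 85, 80, 72]

# One difference per student: d = After - Before
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = mean(diffs)   # mean of the differences
s_d = stdev(diffs)    # sample standard deviation (divides by n - 1)
t = d_bar / (s_d / sqrt(n))

# Assumed cutoff: two-tailed, alpha = 0.05, 4 df (looked up in a t-table)
T_CRITICAL = 2.776
print(f"d_bar = {d_bar}, s_d = {s_d:.2f}, t = {t:.2f}, df = {n - 1}")
print("reject H0:", abs(t) > T_CRITICAL)
```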
Why Paired?
Because the same person acts as their own “control group.” This removes person-to-person variation and makes the test more powerful!
⚖️ Two-Sample t-Test
When to Use It
When you compare two different groups of people.
The Story
Do students who drink coffee score differently than students who drink tea? You test 20 coffee drinkers and 18 tea drinkers on a math test.
The Setup
- Group 1 (Coffee): n₁ = 20, x̄₁ = 85, s₁ = 10
- Group 2 (Tea): n₂ = 18, x̄₂ = 80, s₂ = 12
The Formula (Simplified)
t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)
Example Calculation
t = (85 - 80) / √(100/20 + 144/18)
t = 5 / √(5 + 8)
t = 5 / √13
t = 5 / 3.61
t = 1.39
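A t-value of 1.39 isn’t that extreme: it falls short of the two-tailed 5% critical value of roughly 2.03 (about 33 degrees of freedom). We can’t conclude that coffee and tea drinkers score differently. The difference might just be random chance!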
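The formula above does not pool the two variances, which is the form usually called Welch's t-test. The sketch below computes the same t and also the Welch-Satterthwaite degrees of freedom, which the example does not spell out. The function name `welch_t` is illustrative.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, approximate df) for a two-sample t-test with unpooled variances."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1's mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2's mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Coffee: n = 20, mean = 85, s = 10; Tea: n = 18, mean = 80, s = 12
t, df = welch_t(85, 10, 20, 80, 12, 18)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 1.39, df = 33.3
```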
Paired vs Two-Sample: Quick Guide
graph TD
  A["Same people measured twice?"] -->|Yes| B["Paired t-Test"]
  A -->|No| C["Two different groups?"]
  C -->|Yes| D["Two-Sample t-Test"]
🔄 Two-Sample Z-Test for Proportions
When to Use It
When comparing percentages between two groups.
The Story
A medicine company tests a new drug. Of 200 patients who got the drug, 140 recovered. Of 200 patients who got a placebo, 100 recovered. Is the drug actually better?
The Setup
- Drug group: n₁ = 200, recovered = 140, p̂₁ = 0.70
- Placebo group: n₂ = 200, recovered = 100, p̂₂ = 0.50
The Formula
First, find the pooled proportion:
p̂ = (x₁ + x₂) / (n₁ + n₂)
p̂ = (140 + 100) / (200 + 200)
p̂ = 240 / 400 = 0.60
Then calculate z:
z = (p̂₁ - p̂₂) / √(p̂(1-p̂)(1/n₁ + 1/n₂))
Example Calculation
z = (0.70 - 0.50) / √(0.60 × 0.40 × (1/200 + 1/200))
z = 0.20 / √(0.24 × 0.01)
z = 0.20 / √0.0024
z = 0.20 / 0.049
z = 4.08
A z-score of 4.08! That’s extremely strong evidence. The drug really does help more people recover!
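The drug trial numbers can be checked with a short standard-library sketch. The function name `two_proportion_z` is an assumption for illustration, and the p-value is an extra the worked example does not compute.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion: under H0 both groups share one recovery rate
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Drug: 140 of 200 recovered; placebo: 100 of 200 recovered
z, p = two_proportion_z(140, 200, 100, 200)
print(f"z = {z:.2f}, p = {p:.6f}")
```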
🗺️ The Complete Test Map
graph TD
  A["What are you testing?"] --> B{Means or Proportions?}
  B -->|Means| C{Know population σ?}
  B -->|Proportions| D{One group or two?}
  C -->|Yes| E["Z-Test for Mean"]
  C -->|No| F{Same subjects twice?}
  F -->|Yes| G["Paired t-Test"]
  F -->|No| H{One group or two?}
  H -->|One| I["One-Sample t-Test"]
  H -->|Two| J["Two-Sample t-Test"]
  D -->|One| K["Z-Test for Proportion"]
  D -->|Two| L["Two-Sample Z for Proportions"]
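The map can also be read as a small function that asks the same questions in the same order. This is only a sketch: the name `choose_test` and its parameters are hypothetical, chosen to mirror the boxes in the diagram.

```python
def choose_test(measure, groups, sigma_known=False, paired=False):
    """Walk the test map and return the name of the matching test.

    measure: "mean" or "proportion"
    groups: 1 or 2
    """
    if measure == "proportion":
        if groups == 1:
            return "Z-Test for Proportion"
        return "Two-Sample Z for Proportions"
    if measure != "mean":
        raise ValueError("measure must be 'mean' or 'proportion'")
    if sigma_known:
        return "Z-Test for Mean"
    if paired:
        return "Paired t-Test"
    if groups == 1:
        return "One-Sample t-Test"
    return "Two-Sample t-Test"


print(choose_test("mean", 1, sigma_known=True))  # chocolate bars
print(choose_test("mean", 2, paired=True))       # teaching method
print(choose_test("proportion", 2))              # drug trial
```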
🎓 Quick Reference Table
| Test | Use When | You Compare |
|---|---|---|
| Z-Test Mean | σ known | Sample mean vs claimed value |
| t-Test Mean | σ unknown | Sample mean vs claimed value |
| Z-Test Proportion | Testing % | Sample % vs claimed % |
| Paired t-Test | Before/After | Same people, two times |
| Two-Sample t-Test | Two groups | Two group means |
| Two-Sample Z Prop | Two groups | Two group percentages |
💡 The Golden Rules
- Always state your hypotheses first
  - H₀: The boring claim (nothing special happening)
  - H₁: The exciting claim (something IS happening)
- Choose your significance level (α)
  - Usually 0.05 (5%) or 0.01 (1%)
  - This is your “how sure do I need to be?” setting: the chance you’re willing to take of rejecting a true H₀
- Calculate your test statistic
  - Get a z or t value from your data
- Compare to critical value
  - If |test stat| > critical value → Reject H₀
  - If |test stat| ≤ critical value → Can’t reject H₀
- State your conclusion in plain words
  - Don’t just say “reject H₀”
  - Say “There is significant evidence that…”
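To see where the z cutoffs come from and how the choice of α changes the verdict, here is a minimal standard-library sketch. The helper names `critical_z` and `decide` are made up for illustration, and the sketch covers two-tailed z-tests only (t cutoffs would come from a t-table).

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-tailed critical value: alpha is split evenly across both tails."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def decide(test_stat, critical_value):
    """Apply the comparison rule to a test statistic."""
    if abs(test_stat) > critical_value:
        return "reject H0"
    return "fail to reject H0"


# Same z = -2 (the chocolate bar example) judged at two significance levels
for alpha in (0.05, 0.01):
    cutoff = critical_z(alpha)
    print(f"alpha = {alpha}: cutoff = {cutoff:.2f} -> {decide(-2, cutoff)}")
```

The same evidence is rejected at α = 0.05 but not at α = 0.01, which is why α should be chosen before looking at the data.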
🌟 Remember This!
Hypothesis testing is like a fair trial for data:
- We assume innocence (H₀) until proven guilty
- We need strong evidence to convict (reject H₀)
- “Not guilty” doesn’t mean innocent (failing to reject H₀ doesn’t prove H₀ is true)
- We might make mistakes, but we try to minimize them
You’ve got this! Each test is just asking a slightly different question, but they all follow the same detective story. 🕵️‍♀️
