Hypothesis Testing


🎯 Hypothesis Testing in R: The Detective’s Toolkit

The Big Picture: Becoming a Data Detective 🔍

Imagine you’re a detective. You have a hunch about something, but you need proof before you can say it’s true. That’s exactly what hypothesis testing is!

Think of it like this: Your friend says their new cookie recipe is better than the old one. How do you prove it? You need to test it fairly!

In statistics, we:

  1. Start with a guess (called a hypothesis)
  2. Collect evidence (data)
  3. Use math to decide if our guess is probably true or not

🎭 The Two Players: Null vs Alternative

Every hypothesis test has two characters:

| Character | Role | Example |
|---|---|---|
| Null (H₀) | "Nothing special is happening" | "Both cookie recipes taste the same" |
| Alternative (H₁) | "Something IS different!" | "The new recipe tastes different" |

The p-value is your “surprise meter”:

  • p < 0.05 → “Wow, this is surprising! Probably not a coincidence!” ✅
  • p ≥ 0.05 → “Meh, could just be luck” ❌
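In R, every test function returns a result object that stores the p-value, so you can read the "surprise meter" directly. A minimal sketch with made-up cookie taste scores (the t.test function used here is explained in the next section):

# Made-up taste scores (1-10) for the two cookie recipes
old_recipe <- c(6, 7, 5, 6, 7)
new_recipe <- c(8, 9, 7, 8, 9)

# Every test result object stores the p-value
result <- t.test(old_recipe, new_recipe)
result$p.value          # the "surprise meter" as a number
result$p.value < 0.05   # TRUE means "probably not a coincidence"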

📊 1. The t-Test: Comparing Averages

What is it?

The t-test answers: “Are these two groups really different, or is it just random chance?”

🍦 Ice Cream Analogy

Two ice cream shops claim their scoops are bigger. You weigh 10 scoops from each shop. The t-test tells you if the difference is real or just luck!

Types of t-Tests

| Type | When to Use | R Function |
|---|---|---|
| One-sample | Compare group to a known value | t.test(x, mu = value) |
| Two-sample | Compare two independent groups | t.test(x, y) |
| Paired | Same people, two conditions | t.test(x, y, paired = TRUE) |

R Example

# Shop A scoops (grams)
shop_a <- c(85, 90, 88, 92, 87)

# Shop B scoops (grams)
shop_b <- c(78, 82, 80, 79, 81)

# Are they different?
t.test(shop_a, shop_b)

The output shows a p-value well below 0.05 → Yes! Shop A really does give bigger scoops! 🎉
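The other two flavors from the table work the same way. A minimal sketch, reusing shop_a and with made-up weights for the paired case:

# One-sample: is Shop A's average scoop different from an advertised 90 grams?
t.test(shop_a, mu = 90)

# Paired: the same 5 scoops weighed fresh and again after 10 minutes (made-up numbers)
fresh  <- c(85, 90, 88, 92, 87)
melted <- c(80, 84, 83, 86, 82)
t.test(fresh, melted, paired = TRUE)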


📋 2. Chi-Square Test: Counting Categories

What is it?

When you’re counting things in categories (not measuring numbers), Chi-Square asks: “Is this pattern what we expected, or is something fishy?”

🎲 Dice Analogy

You roll a die 60 times. You expect each number to appear about 10 times. But 6 shows up 20 times! Is the die rigged, or just luck?

R Example

# What we observed in 60 rolls
observed <- c(8, 9, 7, 8, 8, 20)

# What we expected from a fair die (10 of each)
expected <- c(10, 10, 10, 10, 10, 10)

# Is the die fair? (expected/60 turns the counts into probabilities of 1/6 each)
chisq.test(observed, p = expected / 60)

If p < 0.05 → That die is probably loaded! 🎲
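Chi-Square can also check whether two categorical variables are related (a test of independence). A sketch with hypothetical counts: does cookie preference depend on age group?

# Hypothetical counts: do kids and adults prefer different recipes?
prefs <- matrix(c(30, 20,     # kids:   old recipe, new recipe
                  15, 35),    # adults: old recipe, new recipe
                nrow = 2, byrow = TRUE,
                dimnames = list(c("kids", "adults"),
                                c("old", "new")))

# Test of independence: is preference related to age group?
chisq.test(prefs)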


🔗 3. Correlation Test: Finding Connections

What is it?

Correlation measures how two things move together. Do they go up together? One up, one down? Or no pattern at all?

🌡️ Weather Analogy

When it’s hot outside, ice cream sales go up. That’s a positive correlation!

Correlation Values

| Value | Meaning |
|---|---|
| +1 | Perfect together (both rise) |
| 0 | No relationship |
| -1 | Perfect opposites (one rises, other falls) |

R Example

# Temperature (°F)
temp <- c(70, 75, 80, 85, 90)

# Ice cream sales
sales <- c(100, 120, 150, 180, 200)

# Are they connected?
cor.test(temp, sales)

If p < 0.05 and r is close to +1 → Yes! Hot weather really does boost sales! ☀️🍦
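By default cor.test uses Pearson's r, which assumes a roughly straight-line relationship between two numeric variables. If your data are rankings or not bell-shaped, you can ask for Spearman's rank correlation instead, a sketch using the same vectors:

# Rank-based (Spearman) correlation - no bell-curve assumption needed
cor.test(temp, sales, method = "spearman")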


🏆 4. ANOVA: Comparing Many Groups

What is it?

ANOVA is like a t-test, but for 3 or more groups. It asks: “Is at least one group different from the others?”

🏅 Sports Analogy

Three coaches claim their training methods are best. You test athletes from all three programs. ANOVA tells you if there’s any real difference!

R Example

# Scores from three coaches
coach_a <- c(85, 88, 90, 87)
coach_b <- c(78, 80, 82, 79)
coach_c <- c(92, 95, 91, 93)

# Combine data
scores <- c(coach_a, coach_b, coach_c)
coach <- factor(rep(c("A","B","C"), each=4))

# Run ANOVA
result <- aov(scores ~ coach)
summary(result)

If p < 0.05 → At least one coach’s method is different! (Coach C looks best!)
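ANOVA only says that some group differs, not which one. A common follow-up is Tukey's post-hoc test, which compares every pair of coaches using the aov result from above:

# Which pairs of coaches actually differ? Tukey's post-hoc test on the aov result
TukeyHSD(result)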


📏 5. Variance Test (F-Test): Comparing Spread

What is it?

While t-tests compare averages, variance tests compare how spread out the data is. Are scores more scattered in one group?

🎯 Archery Analogy

Two archers both hit near the bullseye on average. But one archer’s arrows are tightly clustered, while the other’s are all over the target. The variance test measures this!

R Example

# Archer 1: consistent
archer1 <- c(49, 50, 51, 50, 49)

# Archer 2: all over the place
archer2 <- c(45, 55, 40, 60, 50)

# Compare their consistency
var.test(archer1, archer2)

If p < 0.05 → Yes! Their consistency levels are different!
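One practical use of var.test is deciding how to run a follow-up t-test: if the spreads look similar you can pool the variances, otherwise R's default Welch t-test is the safer choice. A sketch:

# Spreads clearly differ here, so keep R's default Welch t-test
t.test(archer1, archer2)

# If var.test had found similar spreads, pooling the variances is an option:
# t.test(archer1, archer2, var.equal = TRUE)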


📊 6. Wilcoxon Test: The Non-Parametric Hero

What is it?

When your data is weird (not bell-shaped) or you’re working with rankings instead of exact numbers, Wilcoxon comes to the rescue!

🥇 Race Analogy

Instead of exact race times, you only know who came 1st, 2nd, 3rd, etc. Wilcoxon works with these rankings!

Two Flavors

| Test | When to Use |
|---|---|
| Wilcoxon Signed-Rank | Paired data (like paired t-test) |
| Wilcoxon Rank-Sum (Mann-Whitney) | Two independent groups |

R Example

# Pain scores before treatment
before <- c(8, 9, 7, 8, 9, 8)

# Pain scores after treatment
after <- c(5, 6, 4, 5, 6, 5)

# Did treatment help?
wilcox.test(before, after, paired = TRUE)

If p < 0.05 → The treatment really works! 💊
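The rank-sum (Mann-Whitney) flavor compares two independent groups. A sketch reusing the ice cream shop weights from the t-test section, this time without assuming bell-shaped data:

# Rank-sum (Mann-Whitney): two independent groups, no bell-curve assumption
wilcox.test(shop_a, shop_b)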


📈 7. Proportion Test: Comparing Percentages

What is it?

When you’re comparing percentages or ratios, not averages, use the proportion test!

🗳️ Voting Analogy

60% of Town A voted yes, but only 45% of Town B voted yes. Is this a real difference, or just random variation?

R Example

# Town A: 60 yes out of 100
# Town B: 45 yes out of 100

prop.test(
  x = c(60, 45),  # successes
  n = c(100, 100) # totals
)

If p < 0.05 → Towns really do vote differently! 🗳️
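prop.test also handles the one-sample case, for example checking whether Town A's result differs from an even 50/50 split:

# One-sample version: is Town A's 60 "yes" out of 100 different from 50%?
prop.test(x = 60, n = 100, p = 0.5)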


🔔 8. Shapiro-Wilk Test: Is Your Data Normal?

What is it?

Many tests assume your data follows a “bell curve” (normal distribution). Shapiro-Wilk checks if that’s true!

📊 Why It Matters

Before using a t-test, you should check if your data is bell-shaped. If not, use Wilcoxon instead!

R Example

# Your data
my_data <- c(12, 15, 14, 13, 16,
             14, 15, 13, 14, 15)

# Is it normally distributed?
shapiro.test(my_data)

  • p > 0.05 → Data looks normal, so a t-test is fine! ✅
  • p < 0.05 → Data is probably NOT normal, so use Wilcoxon instead! ⚠️
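Putting it together, a typical workflow is: check normality first, then pick the test. A minimal sketch using the ice cream shop data from earlier:

# Check both groups for normality first, then choose the comparison test
if (shapiro.test(shop_a)$p.value > 0.05 &&
    shapiro.test(shop_b)$p.value > 0.05) {
  t.test(shop_a, shop_b)       # both look roughly normal
} else {
  wilcox.test(shop_a, shop_b)  # at least one looks non-normal
}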


🗺️ Quick Decision Flowchart

graph TD
    A["What do you want to test?"] --> B{Comparing averages?}
    B -->|Yes, 2 groups| C["t-test"]
    B -->|Yes, 3+ groups| D["ANOVA"]
    B -->|No| E{Counting categories?}
    E -->|Yes| F["Chi-Square"]
    E -->|No| G{Measuring correlation?}
    G -->|Yes| H["Correlation Test"]
    G -->|No| I{Comparing spread?}
    I -->|Yes| J["Variance Test"]
    I -->|No| K{Comparing percentages?}
    K -->|Yes| L["Proportion Test"]
    K -->|No| M{Non-normal data?}
    M -->|Yes| N["Wilcoxon Test"]
    M -->|Check first| O["Shapiro-Wilk"]

🎯 Summary: Your Testing Toolkit

| Test | Use When… | R Function |
|---|---|---|
| t-Test | Comparing 2 group averages | t.test() |
| Chi-Square | Counting categorical data | chisq.test() |
| Correlation | Finding relationships | cor.test() |
| ANOVA | Comparing 3+ group averages | aov() |
| Variance (F-Test) | Comparing data spread | var.test() |
| Wilcoxon | Non-normal data, rankings | wilcox.test() |
| Proportion | Comparing percentages | prop.test() |
| Shapiro-Wilk | Checking if data is normal | shapiro.test() |

🚀 You’ve Got This!

Remember: Every test is just asking a simple question with data. Start with what you want to know, pick the right tool, and let R do the math!

The golden rule:

  • p < 0.05 → “Something real is happening!” ✅
  • p ≥ 0.05 → “Probably just random chance” ❌

Now go forth and test your hypotheses! 🔬✨
