Chi-Square Tests


🎲 Chi-Square Tests: The Detective’s Toolkit for Data

Imagine you’re a detective. Your job? Finding out if what you SEE in the real world matches what you EXPECT to see. Chi-Square tests are your magnifying glass!


🎯 The Big Picture

Think of Chi-Square (pronounced “kai-square”) like this:

You have a bag of candies. The label says there should be equal amounts of red, blue, green, and yellow candies. But when you open it… is that really true? 🍬

Chi-Square tests help us answer: “Is what I observe different from what I expected, or is it just random chance?”


📊 The Chi-Square Distribution

What Is It?

The Chi-Square distribution is like a special ruler we use to measure “surprise” in our data.

Simple Analogy:

  • Imagine rolling dice 60 times
  • You EXPECT each number (1-6) to appear about 10 times
  • But you GET: 1 appears 15 times, 6 appears only 5 times
  • The Chi-Square distribution tells you: “Is this surprising enough to matter?”

The Magic Formula

χ² = Σ (Observed - Expected)² / Expected

Breaking it down for a 5-year-old:

  1. Observed = What you actually counted
  2. Expected = What you thought you’d count
  3. Subtract them (find the difference)
  4. Square it (make negatives positive)
  5. Divide by expected (make big numbers fair)
  6. Add all pieces together
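
The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `chi_square_statistic` and the sample counts are made up for this example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        diff = o - e            # step 3: subtract
        squared = diff ** 2     # step 4: square
        total += squared / e    # steps 5 and 6: divide by expected, add up
    return total


# Hypothetical counts: 30 items in 3 categories, equal split expected.
observed = [12, 8, 10]
expected = [10, 10, 10]
print(chi_square_statistic(observed, expected))
```

Each category adds its own piece to the total, so one badly off category can push χ² up on its own.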

Key Features

  • Never negative: χ² can be zero but never below it, because every piece is a squared difference divided by a positive expected count
  • Skewed right: the tail stretches out to the right
  • Shape depends on the “degrees of freedom”: more categories mean more degrees of freedom and a different shape
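
One way to see these features is to simulate the distribution. A chi-square value with k degrees of freedom can be built as the sum of k squared standard normal draws, and its long-run average equals k. The sketch below assumes that construction; the helper name, the seed, and the sample size are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_chi_square(df, n_draws, rng):
    """Draw n_draws values, each the sum of df squared standard normals."""
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
for df in (1, 3, 10):
    draws = simulate_chi_square(df, 20_000, rng)
    print(
        f"df={df:>2}  min={min(draws):.3f}  "
        f"mean={statistics.mean(draws):.2f}  "
        f"median={statistics.median(draws):.2f}"
    )
```

In a run like this, the minimum never drops below zero, the mean lands near the degrees of freedom, and the median sits below the mean, which is what a right-skewed shape looks like.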

Example:

  • You flip a coin 100 times
  • Expected: 50 heads, 50 tails
  • Observed: 60 heads, 40 tails
  • χ² = (60-50)²/50 + (40-50)²/50 = 100/50 + 100/50 = 4
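
To judge whether χ² = 4 is surprising, we can turn it into a p-value. Two categories give 1 degree of freedom, and in that special case the tail probability can be written with the complementary error function from Python's standard `math` module. The sketch below only works for 1 degree of freedom, and the function name is made up for illustration.

```python
import math


def p_value_df1(chi_sq):
    """Upper-tail probability of a chi-square value with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid only for df = 1.
    """
    return math.erfc(math.sqrt(chi_sq / 2.0))


observed = [60, 40]
expected = [50, 50]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_sq:.2f}, p-value = {p_value_df1(chi_sq):.4f}")
```

The p-value comes out a little under 0.05, so 60 heads in 100 flips is borderline evidence against a fair coin at the usual 5% level.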

✅ Chi-Square Goodness of Fit

The Big Question

“Does my data FIT what I expected?”

The Candy Store Story 🍭

A candy company says their bags contain:

  • 30% red
  • 30% blue
  • 20% green
  • 20% yellow

You buy a bag with 100 candies and count:

  • Red: 35
  • Blue: 25
  • Green: 22
  • Yellow: 18

Does this match the company’s claim?

Step-by-Step

Color     Observed (O)   Expected (E)   (O-E)² / E
Red       35             30             0.83
Blue      25             30             0.83
Green     22             20             0.20
Yellow    18             20             0.20
Total     100            100            χ² = 2.07

Degrees of Freedom = Categories - 1 = 4 - 1 = 3

Compare χ² = 2.07 to the critical value, which is 7.815 for 3 degrees of freedom at the 5% significance level. Since 2.07 is smaller, we have no reason to doubt the candy company’s claim!
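
The candy check can be run end to end in Python. This is a sketch using only the standard library: `chi_square_sf` is a hypothetical helper written for this example, and it computes the tail probability for whole-number degrees of freedom by starting from the closed forms for 1 and 2 degrees of freedom and stepping up by two.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Starts from the closed forms for df = 1 and df = 2, then applies
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(half))
    else:
        k, q = 2, math.exp(-half)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


claimed = {"red": 0.30, "blue": 0.30, "green": 0.20, "yellow": 0.20}
observed = {"red": 35, "blue": 25, "green": 22, "yellow": 18}

n = sum(observed.values())
chi_sq = 0.0
for color, share in claimed.items():
    expected = n * share  # Expected = Total × Probability
    chi_sq += (observed[color] - expected) ** 2 / expected

df = len(observed) - 1
p_value = chi_square_sf(chi_sq, df)
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2f}")
```

A p-value well above 0.05 tells the same story as a χ² below the critical value: the bag gives no real evidence against the company's claim.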

When to Use It

  • Testing if a die is fair
  • Checking if survey responses match expected proportions
  • Verifying genetic ratios in biology

🔗 Chi-Square Test of Independence

The Big Question

“Are two things CONNECTED or just happening by coincidence?”

The Breakfast Detective Story 🍳

You notice something: Kids who eat breakfast seem to do better on tests. But is eating breakfast ACTUALLY connected to test scores, or is it just coincidence?

Real Example

Survey of 200 students:

                  Good Grades   Average Grades   Total
Eats Breakfast    60            40               100
Skips Breakfast   30            70               100
Total             90            110              200

Null Hypothesis: Breakfast and grades are NOT connected (independent)

Calculating Expected Values

Formula: Expected = (Row Total × Column Total) / Grand Total

                  Good Grades          Average Grades
Eats Breakfast    (100×90)/200 = 45    (100×110)/200 = 55
Skips Breakfast   (100×90)/200 = 45    (100×110)/200 = 55
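
The expected-value formula works the same way for every cell, so it is easy to hand to a short function. A minimal sketch, assuming the table is stored as a list of rows; the name `expected_counts` is made up for this example.

```python
def expected_counts(table):
    """Expected cell counts under independence for a list-of-rows table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


# Rows: eats breakfast, skips breakfast. Columns: good grades, average grades.
observed = [[60, 40],
            [30, 70]]
for row in expected_counts(observed):
    print(row)
```

Both rows come out the same here because each breakfast group has exactly 100 students.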

The Chi-Square Calculation

Cell               O     E     (O-E)²/E
Breakfast + Good   60    45    5.00
Breakfast + Avg    40    55    4.09
Skip + Good        30    45    5.00
Skip + Avg         70    55    4.09
Total                          χ² = 18.18

Degrees of Freedom = (rows - 1) × (columns - 1) = 1 × 1 = 1

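This χ² is far above 3.841, the critical value for 1 degree of freedom at the 5% significance level, so we reject the null hypothesis: breakfast and grades ARE connected! (Connected does not mean breakfast causes better grades; the test only shows that the two go together.)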
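
The whole breakfast calculation fits in one short script. This sketch assumes a plain list-of-rows table and applies no continuity correction, matching the hand calculation; `independence_test` is a hypothetical helper name, and the p-value shortcut in the last step is only valid for 1 degree of freedom.

```python
import math


def independence_test(table):
    """Return (chi_square, degrees_of_freedom) for a table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi_sq, df


survey = [[60, 40],   # eats breakfast: good grades, average grades
          [30, 70]]   # skips breakfast: good grades, average grades
chi_sq, df = independence_test(survey)
# For df = 1 only, the upper-tail probability equals erfc(sqrt(chi_sq / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2.0))
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.6f}")
```

The p-value is tiny, far below 0.05, so a gap this large between the two groups would be very unlikely if breakfast and grades were truly independent.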


🎭 Chi-Square Test of Homogeneity

The Big Question

“Do different GROUPS have the same pattern?”

The Different Schools Story 🏫

You want to know: Do students from three different schools have the same favorite subjects?

Example Data

Subject   School A   School B   School C   Total
Math      30         25         35         90
Science   20         30         20         70
Art       50         45         45         140
Total     100        100        100        300

Question: Are the preferences HOMOGENEOUS (the same) across schools?

How It Works

  • Chi-Square Homogeneity compares groups: School A, School B, and School C
  • Every group gets the same question
  • The test asks: do they all have the same distribution?

The Process:

  1. Calculate expected values (same formula as independence)
  2. Calculate χ²
  3. Find degrees of freedom: (rows - 1) × (columns - 1)
  4. Compare to critical value
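
Here is how those four steps might look for the three schools. It is a sketch, not a definitive implementation: the helper names are invented for this example, and the tail-probability shortcut used here only holds when the degrees of freedom are an even number, which is the case for this 3 × 3 table.

```python
import math


def table_chi_square(table):
    """Steps 1 to 3: expected counts, chi-square, and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_sq = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df


def sf_even_df(x, df):
    """Upper tail for even df: exp(-x/2) * sum of (x/2)^i / i! for i < df/2."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this shortcut needs an even df of at least 2")
    half = x / 2.0
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


# Rows: Math, Science, Art. Columns: School A, School B, School C.
counts = [[30, 25, 35],
          [20, 30, 20],
          [50, 45, 45]]
chi_sq, df = table_chi_square(counts)
p_value = sf_even_df(chi_sq, df)  # step 4, done with a p-value
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.2f}")
```

With these numbers χ² comes out near 4.88 with 4 degrees of freedom, well below the 5% critical value of 9.488, so the three schools look consistent with sharing one pattern of favorite subjects.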

Independence vs Homogeneity

Independence                     Homogeneity
ONE sample                       MULTIPLE samples
Are X and Y related?             Do groups have same pattern?
Same math, different question!   Same math, different question!

🎯 Expected and Observed Values

The Heart of Chi-Square

Everything comes down to two numbers:

Observed Values (O)

What you actually counted. Real data. The truth of what happened.

Example: You surveyed 50 people about their favorite pizza:

  • Pepperoni: 22 people ← This is OBSERVED
  • Cheese: 18 people ← This is OBSERVED
  • Veggie: 10 people ← This is OBSERVED

Expected Values (E)

What you PREDICTED would happen IF your theory is true.

Two Ways to Calculate Expected:

1. For Goodness of Fit:

Expected = Total × Probability

If you expect equal preference across the 3 pizzas: 50 × (1/3) ≈ 16.67 each

2. For Independence/Homogeneity:

Expected = (Row Total × Column Total) / Grand Total
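
Applying the first formula to the pizza survey shows how observed and expected values meet. This sketch assumes all three pizzas are equally popular, which is an assumption made for the example and not a finding. With 3 categories there are 2 degrees of freedom, and in that case the tail probability has the simple form exp(-χ²/2).

```python
import math

observed = {"pepperoni": 22, "cheese": 18, "veggie": 10}

total = sum(observed.values())
probability = 1 / len(observed)  # assumed: every pizza equally popular
expected = {name: total * probability for name in observed}

chi_sq = sum(
    (observed[name] - expected[name]) ** 2 / expected[name]
    for name in observed
)
df = len(observed) - 1
p_value = math.exp(-chi_sq / 2.0)  # exact tail formula, valid only for df = 2

for name in observed:
    print(f"{name:<10} observed={observed[name]:>3} expected={expected[name]:.2f}")
print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```

Here χ² is about 4.48, under the 5% critical value of 5.991 for 2 degrees of freedom, so a survey of 50 people is not enough to rule out equal preference.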

The Detective’s Comparison 🔍

  • Compare the Observed Values with the Expected Values
  • Big difference? Yes → Something interesting!
  • Big difference? No → Just random chance

Visual Example

Dice Roll Test (60 rolls):

Number   Expected   Observed   Difference
1        10         12         +2
2        10         8          -2
3        10         11         +1
4        10         9          -1
5        10         7          -3
6        10         13         +3

Small differences = Fair die (probably!)
Huge differences = Suspicious die! 🎲
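
To put a number on "small" versus "huge", the dice table can be run through the χ² formula and compared with a critical value. A minimal sketch: the lookup holds standard 5% critical values from a chi-square table, and the variable names are made up for this example.

```python
# Critical values at the 5% significance level, keyed by degrees of freedom
# (standard chi-square table values).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

observed = [12, 8, 11, 9, 7, 13]   # counts for faces 1 to 6
expected = [10] * 6                # 60 rolls spread evenly over 6 faces

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
suspicious = chi_sq > CRITICAL_05[df]

print(f"chi-square = {chi_sq:.1f}, critical value = {CRITICAL_05[df]}, "
      f"suspicious = {suspicious}")
```

The differences in the table add up to a χ² of about 2.8, far under 11.070, so this die passes the fairness check.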


🧠 Quick Summary

The Chi-Square Distribution and Its Three Tests

Test              Question                             Example
Distribution      What does the χ² curve look like?    Understanding probabilities
Goodness of Fit   Does data match expected pattern?    Is this die fair?
Independence      Are two variables connected?         Do phone users prefer certain apps?
Homogeneity       Do groups have same distribution?    Do cities have same voting patterns?

The Golden Formula

χ² = Σ (O - E)² / E

Remember:

  • O = What you SAW (observed)
  • E = What you EXPECTED
  • Bigger χ² = Bigger surprise = Something interesting!

🎮 You’ve Got This!

Chi-Square tests are like being a data detective:

  1. Make a prediction (expected values)
  2. Collect evidence (observed values)
  3. Compare them (calculate χ²)
  4. Solve the mystery (is the difference real or just chance?)

Next time you wonder “Is this just coincidence?” — you now have the tools to find out! 🔍✨
