🩹 Handling Missing Data in Pandas

The Story of the Forgetful Librarian

Imagine a librarian named Lily who keeps a record of all the books in her library. But Lily has a tiny problem—sometimes she forgets to write things down! Some book entries are missing their page counts, others are missing their authors.

Missing data in Pandas is exactly like Lily’s forgetful notes. And today, we’ll learn how to find, remove, or fill in those blank spots!

🔍 Detecting Missing Values

What Does “Missing” Look Like?

In Pandas, a missing value shows up as NaN (Not a Number) or None. Think of it like an empty box on a form—you know something should be there, but it’s blank.

import pandas as pd
import numpy as np

# Lily's book log with missing data
books = pd.DataFrame({
    'title': ['Python Basics', 'Data Science', 'AI Magic'],
    'pages': [200, np.nan, 350],
    'author': ['Ada', None, 'Grace']
})
print(books)

Output:

           title  pages author
0  Python Basics  200.0    Ada
1   Data Science    NaN   None
2       AI Magic  350.0  Grace

See those NaN and None? Those are Lily’s forgotten entries!

Finding the Blanks with `isna()` and `isnull()`

To find where the blanks are, use isna() or isnull() (they do the same thing!).

# True = missing, False = not missing
print(books.isna())

Output:

   title  pages  author
0  False  False   False
1  False   True    True
2  False  False   False

Row 1 has two missing values—pages and author!

Finding Non-Missing with `notna()`

Want to find what’s not missing? Use notna():

print(books.notna())

This flips the True/False—now True means “we have data here!”
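
Here is a small sketch of how notna() can be put to work as a filter. It rebuilds Lily's book log from above; the variable names has_pages, books_with_pages, and same_mask are just illustrative choices, not anything Pandas requires.

```python
import numpy as np
import pandas as pd

# Lily's book log again, so this snippet runs on its own
books = pd.DataFrame({
    'title': ['Python Basics', 'Data Science', 'AI Magic'],
    'pages': [200, np.nan, 350],
    'author': ['Ada', None, 'Grace'],
})

# notna() on one column gives a True/False Series we can filter with
has_pages = books['pages'].notna()
books_with_pages = books[has_pages]
print(books_with_pages['title'].tolist())  # ['Python Basics', 'AI Magic']

# isnull() is an alias of isna(), so the two masks are identical
same_mask = books.isna().equals(books.isnull())
print(same_mask)  # True
```

Filtering with a mask like this keeps only the rows where the chosen column has data, without touching the original DataFrame.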

🏷️ NA and pd.NA

Meet the New Kid: `pd.NA`

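Pandas 1.0 introduced pd.NA as a single marker for missing data that can be shared across data types. It is used by the nullable dtypes such as Int64, boolean, and string, so a column of whole numbers can hold a blank without being converted to floats.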

# Old way
old_missing = np.nan

# New way (cleaner!)
new_missing = pd.NA

Why pd.NA is Better

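Imagine asking: “Is this missing value True or False?” With np.nan, you’d get confusing answers: it is just a float, so bool(np.nan) is True and np.nan == np.nan is False. With pd.NA, comparisons and logical operations return pd.NA whenever the answer depends on the missing value, meaning “I don’t know, it’s missing!”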

# pd.NA handles logic better
result = pd.NA | True   # Returns True
result = pd.NA & False  # Returns False
result = pd.NA | False  # Returns pd.NA (uncertain!)

Think of pd.NA as an honest friend who says “I don’t know” instead of guessing!
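
To see the difference in practice, here is a minimal sketch comparing the two markers. The variable names are illustrative, and the Int64 column is just one example of a nullable dtype that stores blanks as pd.NA.

```python
import numpy as np
import pandas as pd

# np.nan is a float, so comparing it gives a plain (and surprising) bool
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)  # False

# pd.NA passes "unknown" through the comparison instead
na_comparison = (pd.NA == 1)
print(na_comparison)  # <NA>

# A nullable integer column keeps whole numbers next to a missing value
pages = pd.Series([200, pd.NA, 350], dtype='Int64')
print(pages.dtype)            # Int64
print(pages.isna().tolist())  # [False, True, False]
```

Notice that the page counts stay integers here, whereas the np.nan version of the column earlier was shown as 200.0 and 350.0.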

🗑️ Dropping Missing with `dropna()`

Sometimes, you just want to remove the rows or columns with blanks. That’s what dropna() does!

Drop Rows with Any Missing Value

# Remove any row that has a blank
clean_books = books.dropna()
print(clean_books)

Output:

           title  pages author
0  Python Basics  200.0    Ada
2       AI Magic  350.0  Grace

Row 1 had blanks, so it’s gone!

Drop Only If All Values Are Missing

# Only drop if ENTIRE row is blank
books.dropna(how='all')

Drop Rows Based on Specific Columns

# Only check 'pages' column for blanks
books.dropna(subset=['pages'])

Drop Columns Instead of Rows

# axis=1 means columns
books.dropna(axis=1)

graph TD
    A[DataFrame with NaN] --> B{dropna}
    B -->|how='any'| C[Remove if ANY blank]
    B -->|how='all'| D[Remove if ALL blank]
    B -->|subset| E[Check specific columns]
    B -->|axis=1| F[Remove columns, not rows]
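
The dropna() options above can be compared side by side. This sketch reuses Lily's book log and counts what survives each call; the result names (any_blank, all_blank, and so on) are made up for the illustration.

```python
import numpy as np
import pandas as pd

books = pd.DataFrame({
    'title': ['Python Basics', 'Data Science', 'AI Magic'],
    'pages': [200, np.nan, 350],
    'author': ['Ada', None, 'Grace'],
})

any_blank = books.dropna()                   # how='any' is the default
all_blank = books.dropna(how='all')          # row 1 still has a title, so it stays
pages_only = books.dropna(subset=['pages'])  # only the 'pages' column is checked
no_blank_cols = books.dropna(axis=1)         # keeps columns that have no blanks

print(len(any_blank), len(all_blank), len(pages_only))  # 2 3 2
print(list(no_blank_cols.columns))                      # ['title']

# dropna() returns a new DataFrame; Lily's original log is unchanged
print(len(books))  # 3
```

Because nothing is modified in place, assign the result to a variable (as above) if you want to keep the cleaned version.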

✏️ Filling Missing with `fillna()`

Instead of removing blanks, what if we fill them in? Like Lily finally remembering and writing down the missing info!

Fill with a Single Value

# Fill blanks in 'pages' with 0 (returns a new Series; books itself is unchanged)
books['pages'].fillna(0)

Output:

0    200.0
1      0.0
2    350.0

Fill with the Mean (Average)

# Fill with average page count
avg_pages = books['pages'].mean()
books['pages'].fillna(avg_pages)

Fill Different Columns with Different Values

books.fillna({
    'pages': 0,
    'author': 'Unknown'
})

Now “Unknown” appears where author was missing!
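
Putting the fillna() ideas together, here is a hedged sketch that fills pages with the column average and author with a placeholder in one call. The name filled is an illustrative choice; the key point is that fillna() hands back a new DataFrame, so the result has to be stored somewhere.

```python
import numpy as np
import pandas as pd

books = pd.DataFrame({
    'title': ['Python Basics', 'Data Science', 'AI Magic'],
    'pages': [200, np.nan, 350],
    'author': ['Ada', None, 'Grace'],
})

# mean() skips blanks by default: (200 + 350) / 2 = 275.0
avg_pages = books['pages'].mean()

# One dict, one value per column
filled = books.fillna({'pages': avg_pages, 'author': 'Unknown'})

print(filled['pages'].tolist())   # [200.0, 275.0, 350.0]
print(filled['author'].tolist())  # ['Ada', 'Unknown', 'Grace']

# The original still has its blank, because fillna() did not modify it
print(books['pages'].isna().sum())  # 1
```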

⬆️⬇️ Directional Fill Methods

What if you want to fill blanks using nearby values? Like copying from the cell above or below!

Forward Fill (ffill) - Copy from Above

temps = pd.Series([22, np.nan, np.nan, 25, np.nan])
temps.ffill()

Output:

0    22.0
1    22.0  ← copied from row 0
2    22.0  ← carried forward from row 0
3    25.0
4    25.0  ← copied from row 3

The blank looks “up” and copies!

Backward Fill (bfill) - Copy from Below

temps.bfill()

Output:

0    22.0
1    25.0  ← copied from row 3
2    25.0  ← copied from row 3
3    25.0
4     NaN  ← nothing below to copy!

The blank looks “down” and copies!

Limit How Many to Fill

# Fill at most 1 consecutive blank after each known value
temps.ffill(limit=1)
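
Here is what limit=1 does to the temperature series from above, as a small runnable sketch. It also shows one possible way to deal with a leftover blank at the end by chaining bfill() and ffill(); that chaining is an illustration, not the only approach.

```python
import numpy as np
import pandas as pd

temps = pd.Series([22, np.nan, np.nan, 25, np.nan])

# Each known value is allowed to fill only the first blank that follows it
limited = temps.ffill(limit=1)
print(limited.isna().tolist())  # [False, False, True, False, False]

# bfill() leaves the last blank (nothing below it), so ffill() finishes the job
both_ways = temps.bfill().ffill()
print(both_ways.tolist())  # [22.0, 25.0, 25.0, 25.0, 25.0]
```

Row 2 stays blank in the limited version because it is the second blank in a run, and the limit only allows one.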

graph TD
    A[Blank Cell] --> B{Which direction?}
    B -->|ffill| C[Look UP and copy]
    B -->|bfill| D[Look DOWN and copy]
    C --> E[Fill blanks forward]
    D --> F[Fill blanks backward]

📈 Interpolating Missing Values

Interpolation is like being a detective. If you know the values before and after a blank, you can guess what’s in the middle!

Linear Interpolation

Imagine a line connecting two points—the missing value is somewhere on that line.

heights = pd.Series([100, np.nan, np.nan, 160])
heights.interpolate()

Output:

0    100.0
1    120.0  ← guessed! 100 + (160 - 100) / 3
2    140.0  ← guessed!
3    160.0

The gaps are filled with evenly spaced values!
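
The arithmetic behind those evenly spaced values can be checked by hand. This sketch works out the step size manually and compares it with what interpolate() returns; step and manual are names invented for the example.

```python
import numpy as np
import pandas as pd

heights = pd.Series([100, np.nan, np.nan, 160])

# There are 3 equal steps between row 0 and row 3
step = (160 - 100) / 3
manual = [100 + step * i for i in range(4)]
print(manual)  # [100.0, 120.0, 140.0, 160.0]

# The default method is 'linear', which gives the same values
interpolated = heights.interpolate().tolist()
print(interpolated)
```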

Different Interpolation Methods

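# These calls assume df is a DataFrame you already have (it is not defined here)

# Time-based interpolation (requires a DatetimeIndex)
df.interpolate(method='time')

# Polynomial interpolation (curved line; requires SciPy to be installed)
df.interpolate(method='polynomial', order=2)

# Index-based (uses the actual index values as positions)
df.interpolate(method='index')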
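
To see why the method matters, here is a minimal sketch with made-up readings on unevenly spaced dates. The dates and values are hypothetical; the point is only that the default linear method ignores the index, while 'time' and 'index' take the spacing into account.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: 1 day before the blank, 3 days after it
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05'])
readings = pd.Series([10.0, np.nan, 50.0], index=dates)

by_position = readings.interpolate()           # treats rows as evenly spaced
by_time = readings.interpolate(method='time')  # weights by elapsed time

print(by_position.iloc[1])  # 30.0, halfway between 10 and 50
print(by_time.iloc[1])      # 20.0, one quarter of the way (1 day out of 4)

# method='index' does the same kind of weighting with a numeric index
numbered = pd.Series([10.0, np.nan, 50.0], index=[0, 1, 4])
by_index = numbered.interpolate(method='index')
print(by_index.iloc[1])     # 20.0
```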

When to Use Interpolation

Situation	Best Method
Steady growth	`linear`
Time series data	`time`
Curved patterns	`polynomial`
Index matters	`index`

🎯 Quick Decision Guide

graph TD
    A[Missing Data Found!] --> B{What to do?}
    B -->|Remove it| C[dropna]
    B -->|Fill with value| D[fillna]
    B -->|Copy neighbors| E[ffill/bfill]
    B -->|Smart guess| F[interpolate]
    C --> G[Rows or Columns?]
    D --> H[Single value or dict?]
    E --> I[Forward or Backward?]
    F --> J[Linear or Polynomial?]

💡 Pro Tips

Check first! Always use isna().sum() to count blanks before deciding what to do.
Don’t blindly fill! Filling with 0 might mess up calculations. Think about what makes sense for your data.
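Interpolation = Smart fill. For ordered data (like temperatures or stock prices over time), interpolation often gives more plausible values than filling every blank with one constant.
pd.NA is the future. When creating DataFrames from scratch, prefer pd.NA together with a nullable dtype (such as Int64 or string) over np.nan; without one of those dtypes, a column holding pd.NA falls back to the generic object dtype.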
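
The first two tips can be tried out on Lily's book log. This sketch counts the blanks per column and then shows how filling with 0 pulls the average down; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

books = pd.DataFrame({
    'title': ['Python Basics', 'Data Science', 'AI Magic'],
    'pages': [200, np.nan, 350],
    'author': ['Ada', None, 'Grace'],
})

# Tip 1: count the blanks in each column before deciding anything
missing_per_column = books.isna().sum()
print(missing_per_column.to_dict())  # title: 0, pages: 1, author: 1

# Tip 2: a careless fill changes the statistics
mean_skipping_blanks = books['pages'].mean()            # (200 + 350) / 2
mean_after_zero_fill = books['pages'].fillna(0).mean()  # (200 + 0 + 350) / 3

print(mean_skipping_blanks)            # 275.0
print(round(mean_after_zero_fill, 2))  # 183.33
```

A book with 0 pages is not the same thing as a book with an unknown page count, and the average shows it.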

🏆 You Did It!

You’ve learned how to:

✅ Detect missing values with isna() and notna()
✅ Understand pd.NA vs np.nan
✅ Drop missing data with dropna()
✅ Fill blanks with fillna()
✅ Use directional fills (ffill, bfill)
✅ Interpolate to make smart guesses

Lily the librarian is now organized, and so is your data! 📚✨


