🧹 Data Ordering & Cleaning in Pandas

The Messy Toy Box Story

Imagine you have a giant toy box with hundreds of toys thrown in randomly. Some toys are broken, some are duplicates (you have three identical red cars!), and everything is jumbled up.

Pandas is like having a magical cleaning robot that helps you:

Sort toys by size, color, or name
Find and remove duplicate toys
Replace broken parts with new ones

Let’s learn how to tidy up our data toy box!

🎯 Sorting by Values

Think of it like: Arranging books on a shelf by their titles.

When you sort by values, you pick a column and arrange all rows based on that column’s data.

import pandas as pd

# Our messy data
df = pd.DataFrame({
    'name': ['Zara', 'Anna', 'Mike'],
    'age': [25, 30, 22]
})

# Sort by age (smallest to biggest)
sorted_df = df.sort_values('age')

Result:

name	age
Mike	22
Anna	30
Zara	25

Wait… that’s wrong! Let me fix:

name	age
Mike	22
Zara	25
Anna	30

✨ Now ages go: 22 → 25 → 30

🔢 Sorting by Index

Think of it like: Arranging your music playlist by track number.

Every DataFrame has an invisible “row number” called the index. Sorting by index puts rows back in their original order.

# After some shuffling, reset order
df_sorted = df.sort_index()

When to use:

After filtering, rows might be out of order
You want to restore original sequence
Working with time-series data

graph TD
    A[Shuffled Data<br/>Index: 3, 1, 2] --> B[sort_index]
    B --> C[Ordered Data<br/>Index: 1, 2, 3]

📚 Sorting by Multiple Columns

Think of it like: Organizing a library first by genre, then by author name.

Sometimes one column isn’t enough. You want to sort by Country first, then by City within each country.

df = pd.DataFrame({
    'country': ['USA', 'USA', 'UK', 'UK'],
    'city': ['Boston', 'Austin', 'London', 'Bath'],
    'pop': [700, 950, 9000, 90]
})

# Sort by country, then city
df.sort_values(['country', 'city'])

Result:

country	city	pop
UK	Bath	90
UK	London	9000
USA	Austin	950
USA	Boston	700

📝 First sorted by country (UK before USA), then cities within each country are alphabetical!

⬆️⬇️ Ascending and Descending Sort

Think of it like: Counting 1-2-3 (ascending) or 3-2-1 (descending).

Ascending = smallest to biggest (A→Z, 1→100)
Descending = biggest to smallest (Z→A, 100→1)

# Ascending (default) - low to high
df.sort_values('price', ascending=True)

# Descending - high to low
df.sort_values('price', ascending=False)

Pro tip for multiple columns:

# Country A→Z, but price high→low
df.sort_values(
    ['country', 'price'],
    ascending=[True, False]
)

graph TD
    A[Unsorted: 50, 10, 30] --> B{Direction?}
    B -->|ascending=True| C[10, 30, 50]
    B -->|ascending=False| D[50, 30, 10]

🏆 Ranking Data

Think of it like: Giving medals in a race — 1st place, 2nd place, 3rd place.

Ranking assigns a position number based on values. Unlike sorting, it keeps rows in place but adds a rank column.

df = pd.DataFrame({
    'student': ['Ali', 'Bob', 'Cat'],
    'score': [85, 92, 85]
})

# Add rank (highest score = rank 1)
df['rank'] = df['score'].rank(
    ascending=False
)

Result:

student	score	rank
Ali	85	2.5
Bob	92	1.0
Cat	85	2.5

Wait, why 2.5?

Ali and Cat tied! By default, Pandas gives them the average of ranks 2 and 3 = 2.5

Other ranking methods:

method='min' → Both get rank 2
method='max' → Both get rank 3
method='first' → First one gets 2, second gets 3

🔍 Finding Duplicates

Think of it like: Spotting twin toys in your toy box.

Before removing duplicates, you need to find them first!

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'apple'],
    'price': [1.0, 0.5, 1.0]
})

# Find ALL duplicate rows
df.duplicated()

Result:

0    False  ← First apple (original)
1    False  ← Banana (unique)
2     True  ← Second apple (duplicate!)

Check specific columns:

# Only check 'fruit' column
df.duplicated(subset=['fruit'])

Count total duplicates:

df.duplicated().sum()  # Returns: 1

🗑️ Removing Duplicates

Think of it like: Keeping only one of each twin toy.

Once found, duplicates are easy to remove!

# Remove duplicate rows
clean_df = df.drop_duplicates()

Control which duplicate to keep:

# Keep first occurrence (default)
df.drop_duplicates(keep='first')

# Keep last occurrence
df.drop_duplicates(keep='last')

# Remove ALL duplicates
df.drop_duplicates(keep=False)

Remove based on specific columns:

# Keep first row for each unique fruit
df.drop_duplicates(subset=['fruit'])

graph TD
    A[3 Rows<br/>apple, banana, apple] --> B[drop_duplicates]
    B --> C[2 Rows<br/>apple, banana]

🔄 Replacing Values

Think of it like: Swapping broken toy parts with new ones.

Sometimes data has wrong or messy values. Replace fixes them!

Simple replacement:

df = pd.DataFrame({
    'status': ['active', 'inactive', 'actve']
})

# Fix typo: 'actve' → 'active'
df['status'].replace('actve', 'active')

Multiple replacements:

# Replace several values at once
df.replace({
    'actve': 'active',
    'inactve': 'inactive'
})

Replace with dictionary mapping:

# Convert codes to labels
df['grade'].replace({
    'A': 'Excellent',
    'B': 'Good',
    'C': 'Average'
})

Replace in specific columns:

df.replace({
    'status': {'old': 'new'},
    'grade': {'F': 'Fail'}
})

🎁 Quick Reference Box

Task	Code
Sort by column	`df.sort_values('col')`
Sort by index	`df.sort_index()`
Sort multiple	`df.sort_values(['a','b'])`
Descending	`ascending=False`
Rank values	`df['col'].rank()`
Find duplicates	`df.duplicated()`
Remove duplicates	`df.drop_duplicates()`
Replace values	`df.replace('old','new')`

🌟 You Did It!

You now know how to:

✅ Sort data any way you want
✅ Rank items like a competition judge
✅ Hunt down sneaky duplicates
✅ Clean up messy values

Your data toy box is now perfectly organized! 🎉

Next time you see messy data, remember: Pandas is your magical cleaning robot. Just tell it what to sort, find, remove, or replace — and watch the magic happen!

Loading story...

No Story Available

This concept doesn't have a story yet.

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

Sign In to Access Get Premium Access Close

Interactive - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

Sign In to Access Get Premium Access Close

No Interactive Content

This concept doesn't have interactive content yet.

Cheatsheet - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

Sign In to Access Get Premium Access Close

No Cheatsheet Available

This concept doesn't have a cheatsheet yet.

Quiz - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

Sign In to Access Get Premium Access Close

No Quiz Available

This concept doesn't have a quiz yet.

Unable to load concept

Coming Soon...

🧹 Data Ordering & Cleaning in Pandas

The Messy Toy Box Story

🎯 Sorting by Values

🔢 Sorting by Index

📚 Sorting by Multiple Columns

⬆️⬇️ Ascending and Descending Sort

🏆 Ranking Data

🔍 Finding Duplicates

🗑️ Removing Duplicates

🔄 Replacing Values

Simple replacement:

Multiple replacements:

Replace with dictionary mapping:

Replace in specific columns:

🎁 Quick Reference Box

🌟 You Did It!

No Story Available

Story - Premium Content

Interactive - Premium Content

No Interactive Content

Cheatsheet - Premium Content

No Cheatsheet Available

Quiz - Premium Content

No Quiz Available

Report an Issue

Data Ordering and Cleaning

Unable to load concept

Coming Soon...

🧹 Data Ordering & Cleaning in Pandas

The Messy Toy Box Story

🎯 Sorting by Values

🔢 Sorting by Index

📚 Sorting by Multiple Columns

⬆️⬇️ Ascending and Descending Sort

🏆 Ranking Data

🔍 Finding Duplicates

🗑️ Removing Duplicates

🔄 Replacing Values

Simple replacement:

Multiple replacements:

Replace with dictionary mapping:

Replace in specific columns:

🎁 Quick Reference Box

🌟 You Did It!

No Story Available

Story - Premium Content

Interactive - Premium Content

No Interactive Content

Cheatsheet - Premium Content

No Cheatsheet Available

Quiz - Premium Content

No Quiz Available

Report an Issue