🐼 Pandas Advanced Operations: The Data Kitchen
Imagine you’re a master chef in a magical kitchen. Your ingredients? Data tables! Today, we’ll learn how to mix, match, clean, and transform them like a pro!
🍳 The Big Picture
Think of Pandas DataFrames like recipe cards in your kitchen. Sometimes you need to:
- Combine two recipe cards into one (Merge & Join)
- Stack multiple cards together (Concatenation)
- Flip the card to see it differently (Pivot & Unpivot)
- Apply a special sauce to every ingredient (Apply & Map)
- Clean up messy handwriting (Data Cleaning)
- Remove duplicate cards (Duplicate Detection)
Let’s cook! 🍴
1. 🔗 Merge and Join: Combining Recipe Cards
What’s the Story?
You have two recipe cards:
- Card A: Lists ingredients with their prices
- Card B: Lists ingredients with their calories
You want ONE card with prices AND calories together!
graph TD A["Card A: Ingredients + Prices"] --> C["Combined Card"] B["Card B: Ingredients + Calories"] --> C C --> D["Ingredients + Prices + Calories"]
The Magic Spell
# Two DataFrames
prices = pd.DataFrame({
'item': ['apple', 'banana'],
'price': [1.0, 0.5]
})
calories = pd.DataFrame({
'item': ['apple', 'banana'],
'cal': [95, 105]
})
# Merge them!
combined = pd.merge(
prices,
calories,
on='item'
)
Result:
| item | price | cal |
|---|---|---|
| apple | 1.0 | 95 |
| banana | 0.5 | 105 |
Types of Merges (Like Venn Diagrams!)
| Type | What It Does | Real Life |
|---|---|---|
inner |
Only matching items | Friends in BOTH clubs |
left |
All from left + matches | Your list + friend’s notes |
right |
All from right + matches | Friend’s list + your notes |
outer |
Everything from both | All friends combined |
# Left merge example
pd.merge(A, B, on='key', how='left')
Join: The Index-Based Cousin
join() is like merge() but uses the index (row labels) instead of a column.
# Join using index
df1.join(df2, how='inner')
💡 Simple Rule: Use merge() for columns, join() for indexes!
2. 📚 Concatenation: Stacking Cards
What’s the Story?
Imagine you have recipe cards from Monday and Tuesday. You want to stack them into one big pile!
graph TD A["Monday Recipes"] --> C["All Recipes"] B["Tuesday Recipes"] --> C
The Magic Spell
monday = pd.DataFrame({
'dish': ['pasta', 'salad']
})
tuesday = pd.DataFrame({
'dish': ['soup', 'pizza']
})
# Stack vertically (like a pile)
all_dishes = pd.concat(
[monday, tuesday]
)
Result:
| dish |
|---|
| pasta |
| salad |
| soup |
| pizza |
Two Directions
| Direction | Code | Like… |
|---|---|---|
| Vertical (rows) | axis=0 |
Stacking papers |
| Horizontal (columns) | axis=1 |
Gluing side-by-side |
# Side by side
pd.concat([df1, df2], axis=1)
💡 Pro Tip: Use ignore_index=True to get fresh row numbers!
3. 🔄 Pivot and Unpivot: Flipping the Card
What’s the Story?
Your recipe card shows data in a long list. But you want to see it as a neat table instead!
Pivot: Long → Wide
Think of it like unfolding a paper crane back into a flat sheet.
# Long format data
sales = pd.DataFrame({
'day': ['Mon', 'Mon', 'Tue'],
'fruit': ['apple', 'banana', 'apple'],
'qty': [10, 15, 8]
})
# Pivot to wide format
wide = sales.pivot(
index='day',
columns='fruit',
values='qty'
)
Before (Long):
| day | fruit | qty |
|---|---|---|
| Mon | apple | 10 |
| Mon | banana | 15 |
| Tue | apple | 8 |
After (Wide):
| day | apple | banana |
|---|---|---|
| Mon | 10 | 15 |
| Tue | 8 | NaN |
Unpivot (Melt): Wide → Long
The opposite! Like folding paper into a crane.
# Melt back to long format
long = wide.melt(
ignore_index=False,
var_name='fruit',
value_name='qty'
)
💡 Remember:
- Pivot = Spread out (like opening an umbrella)
- Melt = Gather up (like closing an umbrella)
4. 🎨 Apply and Map: The Special Sauce
What’s the Story?
You want to add the same sauce to EVERY dish. Or maybe different sauces based on the dish type!
Apply: Do Something to Each Row/Column
# Double every price
df['price'] = df['price'].apply(
lambda x: x * 2
)
# Or use a regular function
def add_tax(price):
return price * 1.1
df['final'] = df['price'].apply(add_tax)
Map: Replace Values Like Magic
# Map codes to names
grade_map = {
'A': 'Excellent',
'B': 'Good',
'C': 'Okay'
}
df['grade_name'] = df['grade'].map(
grade_map
)
Apply to Whole DataFrame
# Apply to each column
df.apply(lambda col: col.max())
# Apply to each row
df.apply(
lambda row: row['a'] + row['b'],
axis=1
)
💡 Quick Guide:
| Function | Use When |
|---|---|
apply() |
Custom function on each element/row |
map() |
Replace values using a dictionary |
applymap() |
Apply function to EVERY cell |
5. 🧹 Data Cleaning: Fixing Messy Handwriting
What’s the Story?
Your recipe card has messy handwriting:
- Some words are missing
- Some are spelled wrong
- Some have extra spaces
Time to clean up!
Handling Missing Values
# Find missing values
df.isna().sum()
# Fill missing with a value
df['col'].fillna(0)
# Fill with the average
df['col'].fillna(df['col'].mean())
# Drop rows with missing data
df.dropna()
Fixing Text Data
# Remove extra spaces
df['name'] = df['name'].str.strip()
# Make lowercase
df['name'] = df['name'].str.lower()
# Replace values
df['name'] = df['name'].str.replace(
'old', 'new'
)
Fixing Data Types
# Convert to number
df['age'] = pd.to_numeric(
df['age'],
errors='coerce'
)
# Convert to date
df['date'] = pd.to_datetime(
df['date']
)
💡 Cleaning Checklist:
- ✅ Check for
NaNvalues - ✅ Strip whitespace
- ✅ Standardize text case
- ✅ Fix data types
- ✅ Handle outliers
6. 🔍 Duplicate Detection & Removal
What’s the Story?
Oh no! Someone copied the same recipe card twice. Let’s find and remove the copies!
Finding Duplicates
# Check for duplicates
df.duplicated()
# See which rows are duplicates
df[df.duplicated()]
# Count duplicates
df.duplicated().sum()
Removing Duplicates
# Remove all duplicates
clean_df = df.drop_duplicates()
# Keep first occurrence
df.drop_duplicates(keep='first')
# Keep last occurrence
df.drop_duplicates(keep='last')
# Check specific columns only
df.drop_duplicates(
subset=['name', 'date']
)
graph TD A["Original Data"] --> B{Duplicates?} B -->|Yes| C["drop_duplicates"] C --> D["Clean Data"] B -->|No| D
💡 Duplicate Strategy:
| Situation | Use |
|---|---|
| Keep first entry | keep='first' |
| Keep last entry | keep='last' |
| Remove all copies | keep=False |
🎓 Quick Summary
| Operation | What It Does | Main Function |
|---|---|---|
| Merge/Join | Combine tables by key | pd.merge() |
| Concat | Stack tables | pd.concat() |
| Pivot | Long → Wide | df.pivot() |
| Melt | Wide → Long | df.melt() |
| Apply | Custom function | df.apply() |
| Map | Replace values | df.map() |
| Clean | Fix messy data | fillna(), strip() |
| Dedupe | Remove copies | drop_duplicates() |
🚀 You Did It!
You’ve mastered the Pandas kitchen! Now you can:
- ✅ Combine data from different sources
- ✅ Reshape data any way you want
- ✅ Apply transformations like a pro
- ✅ Clean up messy datasets
- ✅ Remove pesky duplicates
Remember: Data is like ingredients. With these tools, you can cook up anything! 🍳🎉
