What is the difference between merge and join in Pandas?

Merge combines DataFrames using column values as keys. Join uses the index (row labels) instead. Use merge() for columns, join() for indexes.

How do pivot and melt work in Pandas?

Pivot transforms long format data to wide format (like opening an umbrella). Melt does the opposite, converting wide to long format.

How do you remove duplicates in Pandas?

Use drop_duplicates() to remove duplicate rows. Use keep='first' or 'last' to specify which copy to keep, or keep=False to remove all.

Pandas Advanced Operations | Data Analytics

🐼 Pandas Advanced Operations: The Data Kitchen

Imagine you’re a master chef in a magical kitchen. Your ingredients? Data tables! Today, we’ll learn how to mix, match, clean, and transform them like a pro!

🍳 The Big Picture

Think of Pandas DataFrames like recipe cards in your kitchen. Sometimes you need to:

Combine two recipe cards into one (Merge & Join)
Stack multiple cards together (Concatenation)
Flip the card to see it differently (Pivot & Unpivot)
Apply a special sauce to every ingredient (Apply & Map)
Clean up messy handwriting (Data Cleaning)
Remove duplicate cards (Duplicate Detection)

Let’s cook! 🍴

1. 🔗 Merge and Join: Combining Recipe Cards

What’s the Story?

You have two recipe cards:

Card A: Lists ingredients with their prices
Card B: Lists ingredients with their calories

You want ONE card with prices AND calories together!

graph TD
    A["Card A: Ingredients + Prices"] --> C["Combined Card"]
    B["Card B: Ingredients + Calories"] --> C
    C --> D["Ingredients + Prices + Calories"]

The Magic Spell

# Two DataFrames
prices = pd.DataFrame({
    'item': ['apple', 'banana'],
    'price': [1.0, 0.5]
})

calories = pd.DataFrame({
    'item': ['apple', 'banana'],
    'cal': [95, 105]
})

# Merge them!
combined = pd.merge(
    prices,
    calories,
    on='item'
)

Result:

item	price	cal
apple	1.0	95
banana	0.5	105

Types of Merges (Like Venn Diagrams!)

Type	What It Does	Real Life
`inner`	Only matching items	Friends in BOTH clubs
`left`	All from left + matches	Your list + friend’s notes
`right`	All from right + matches	Friend’s list + your notes
`outer`	Everything from both	All friends combined

# Left merge example
pd.merge(A, B, on='key', how='left')

Join: The Index-Based Cousin

join() is like merge() but uses the index (row labels) instead of a column.

# Join using index
df1.join(df2, how='inner')

💡 Simple Rule: Use merge() for columns, join() for indexes!

2. 📚 Concatenation: Stacking Cards

What’s the Story?

Imagine you have recipe cards from Monday and Tuesday. You want to stack them into one big pile!

graph TD
    A["Monday Recipes"] --> C["All Recipes"]
    B["Tuesday Recipes"] --> C

The Magic Spell

monday = pd.DataFrame({
    'dish': ['pasta', 'salad']
})

tuesday = pd.DataFrame({
    'dish': ['soup', 'pizza']
})

# Stack vertically (like a pile)
all_dishes = pd.concat(
    [monday, tuesday]
)

Result:

dish
pasta
salad
soup
pizza

Two Directions

Direction	Code	Like…
Vertical (rows)	`axis=0`	Stacking papers
Horizontal (columns)	`axis=1`	Gluing side-by-side

# Side by side
pd.concat([df1, df2], axis=1)

💡 Pro Tip: Use ignore_index=True to get fresh row numbers!

3. 🔄 Pivot and Unpivot: Flipping the Card

What’s the Story?

Your recipe card shows data in a long list. But you want to see it as a neat table instead!

Pivot: Long → Wide

Think of it like unfolding a paper crane back into a flat sheet.

# Long format data
sales = pd.DataFrame({
    'day': ['Mon', 'Mon', 'Tue'],
    'fruit': ['apple', 'banana', 'apple'],
    'qty': [10, 15, 8]
})

# Pivot to wide format
wide = sales.pivot(
    index='day',
    columns='fruit',
    values='qty'
)

Before (Long):

day	fruit	qty
Mon	apple	10
Mon	banana	15
Tue	apple	8

After (Wide):

day	apple	banana
Mon	10	15
Tue	8	NaN

Unpivot (Melt): Wide → Long

The opposite! Like folding paper into a crane.

# Melt back to long format
long = wide.melt(
    ignore_index=False,
    var_name='fruit',
    value_name='qty'
)

💡 Remember:

Pivot = Spread out (like opening an umbrella)
Melt = Gather up (like closing an umbrella)

4. 🎨 Apply and Map: The Special Sauce

What’s the Story?

You want to add the same sauce to EVERY dish. Or maybe different sauces based on the dish type!

Apply: Do Something to Each Row/Column

# Double every price
df['price'] = df['price'].apply(
    lambda x: x * 2
)

# Or use a regular function
def add_tax(price):
    return price * 1.1

df['final'] = df['price'].apply(add_tax)

Map: Replace Values Like Magic

# Map codes to names
grade_map = {
    'A': 'Excellent',
    'B': 'Good',
    'C': 'Okay'
}

df['grade_name'] = df['grade'].map(
    grade_map
)

Apply to Whole DataFrame

# Apply to each column
df.apply(lambda col: col.max())

# Apply to each row
df.apply(
    lambda row: row['a'] + row['b'],
    axis=1
)

💡 Quick Guide:

Function	Use When
`apply()`	Custom function on each element/row
`map()`	Replace values using a dictionary
`applymap()`	Apply function to EVERY cell

5. 🧹 Data Cleaning: Fixing Messy Handwriting

What’s the Story?

Your recipe card has messy handwriting:

Some words are missing
Some are spelled wrong
Some have extra spaces

Time to clean up!

Handling Missing Values

# Find missing values
df.isna().sum()

# Fill missing with a value
df['col'].fillna(0)

# Fill with the average
df['col'].fillna(df['col'].mean())

# Drop rows with missing data
df.dropna()

Fixing Text Data

# Remove extra spaces
df['name'] = df['name'].str.strip()

# Make lowercase
df['name'] = df['name'].str.lower()

# Replace values
df['name'] = df['name'].str.replace(
    'old', 'new'
)

Fixing Data Types

# Convert to number
df['age'] = pd.to_numeric(
    df['age'],
    errors='coerce'
)

# Convert to date
df['date'] = pd.to_datetime(
    df['date']
)

💡 Cleaning Checklist:

✅ Check for NaN values
✅ Strip whitespace
✅ Standardize text case
✅ Fix data types
✅ Handle outliers

6. 🔍 Duplicate Detection & Removal

What’s the Story?

Oh no! Someone copied the same recipe card twice. Let’s find and remove the copies!

Finding Duplicates

# Check for duplicates
df.duplicated()

# See which rows are duplicates
df[df.duplicated()]

# Count duplicates
df.duplicated().sum()

Removing Duplicates

# Remove all duplicates
clean_df = df.drop_duplicates()

# Keep first occurrence
df.drop_duplicates(keep='first')

# Keep last occurrence
df.drop_duplicates(keep='last')

# Check specific columns only
df.drop_duplicates(
    subset=['name', 'date']
)

graph TD
    A["Original Data"] --> B{Duplicates?}
    B -->|Yes| C["drop_duplicates"]
    C --> D["Clean Data"]
    B -->|No| D

💡 Duplicate Strategy:

Situation	Use
Keep first entry	`keep='first'`
Keep last entry	`keep='last'`
Remove all copies	`keep=False`

🎓 Quick Summary

Operation	What It Does	Main Function
Merge/Join	Combine tables by key	`pd.merge()`
Concat	Stack tables	`pd.concat()`
Pivot	Long → Wide	`df.pivot()`
Melt	Wide → Long	`df.melt()`
Apply	Custom function	`df.apply()`
Map	Replace values	`df.map()`
Clean	Fix messy data	`fillna()`, `strip()`
Dedupe	Remove copies	`drop_duplicates()`

🚀 You Did It!

You’ve mastered the Pandas kitchen! Now you can:

✅ Combine data from different sources
✅ Reshape data any way you want
✅ Apply transformations like a pro
✅ Clean up messy datasets
✅ Remove pesky duplicates

Remember: Data is like ingredients. With these tools, you can cook up anything! 🍳🎉

Blimto

Pandas Advanced Operations

Unable to load concept

Coming Soon...

🐼 Pandas Advanced Operations: The Data Kitchen

🍳 The Big Picture

1. 🔗 Merge and Join: Combining Recipe Cards

What’s the Story?

The Magic Spell

Types of Merges (Like Venn Diagrams!)

Join: The Index-Based Cousin

2. 📚 Concatenation: Stacking Cards

What’s the Story?

The Magic Spell

Two Directions

3. 🔄 Pivot and Unpivot: Flipping the Card

What’s the Story?

Pivot: Long → Wide

Unpivot (Melt): Wide → Long

4. 🎨 Apply and Map: The Special Sauce

What’s the Story?

Apply: Do Something to Each Row/Column

Map: Replace Values Like Magic

Apply to Whole DataFrame

5. 🧹 Data Cleaning: Fixing Messy Handwriting

What’s the Story?

Handling Missing Values

Fixing Text Data

Fixing Data Types

6. 🔍 Duplicate Detection & Removal

What’s the Story?

Finding Duplicates

Removing Duplicates

🎓 Quick Summary

🚀 You Did It!

Story - Premium Content

Stay Tuned!

Story - Premium Content

Interactives - Premium Content

Interactives - Premium Content

Stay Tuned!

Cheatsheet - Premium Content

Cheatsheet - Premium Content

Stay Tuned!

Quiz - Premium Content

Quiz - Premium Content

Stay Tuned!

Flashcards - Premium Content

Flashcards - Premium Content

Stay Tuned!

Sign in Required

Report an Issue