What are string operations in Pandas?

String operations let you cut, change, and manipulate text data. Use the .str accessor to apply methods like lower(), upper(), strip(), and replace() to entire columns.

What is groupby in Pandas?

Groupby sorts data into separate groups based on a column value, like sorting marbles by color. Then you can calculate sums, averages, or counts for each group.

What's the difference between wide and long data?

Wide data spreads values across columns (like a calendar). Long data stacks values in rows (like a receipt). Use melt() and pivot() to convert between them.

Pandas Operations | Data Analytics Guide

🐼 Pandas Operations: Your Data Kitchen Adventure!

Imagine you’re a chef in a magical kitchen. Your ingredients? Data! And Pandas is your super-smart kitchen assistant that helps you chop, mix, sort, and transform ingredients into delicious insights.

Today, we’ll learn five powerful kitchen tricks:

🔤 String Operations (chopping and slicing text)
📅 DateTime Operations (working with time)
👥 Groupby Operations (sorting into baskets)
📊 Aggregation (counting and measuring)
↔️ Wide vs Long Data (reshaping your table)

🔤 String Operations in Pandas

What Are String Operations?

Think of text as a piece of string (like yarn!). String operations let you cut, tie, stretch, and change that string however you want.

Simple Example:

You have a list of names: “JOHN”, “mary”, “Bob”
You want them all to look the same: “John”, “Mary”, “Bob”
Pandas helps you fix them all at once!

The Magic `.str` Accessor

In Pandas, we use .str to work with text. It’s like saying “Hey Pandas, treat this as text!”

import pandas as pd

# Our messy data
names = pd.Series(['JOHN', 'mary', 'Bob'])

# Make all lowercase
names.str.lower()
# Result: ['john', 'mary', 'bob']

# Make all uppercase
names.str.upper()
# Result: ['JOHN', 'MARY', 'BOB']

# Proper case (first letter big)
names.str.title()
# Result: ['John', 'Mary', 'Bob']

Common String Tricks

Trick	What It Does	Example
`.str.lower()`	All small letters	“HELLO” → “hello”
`.str.upper()`	All BIG letters	“hello” → “HELLO”
`.str.strip()`	Remove extra spaces	" hi " → “hi”
`.str.replace()`	Swap one thing for another	“cat” → “dog”
`.str.contains()`	Check if text has something	“hello” has “ell”? ✅
`.str.len()`	Count characters	“hello” = 5
`.str.split()`	Break text apart	“a-b-c” → [‘a’,‘b’,‘c’]

Real Kitchen Example

# Messy ingredient list
ingredients = pd.Series([
    '  TOMATOES  ',
    'onions',
    'GARLIC cloves'
])

# Clean them up!
clean = ingredients.str.strip()
clean = clean.str.lower()
clean = clean.str.title()

# Result: ['Tomatoes', 'Onions',
#          'Garlic Cloves']

Finding Things in Text

emails = pd.Series([
    'john@gmail.com',
    'mary@yahoo.com',
    'bob@gmail.com'
])

# Find all Gmail users
gmail_users = emails.str.contains('gmail')
# Result: [True, False, True]

# Extract the domain
domains = emails.str.split('@').str[1]
# Result: ['gmail.com', 'yahoo.com',
#          'gmail.com']

📅 DateTime Operations

What Are DateTime Operations?

Time is tricky! Is “01/02/2024” January 2nd or February 1st? Pandas helps you work with dates and times without confusion.

Simple Example:

You have birthdays as text: “1990-05-15”
Pandas turns them into real dates you can calculate with!
“How old is this person?” becomes easy!

Converting to DateTime

import pandas as pd

# Text dates
dates_text = pd.Series([
    '2024-01-15',
    '2024-06-20',
    '2024-12-25'
])

# Convert to real dates
dates = pd.to_datetime(dates_text)

Extracting Parts of a Date

Like taking apart a clock to see the gears!

# Create a date column
df = pd.DataFrame({
    'event': ['Birthday', 'Holiday', 'Meeting'],
    'date': pd.to_datetime([
        '2024-03-15',
        '2024-12-25',
        '2024-07-04'
    ])
})

# Extract parts using .dt accessor
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day
df['weekday'] = df['date'].dt.day_name()

Result:

event	year	month	day	weekday
Birthday	2024	3	15	Friday
Holiday	2024	12	25	Wednesday
Meeting	2024	7	4	Thursday

Date Math

from datetime import timedelta

# Today's date
today = pd.Timestamp('2024-06-15')

# Add 7 days
next_week = today + timedelta(days=7)
# Result: 2024-06-22

# Difference between dates
date1 = pd.Timestamp('2024-01-01')
date2 = pd.Timestamp('2024-12-31')
difference = date2 - date1
# Result: 365 days

Common DateTime Properties

Property	What It Gives You
`.dt.year`	The year (2024)
`.dt.month`	Month number (1-12)
`.dt.day`	Day of month (1-31)
`.dt.hour`	Hour (0-23)
`.dt.minute`	Minute (0-59)
`.dt.day_name()`	“Monday”, “Tuesday”…
`.dt.month_name()`	“January”, “February”…
`.dt.quarter`	Quarter (1-4)
`.dt.dayofweek`	0=Monday, 6=Sunday

👥 Groupby Operations

What Is Groupby?

Imagine you have a big box of colored marbles. Groupby helps you sort them into separate piles by color, then do something with each pile!

Simple Example:

You have sales data for different stores
Group by store name
Now you can see total sales per store!

graph TD
    A["🎨 All Marbles Mixed"] --> B["Groupby Color"]
    B --> C["🔴 Red Pile"]
    B --> D["🔵 Blue Pile"]
    B --> E["🟢 Green Pile"]
    C --> F["Count: 5"]
    D --> G["Count: 3"]
    E --> H["Count: 7"]

Basic Groupby

import pandas as pd

# Sales data
df = pd.DataFrame({
    'store': ['A', 'B', 'A', 'B', 'A'],
    'product': ['Apple', 'Apple', 'Banana',
                'Banana', 'Apple'],
    'sales': [100, 150, 80, 120, 90]
})

# Group by store
grouped = df.groupby('store')

# Total sales per store
grouped['sales'].sum()
# Store A: 270
# Store B: 270

Multiple Grouping

# Group by TWO things: store AND product
grouped = df.groupby(['store', 'product'])

# Sales for each store-product combo
grouped['sales'].sum()

# Store A, Apple: 190
# Store A, Banana: 80
# Store B, Apple: 150
# Store B, Banana: 120

Common Groupby Operations

Operation	What It Does
`.sum()`	Add all values together
`.mean()`	Find the average
`.count()`	Count how many items
`.min()`	Find smallest value
`.max()`	Find largest value
`.first()`	Get first item
`.last()`	Get last item

📊 Pandas Aggregation

What Is Aggregation?

Aggregation means combining many values into one summary number. Like counting all your toys instead of listing each one!

Simple Example:

You have 50 test scores
Aggregation gives you: Average = 85, Highest = 100, Lowest = 60
One line of summary instead of 50 numbers!

The `.agg()` Function

import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'A', 'B', 'B'],
    'value': [10, 20, 30, 40],
    'count': [1, 2, 3, 4]
})

# Multiple aggregations at once!
result = df.groupby('category').agg({
    'value': ['sum', 'mean'],
    'count': ['min', 'max']
})

Result:

category	value_sum	value_mean	count_min	count_max
A	30	15.0	1	2
B	70	35.0	3	4

Named Aggregations (Cleaner Way!)

result = df.groupby('category').agg(
    total_value=('value', 'sum'),
    avg_value=('value', 'mean'),
    min_count=('count', 'min'),
    max_count=('count', 'max')
)

Custom Aggregation Functions

# Your own function!
def range_func(x):
    return x.max() - x.min()

df.groupby('category')['value'].agg(range_func)
# A: 10 (20-10)
# B: 10 (40-30)

Aggregation Without Groupby

# Summarize entire DataFrame
df['value'].agg(['sum', 'mean', 'min', 'max'])

# Result:
# sum     100
# mean     25
# min      10
# max      40

↔️ Wide vs Long Data

What’s the Difference?

Think of a spreadsheet. Wide data spreads information across columns. Long data stacks everything in rows.

Wide Data (like a calendar):

Name	Jan	Feb	Mar
Alice	100	110	120
Bob	200	210	220

Long Data (like a receipt):

Name	Month	Value
Alice	Jan	100
Alice	Feb	110
Alice	Mar	120
Bob	Jan	200
Bob	Feb	210
Bob	Mar	220

When to Use Which?

graph TD
    A["Your Data"] --> B{What do you need?}
    B -->|Easy to read| C["Wide Format"]
    B -->|Easy to analyze| D["Long Format"]
    C --> E["Reports, Dashboards"]
    D --> F["Groupby, Plotting, ML"]

Going from Wide to Long: `melt()`

import pandas as pd

# Wide data
wide_df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Jan': [100, 200],
    'Feb': [110, 210],
    'Mar': [120, 220]
})

# Melt to long format
long_df = pd.melt(
    wide_df,
    id_vars=['Name'],      # Keep this column
    var_name='Month',      # New column for headers
    value_name='Sales'     # New column for values
)

Result:

Name	Month	Sales
Alice	Jan	100
Alice	Feb	110
Alice	Mar	120
Bob	Jan	200
Bob	Feb	210
Bob	Mar	220

Going from Long to Wide: `pivot()`

# Long data back to wide
wide_again = long_df.pivot(
    index='Name',     # Row labels
    columns='Month',  # Column headers
    values='Sales'    # Values to fill in
)

`pivot_table()` for Aggregation

When you have duplicate entries, use pivot_table():

# Data with duplicates
sales = pd.DataFrame({
    'Store': ['A', 'A', 'B', 'B'],
    'Product': ['X', 'X', 'X', 'Y'],
    'Sales': [10, 20, 30, 40]
})

# Pivot with aggregation
result = pd.pivot_table(
    sales,
    index='Store',
    columns='Product',
    values='Sales',
    aggfunc='sum'  # Combine duplicates!
)

Result:

Store	X	Y
A	30	NaN
B	30	40

🎯 Quick Reference Summary

Operation	Purpose	Key Function
String	Work with text	`.str.method()`
DateTime	Work with dates	`.dt.property`
Groupby	Split into groups	`.groupby()`
Aggregation	Summarize data	`.agg()`
Reshape	Change layout	`melt()`, `pivot()`

🌟 You Did It!

You’ve just learned five powerful Pandas operations! Like a chef who now knows how to:

✂️ Chop text perfectly (String Operations)
⏰ Manage cooking times (DateTime)
🗂️ Sort ingredients by type (Groupby)
📏 Measure and count (Aggregation)
🔄 Rearrange your kitchen (Wide vs Long)

Now go cook up some amazing data insights! 🐼✨

Blimto

Pandas Operations

Unable to load concept

Coming Soon...

🐼 Pandas Operations: Your Data Kitchen Adventure!

🔤 String Operations in Pandas

What Are String Operations?

The Magic .str Accessor

Common String Tricks

Real Kitchen Example

Finding Things in Text

📅 DateTime Operations

What Are DateTime Operations?

Converting to DateTime

Extracting Parts of a Date

Date Math

Common DateTime Properties

👥 Groupby Operations

What Is Groupby?

Basic Groupby

Multiple Grouping

Common Groupby Operations

📊 Pandas Aggregation

What Is Aggregation?

The .agg() Function

Named Aggregations (Cleaner Way!)

Custom Aggregation Functions

Aggregation Without Groupby

↔️ Wide vs Long Data

What’s the Difference?

When to Use Which?

Going from Wide to Long: melt()

Going from Long to Wide: pivot()

pivot_table() for Aggregation

🎯 Quick Reference Summary

🌟 You Did It!

Story - Premium Content

Stay Tuned!

Story - Premium Content

Interactives - Premium Content

Interactives - Premium Content

Stay Tuned!

Cheatsheet - Premium Content

Cheatsheet - Premium Content

Stay Tuned!

Quiz - Premium Content

Quiz - Premium Content

Stay Tuned!

Flashcards - Premium Content

Flashcards - Premium Content

Stay Tuned!

Sign in Required

Report an Issue

The Magic `.str` Accessor

The `.agg()` Function

Going from Wide to Long: `melt()`

Going from Long to Wide: `pivot()`

`pivot_table()` for Aggregation