Pandas


🐼 Pandas: Your Data's Best Friend

The Story of the Magic Spreadsheet

Imagine you have a giant box of LEGO bricks scattered all over your room. Finding the red ones? Nightmare! Sorting by size? Hours of work! Now imagine a magic helper that can instantly find, sort, combine, and organize ALL your bricks in seconds.

That's Pandas! 🎉

Pandas is like having a super-smart assistant for your data. It takes messy information and makes it neat, organized, and easy to understand.


🎯 What We'll Learn

🐼 Pandas Basics → 📊 DataFrames → 🔧 Data Manipulation → 📦 GroupBy Magic → 🔗 Merge & Join → 📅 DateTime Handling

📚 Chapter 1: Meet Pandas

What is Pandas?

Think of Pandas as a super-powered spreadsheet that lives inside Python. Just like how Excel has rows and columns, Pandas has them too, but with superpowers!

Real Life Examples:

  • Netflix uses it to organize viewer data
  • Banks use it to track transactions
  • Scientists use it to analyze experiments

Your First Pandas Code

import pandas as pd

# Create a simple table
data = {
    'Name': ['Anna', 'Bob', 'Cara'],
    'Age': [10, 12, 11],
    'Score': [95, 88, 92]
}

df = pd.DataFrame(data)
print(df)

Output:

   Name  Age  Score
0  Anna   10     95
1   Bob   12     88
2  Cara   11     92

🎈 Think of it: Each row is a person, each column is information about them.


📊 Chapter 2: The DataFrame - Your Data Table

What's a DataFrame?

A DataFrame is like a table in a notebook. It has:

  • Rows = Each item (like each student)
  • Columns = Information about items (like name, age, score)
  • Index = Row labels (0, 1, 2, ... by default, like seat numbers)
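Here is a small sketch that shows those three parts on a real table. The third student (Noah) and the seat numbers 101 to 103 are made up for illustration. The code prints the index, the columns, and the shape, and then swaps the default 0, 1, 2 index for seat numbers.

```python
import pandas as pd

# 3 rows (students) x 2 columns (facts about them)
df = pd.DataFrame({
    'Name': ['Emma', 'Liam', 'Noah'],
    'Grade': ['A', 'B', 'A'],
})

print(df.index)    # row labels: 0, 1, 2 by default
print(df.columns)  # column labels: Name, Grade
print(df.shape)    # (rows, columns) -> (3, 2)

# The index doesn't have to be 0, 1, 2: here it holds seat numbers
seated = df.set_index(pd.Index([101, 102, 103], name='Seat'))
print(seated)
```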

Creating DataFrames

Method 1: From a Dictionary

students = {
    'Name': ['Emma', 'Liam'],
    'Grade': ['A', 'B']
}
df = pd.DataFrame(students)

Method 2: From a List

data = [
    ['Emma', 'A'],
    ['Liam', 'B']
]
df = pd.DataFrame(data,
    columns=['Name', 'Grade'])
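Both methods should build exactly the same table. This sketch confirms that with `DataFrame.equals`. It also shows a third input shape that `pd.DataFrame` accepts: a list of dictionaries, one per row.

```python
import pandas as pd

# Method 1: a dictionary of columns
from_dict = pd.DataFrame({'Name': ['Emma', 'Liam'], 'Grade': ['A', 'B']})

# Method 2: a list of rows, with column names given separately
from_rows = pd.DataFrame([['Emma', 'A'], ['Liam', 'B']],
                         columns=['Name', 'Grade'])

# A third shape: a list of dictionaries, one per row
from_records = pd.DataFrame([
    {'Name': 'Emma', 'Grade': 'A'},
    {'Name': 'Liam', 'Grade': 'B'},
])

# equals() compares values, labels and dtypes
print(from_dict.equals(from_rows))     # True
print(from_dict.equals(from_records))  # True
```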

Selecting Data

# Get one column
df['Name']

# Get multiple columns
df[['Name', 'Grade']]

# Get one row by position
df.iloc[0]  # First row

# Get rows by condition
df[df['Grade'] == 'A']

🧙‍♂️ Magic Tip: iloc = position (like "item 0"), loc = label (like "row named X")
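You only see the difference when the row labels are not 0, 1, 2. In this sketch, the index is set to the made-up labels 10, 20, 30, so `iloc[0]` and `loc[20]` point at different students. Here, `df.loc[0]` would raise a `KeyError`, because no row is labeled 0.

```python
import pandas as pd

# Row labels deliberately NOT 0, 1, 2, so position and label differ
df = pd.DataFrame(
    {'Name': ['Emma', 'Liam', 'Noah'], 'Grade': ['A', 'B', 'A']},
    index=[10, 20, 30],
)

print(df.iloc[0]['Name'])   # first row by position -> Emma
print(df.loc[20, 'Name'])   # row whose label is 20 -> Liam

# loc slices include the end label; iloc slices exclude the end position
print(len(df.loc[10:20]))   # 2 rows (labels 10 and 20)
print(len(df.iloc[0:1]))    # 1 row (position 0 only)
```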


🔧 Chapter 3: Data Manipulation

The Art of Shaping Data

Imagine you're a chef preparing ingredients. Sometimes you need to:

  • Add new ingredients (new columns)
  • Remove bad parts (drop columns/rows)
  • Change how things look (transform data)
  • Filter out what you don't need

Adding New Columns

df['Bonus'] = df['Score'] * 0.1

# Or with a condition
df['Pass'] = df['Score'] >= 60

Removing Data

# Drop a column
df = df.drop('Bonus', axis=1)

# Drop a row (by its index label, here 0)
df = df.drop(0, axis=0)

# Drop rows with missing values
df = df.dropna()

Changing Values

# Replace values
df['Grade'] = df['Grade'].replace(
    'F', 'Fail')

# Transform every value with a string method
df['Name'] = df['Name'].str.upper()
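`.str.upper()` is a ready-made string method. If you want to run your own function on every value in a column, `Series.apply` does that. Here is a small sketch with a hypothetical `pass_or_fail` helper and made-up scores.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['anna', 'bob'], 'Score': [95, 55]})

# Hypothetical helper: a plain Python function for ONE value
def pass_or_fail(score):
    return 'Pass' if score >= 60 else 'Fail'

# apply() calls the function once per value in the column
df['Result'] = df['Score'].apply(pass_or_fail)

# String methods live under .str and work on the whole column
df['Name'] = df['Name'].str.upper()

print(df)
```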

Filtering Data

# Students who passed
passed = df[df['Score'] >= 60]

# Multiple conditions
stars = df[
    (df['Score'] >= 90) &
    (df['Age'] <= 12)
]

🎯 Remember: & means AND, | means OR, and each condition needs its own parentheses.
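Here is the rule in action on a made-up four-student table. It also includes `~`, which flips a condition (NOT). Note the parentheses around every comparison. Without them, Python groups the expression the wrong way and pandas raises an error.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anna', 'Bob', 'Cara', 'Dan'],
    'Age': [10, 12, 11, 13],
    'Score': [95, 88, 92, 50],
})

# & and | bind more tightly than >= and <=, hence the parentheses
both = df[(df['Score'] >= 90) & (df['Age'] <= 12)]     # AND
either = df[(df['Score'] >= 90) | (df['Age'] >= 13)]   # OR
not_high = df[~(df['Score'] >= 90)]                     # NOT

print(list(both['Name']))      # ['Anna', 'Cara']
print(list(either['Name']))    # ['Anna', 'Cara', 'Dan']
print(list(not_high['Name']))  # ['Bob', 'Dan']
```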


📦 Chapter 4: GroupBy - Sorting into Buckets

The Bucket Story

Imagine you have a basket of mixed fruits and you want to:

  1. Group them by type (apples together, oranges together)
  2. Count how many of each
  3. Find the biggest one in each group

That's exactly what GroupBy does!
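The three basket steps map directly onto pandas calls. Here is a sketch with a made-up basket; the fruit weights in grams are invented.

```python
import pandas as pd

# One row per fruit, with its weight in grams
basket = pd.DataFrame({
    'Fruit': ['apple', 'orange', 'apple', 'orange', 'apple'],
    'Weight': [150, 200, 180, 170, 120],
})

groups = basket.groupby('Fruit')    # 1. group by type
counts = groups.size()               # 2. how many of each
biggest = groups['Weight'].max()     # 3. biggest in each group

print(counts['apple'], counts['orange'])    # 3 2
print(biggest['apple'], biggest['orange'])  # 180 200
```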

Basic GroupBy

# Sample data
sales = pd.DataFrame({
    'Store': ['A', 'B', 'A', 'B'],
    'Product': ['Apple', 'Apple',
                'Banana', 'Banana'],
    'Amount': [10, 15, 20, 25]
})

# Group by Store and sum
by_store = sales.groupby('Store')
print(by_store['Amount'].sum())

Output:

Store
A    30
B    40
Name: Amount, dtype: int64

Multiple Aggregations

# Get multiple stats at once
stats = sales.groupby('Store').agg({
    'Amount': ['sum', 'mean', 'count']
})
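The dictionary form produces two-level column names like `('Amount', 'sum')`. Named aggregation (available since pandas 0.25) lets you choose flat names instead. This sketch reuses the `sales` data from above. The names `total`, `average`, and `orders` are just illustrative choices.

```python
import pandas as pd

sales = pd.DataFrame({
    'Store': ['A', 'B', 'A', 'B'],
    'Product': ['Apple', 'Apple', 'Banana', 'Banana'],
    'Amount': [10, 15, 20, 25],
})

# Dictionary style: the columns come out two levels deep
stats = sales.groupby('Store').agg({'Amount': ['sum', 'mean', 'count']})
print(stats.columns.tolist())
# [('Amount', 'sum'), ('Amount', 'mean'), ('Amount', 'count')]

# Named aggregation: new_name=(column, function) gives flat names
flat = sales.groupby('Store').agg(
    total=('Amount', 'sum'),
    average=('Amount', 'mean'),
    orders=('Amount', 'count'),
)
print(flat)
```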

Group by Multiple Columns

detailed = sales.groupby(
    ['Store', 'Product']
)['Amount'].sum()
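When you group by two columns, the result is a Series with a two-level index (Store, then Product). `unstack()` spreads the inner level into columns, which often reads more like a spreadsheet. Here is a sketch on the same `sales` data.

```python
import pandas as pd

sales = pd.DataFrame({
    'Store': ['A', 'B', 'A', 'B'],
    'Product': ['Apple', 'Apple', 'Banana', 'Banana'],
    'Amount': [10, 15, 20, 25],
})

detailed = sales.groupby(['Store', 'Product'])['Amount'].sum()

# Look up one bucket with a (Store, Product) pair
print(detailed.loc[('A', 'Banana')])   # 20

# Move Product into the columns: a Store x Product grid
grid = detailed.unstack()
print(grid)
```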

🪣 Think of it: GroupBy = Put similar things in buckets, then do math on each bucket!


🔗 Chapter 5: Merge and Join

The Puzzle Piece Story

Imagine you have two puzzle pieces that belong together:

  • Piece 1: Student names and their IDs
  • Piece 2: IDs and their test scores

To see "which student got which score," you need to connect the pieces using the ID!

Types of Joins

Two tables can be joined in four ways:

  • Inner Join: only matching rows
  • Left Join: all left rows + matching right rows
  • Right Join: all right rows + matching left rows
  • Outer Join: all rows from both

Merge Example

# Table 1: Students
students = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Anna', 'Bob', 'Cara']
})

# Table 2: Scores
scores = pd.DataFrame({
    'ID': [1, 2, 4],
    'Score': [95, 88, 75]
})

# Inner join (only matching IDs)
result = pd.merge(
    students, scores,
    on='ID', how='inner'
)

Result:

   ID  Name  Score
0   1  Anna     95
1   2   Bob     88

Different Join Types

# Left join - keep all students
left = pd.merge(
    students, scores,
    on='ID', how='left'
)

# Outer join - keep everyone
outer = pd.merge(
    students, scores,
    on='ID', how='outer'
)
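To see what each join actually keeps, this sketch rebuilds the two tables above and prints the left, right, and outer results. `indicator=True` adds a `_merge` column that says whether each row came from both tables, only the left, or only the right.

```python
import pandas as pd

students = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Anna', 'Bob', 'Cara']})
scores = pd.DataFrame({'ID': [1, 2, 4], 'Score': [95, 88, 75]})

# Left: all 3 students; Cara (ID 3) has no score, so hers is NaN
left = pd.merge(students, scores, on='ID', how='left')

# Right: all 3 scores; ID 4 has no student, so its Name is NaN
right = pd.merge(students, scores, on='ID', how='right')

# Outer: IDs 1 to 4, plus a _merge column showing each row's source
outer = pd.merge(students, scores, on='ID', how='outer', indicator=True)

print(left)
print(right)
print(outer)
```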

🧩 Remember:

  • inner = Only matches
  • left = All from left table
  • right = All from right table
  • outer = Everything from both

📅 Chapter 6: DateTime Handling

Time is Data Too!

Dates and times are special. They're not just numbers or text: they have meaning! Pandas understands this.

Examples of date questions:

  • "How many sales in January?"
  • "What day had the most visitors?"
  • "How many hours between these events?"
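Here is one way those three questions might look in code. The sketch uses a small made-up log of visit timestamps instead of real sales data.

```python
import pandas as pd

# Made-up visit log: one timestamp per visit
visits = pd.DataFrame({
    'time': pd.to_datetime([
        '2024-01-05 09:00', '2024-01-05 17:30',
        '2024-01-20 12:00', '2024-02-03 08:15',
    ]),
})

# "How many visits in January?" (True counts as 1 when summed)
in_january = (visits['time'].dt.month == 1).sum()
print(in_january)   # 3

# "What day had the most visitors?"
busiest = visits['time'].dt.date.value_counts().idxmax()
print(busiest)      # 2024-01-05

# "How many hours between these events?"
gap = visits['time'].iloc[1] - visits['time'].iloc[0]
print(gap / pd.Timedelta(hours=1))   # 8.5
```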

Converting to DateTime

# Create dates from strings
df = pd.DataFrame({
    'date': ['2024-01-15',
             '2024-02-20',
             '2024-03-25']
})

# Convert to datetime
df['date'] = pd.to_datetime(df['date'])

Extracting Date Parts

# Get year, month, day
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

# Get day of week (0=Monday)
df['weekday'] = df['date'].dt.dayofweek

# Get day name
df['day_name'] = df['date'].dt.day_name()

Date Math

# Add days
df['next_week'] = df['date'] + \
    pd.Timedelta(days=7)

# Find difference
df['diff'] = df['date'].diff()

# Count rows per month (resample needs the dates as the index;
# 'MS' labels each bin by the first day of its month)
monthly = df.set_index('date').\
    resample('MS').size()
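A monthly resample is most useful on a numeric column. This sketch sums made-up daily sales amounts for each month. One version note: pandas 2.2 deprecated the `'M'` alias in favor of `'ME'` (month end). `'MS'` (month start), which is used here, works in both older and newer versions.

```python
import pandas as pd

# Made-up daily sales across two months
sales = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-03', '2024-01-17', '2024-01-30',
                            '2024-02-02', '2024-02-14']),
    'amount': [100, 50, 25, 80, 20],
})

# Move the dates into the index, then add up each month's amounts
monthly = sales.set_index('date')['amount'].resample('MS').sum()
print(monthly)
# 2024-01-01 -> 175, 2024-02-01 -> 100
```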

Filtering by Date

# Sales in 2024
sales_2024 = df[
    df['date'].dt.year == 2024
]

# Between two dates
jan_sales = df[
    (df['date'] >= '2024-01-01') &
    (df['date'] <= '2024-01-31')
]
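The string comparison above has one subtlety. `'2024-01-31'` means midnight at the start of January 31, so `<=` excludes anything later that day. When the dates are the index, `.loc` also accepts a partial date string such as `'2024-01'` and keeps the whole month. Here is a sketch with made-up timestamps.

```python
import pandas as pd

# Made-up sales; one happens late in the day on Jan 31
sales = pd.DataFrame({
    'date': pd.to_datetime([
        '2023-12-31 10:00', '2024-01-15 09:00',
        '2024-01-31 18:00', '2024-02-01 08:00',
    ]),
    'amount': [5, 10, 20, 30],
})

# '2024-01-31' is read as 2024-01-31 00:00, so the 18:00 sale is dropped
strict = sales[
    (sales['date'] >= '2024-01-01') &
    (sales['date'] <= '2024-01-31')
]
print(len(strict))    # 1

# Partial string indexing: every row in January 2024
january = sales.set_index('date').loc['2024-01']
print(len(january))   # 2
```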

📆 Pro Tip: Always convert date strings to datetime FIRST, then do operations!


🎉 You Did It!

You've learned the six superpowers of Pandas:

  • 🐼 Pandas Basics: Import and create data
  • 📊 DataFrames: Organize in rows/columns
  • 🔧 Manipulation: Add, remove, change data
  • 📦 GroupBy: Sort into buckets & summarize
  • 🔗 Merge/Join: Connect two tables
  • 📅 DateTime: Work with dates & times

🚀 Quick Reference

import pandas as pd

# Create DataFrame
df = pd.DataFrame(data)

# Select
df['column']      # One column
df[['a', 'b']]    # Multiple columns
df.iloc[0]        # Row by position
df.loc[0]         # Row by label

# Filter
df[df['col'] > 5]

# Group
df.groupby('col').sum()

# Merge
pd.merge(df1, df2, on='key')

# DateTime
pd.to_datetime(df['date'])
df['date'].dt.year

Remember: Data is like LEGO bricks. Pandas helps you build anything you imagine! 🧱✨
