What is a Pandas Series?

A Pandas Series is a single column of labeled data, like boxes on a shelf. Each box has a label (index) and contains one value.

What is a Pandas DataFrame?

A DataFrame is a table with rows and columns, like a mini Excel in Python. Each column is a Series, making it easy to organize data.

How do you handle missing data in Pandas?

Use dropna() to remove rows with missing values, fillna() to replace them with a specific value or the column mean, or ffill() to carry the previous value forward.

Pandas Fundamentals | Data Analytics Guide

🐼 Pandas Fundamentals: Your Data Kitchen Adventure

Imagine you’re a chef in a magical kitchen. Instead of cooking food, you’re cooking DATA! Pandas is your super-powered cooking assistant that helps you organize, clean, and transform ingredients (data) into delicious meals (insights).

🎯 What You’ll Learn

Think of this journey like learning to cook in a restaurant kitchen:

Pandas Series → Single ingredient containers
Pandas DataFrame → Your recipe organizer
Index Concepts → Labels on your containers
Reading & Writing Data → Getting ingredients in and out
Data Selection & Filtering → Picking the right ingredients
Handling Missing Data → Dealing with empty containers
Data Type Conversion → Transforming ingredients

📦 Pandas Series: Your Single-Column Container

What is a Series?

Think of a Series like a single column of labeled boxes on a shelf. Each box has:

A label (index) on the outside
One thing inside (the value)

Real Life Example:

A piggy bank with slots labeled by month
Each slot holds that month's coins

import pandas as pd

# Create a Series - like labeling jars
fruits = pd.Series(
    [5, 3, 8, 2],
    index=['apples', 'bananas', 'oranges', 'grapes']
)

print(fruits)

Output:

apples     5
bananas    3
oranges    8
grapes     2
dtype: int64

Quick Operations on Series

# Get total fruits
total = fruits.sum()  # 18

# Find average
average = fruits.mean()  # 4.5

# Get specific fruit
apple_count = fruits['apples']  # 5
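A Series can also be built from a dictionary, in which case the keys become the labels. The short sketch below reuses the fruit counts from above and pulls the two parts of a Series apart: the labels (index) and the values. The variable names are only for illustration.

```python
import pandas as pd

# Dictionary keys become the index labels, values become the data
fruits = pd.Series({'apples': 5, 'bananas': 3, 'oranges': 8, 'grapes': 2})

labels = list(fruits.index)    # ['apples', 'bananas', 'oranges', 'grapes']
values = fruits.tolist()       # [5, 3, 8, 2]

# The label that holds the largest value
most_common = fruits.idxmax()  # 'oranges'

print(labels)
print(values)
print(most_common)
```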

📊 Pandas DataFrame: Your Super Spreadsheet

What is a DataFrame?

A DataFrame is like a table with rows and columns. Think of it as:

A notebook with many columns
Each column is a Series
Like a mini Excel inside Python!

graph TD
    A["DataFrame"] --> B["Column 1<br>Series"]
    A --> C["Column 2<br>Series"]
    A --> D["Column 3<br>Series"]
    B --> E["Row 0"]
    B --> F["Row 1"]
    B --> G["Row 2"]

Creating a DataFrame

# Like making a class roster
students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [10, 11, 10],
    'grade': ['A', 'B', 'A']
})

print(students)

Output:

      name  age grade
0    Alice   10     A
1      Bob   11     B
2  Charlie   10     A
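To see the "each column is a Series" idea in action, here is a small sketch that rebuilds the same roster and pulls out one column. It only assumes the students table shown above.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [10, 11, 10],
    'grade': ['A', 'B', 'A']
})

ages = students['age']

# A single column comes back as a Series that shares the DataFrame's index
print(type(ages).__name__)   # Series
print(list(ages.index))      # [0, 1, 2]

# shape is (number of rows, number of columns)
print(students.shape)        # (3, 3)
```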

DataFrame from a List

data = [
    ['Pizza', 10],
    ['Burger', 8],
    ['Salad', 5]
]

menu = pd.DataFrame(
    data,
    columns=['food', 'price']
)

🏷️ Pandas Index Concepts: Your Labeling System

What is an Index?

The index is like name tags on lockers. It helps you find things fast!

graph TD
    A["Index = Labels"] --> B["0: First Row"]
    A --> C["1: Second Row"]
    A --> D["2: Third Row"]
    E["Custom Index"] --> F["Alice: First Row"]
    E --> G["Bob: Second Row"]
    E --> H["Charlie: Third Row"]

Default vs Custom Index

# Default index: 0, 1, 2, 3...
df = pd.DataFrame({'score': [85, 90, 78]})
# Index: 0, 1, 2

# Custom index: meaningful labels
df_custom = pd.DataFrame(
    {'score': [85, 90, 78]},
    index=['Alice', 'Bob', 'Charlie']
)

Working with Index

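# Set a column as index (using the students DataFrame from earlier)
students = students.set_index('name')

# Access by index label
students.loc['Alice']

# Access by position
students.iloc[0]

# Reset back to numbers (name becomes a regular column again)
students = students.reset_index()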
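One detail that often trips people up: loc and iloc treat slices differently. The sketch below, which assumes a small roster indexed by name, shows that a label slice with loc includes both ends, while a position slice with iloc leaves out the end position.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [10, 11, 10]
}).set_index('name')

# loc uses labels, and a label slice includes BOTH endpoints
by_label = students.loc['Alice':'Bob']

# iloc uses integer positions, and the end position is excluded
by_position = students.iloc[0:1]

print(len(by_label))      # 2 rows: Alice and Bob
print(len(by_position))   # 1 row: Alice only

# A single cell: row label plus column name
bob_age = students.loc['Bob', 'age']
print(bob_age)            # 11
```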

📂 Reading and Writing Data: Import & Export

Reading Data Files

Think of this as opening a recipe book and copying recipes into your kitchen.

graph TD
    A["External Files"] --> B["CSV Files"]
    A --> C["Excel Files"]
    A --> D["JSON Files"]
    B --> E["pd.read_csv"]
    C --> F["pd.read_excel"]
    D --> G["pd.read_json"]
    E --> H["DataFrame in Python"]
    F --> H
    G --> H

Reading CSV (Most Common!)

# Read a CSV file
df = pd.read_csv('students.csv')

# See first 5 rows
print(df.head())

# See last 3 rows
print(df.tail(3))

Reading Excel

# Read Excel file
df = pd.read_excel('data.xlsx')

# Read specific sheet
df = pd.read_excel(
    'data.xlsx',
    sheet_name='Sheet2'
)

Writing Data Out

# Save to CSV
df.to_csv('output.csv', index=False)

# Save to Excel
df.to_excel('output.xlsx', index=False)

# Save to JSON
df.to_json('output.json')
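A quick way to check that writing and reading work together is a round trip: save a DataFrame, load it back, and compare. The sketch below writes to a temporary folder so it leaves nothing behind; the file name students.csv and the sample rows are just placeholders.

```python
import os
import tempfile

import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [10, 11, 10]
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'students.csv')

    # index=False keeps the 0, 1, 2 row labels out of the file
    students.to_csv(csv_path, index=False)

    # Read the file back into a new DataFrame
    loaded = pd.read_csv(csv_path)

print(list(loaded.columns))   # ['name', 'age']
print(loaded)
```

Without index=False, the row labels would be written as an extra unnamed column and show up again when the file is read back.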

🎯 Data Selection and Filtering: Finding What You Need

Selecting Columns

# Single column (returns Series)
names = df['name']

# Multiple columns (returns DataFrame)
subset = df[['name', 'age']]

Selecting Rows

# By position (iloc)
first_row = df.iloc[0]
first_three = df.iloc[0:3]

# By label (loc), after making 'name' the index
alice_row = df.set_index('name').loc['Alice']

Filtering with Conditions

This is like asking your data questions!

# Who is older than 10?
older = df[df['age'] > 10]

# Who got grade A?
grade_a = df[df['grade'] == 'A']

# Combine conditions with &
smart_young = df[
    (df['grade'] == 'A') &
    (df['age'] == 10)
]

graph TD
    A["All Data"] --> B{age > 10?}
    B -->|Yes| C["Keep Row"]
    B -->|No| D["Skip Row"]
    C --> E["Filtered Data"]
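Besides & (and), conditions can be combined with | (or) and flipped with ~ (not). Here is a small sketch on the same roster. Each condition needs its own parentheses, because & and | bind more tightly than comparisons like > and ==.

```python
import pandas as pd

students = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [10, 11, 10],
    'grade': ['A', 'B', 'A']
})

# OR: older than 10, or grade A (matches all three students here)
either = students[(students['age'] > 10) | (students['grade'] == 'A')]

# NOT: everyone who did not get grade A
not_a = students[~(students['grade'] == 'A')]

# isin: keep rows whose value appears in a list
alice_or_bob = students[students['name'].isin(['Alice', 'Bob'])]

print(either['name'].tolist())        # ['Alice', 'Bob', 'Charlie']
print(not_a['name'].tolist())         # ['Bob']
print(alice_or_bob['name'].tolist())  # ['Alice', 'Bob']
```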

Quick Selection Examples

# First 5 rows
df.head()

# Last 5 rows
df.tail()

# Random sample of 3 rows
df.sample(3)

# Rows at positions 10 through 20 (the end position, 21, is excluded)
df.iloc[10:21]

🕳️ Handling Missing Data: Dealing with Empty Boxes

What is Missing Data?

Missing data shows as NaN (Not a Number). It’s like:

Empty boxes in your storage
Blank cells in a spreadsheet
Information we don’t have yet

Finding Missing Data

# Check for missing values
print(df.isnull())

# Count missing in each column
print(df.isnull().sum())

# Check which columns contain at least one missing value
print(df.isnull().any())

Dealing with Missing Data

# Option 1: Remove rows with missing
df_clean = df.dropna()

# Option 2: Fill with a value
df_filled = df.fillna(0)

# Option 3: Fill with average
df['age'] = df['age'].fillna(
    df['age'].mean()
)

# Option 4: Fill with previous value
df_ffill = df.ffill()

graph TD
    A["Missing Data?"] --> B{How to handle?}
    B --> C["dropna<br>Remove it"]
    B --> D["fillna<br>Replace it"]
    D --> E["With 0"]
    D --> F["With mean"]
    D --> G["With previous"]
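To see the options side by side, here is a sketch on a small made-up table with two gaps. The names and numbers are invented for illustration. It also shows two handy variations: dropping rows based on a single column, and filling each column with its own value.

```python
import numpy as np
import pandas as pd

# Made-up data: Bob's age and Charlie's score are missing
scores = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [10, np.nan, 10, 12],
    'score': [85.0, 90.0, np.nan, 78.0]
})

# How many values are missing in each column
missing_per_column = scores.isnull().sum()

# Drop rows only when 'score' is missing (Charlie is removed)
has_score = scores.dropna(subset=['score'])

# Fill each column with its own replacement value
filled = scores.fillna({'age': scores['age'].mean(), 'score': 0})

# Carry the previous row's value forward
carried = scores.ffill()

print(missing_per_column)
print(has_score['name'].tolist())   # ['Alice', 'Bob', 'Dana']
print(carried)
```

Which option fits depends on the data: forward fill makes sense when rows are in a meaningful order (such as dates), while a mean or a fixed value works for unordered rows.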

🔄 Data Type Conversion: Transforming Your Data

Why Convert Types?

Sometimes data arrives in the wrong format:

Numbers stored as text “123”
Dates stored as text “2024-01-15”
Categories stored as text

Checking Data Types

# See all column types
print(df.dtypes)

# Common types:
# int64 = whole numbers
# float64 = decimal numbers
# object = text/mixed
# bool = True/False
# datetime64[ns] = dates and times

Converting Types

# Text to number (every value must be a valid whole number)
df['price'] = df['price'].astype(int)

# Number to text
df['id'] = df['id'].astype(str)

# Text to datetime
df['date'] = pd.to_datetime(df['date'])

# Text to category (saves memory!)
df['color'] = df['color'].astype('category')
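How much memory does category save? It depends on how often values repeat. The sketch below uses made-up data with only three distinct colors repeated many times, which is the case where category helps most. Exact byte counts vary by pandas version, so it only compares the two sizes.

```python
import pandas as pd

# 3 distinct colors repeated many times (made-up data)
colors = pd.Series(['red', 'green', 'blue'] * 10_000)

as_category = colors.astype('category')

# deep=True counts the actual string contents too
text_bytes = colors.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(text_bytes, category_bytes)
print(category_bytes < text_bytes)

# Each distinct value is stored only once
print(list(as_category.cat.categories))  # ['blue', 'green', 'red']
```

If nearly every value in a column is unique (like names or IDs), category is unlikely to save memory.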

Handling Conversion Errors

# Safe conversion with errors='coerce'
# Bad values become NaN instead of error
df['age'] = pd.to_numeric(
    df['age'],
    errors='coerce'
)
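Here is what that looks like on a tiny made-up column where one entry is not a number. A plain astype(int) would stop with a ValueError on the bad entry, while errors='coerce' turns it into NaN so the rest of the column can still be used.

```python
import pandas as pd

# Made-up column where one entry is not a number
raw_ages = pd.Series(['10', '11', 'unknown', '12'])

ages = pd.to_numeric(raw_ages, errors='coerce')

print(ages.tolist())        # [10.0, 11.0, nan, 12.0]
print(ages.isnull().sum())  # 1

# After coercion the gap can be handled like any other missing value
ages_filled = ages.fillna(ages.mean())   # mean of 10, 11, 12 is 11.0
print(ages_filled.tolist())              # [10.0, 11.0, 11.0, 12.0]
```

The column becomes float64 rather than int64, because NaN is a floating-point value.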

graph TD
    A["Original Type"] --> B{Convert to?}
    B --> C["int/float<br>astype or to_numeric"]
    B --> D["string<br>astype str"]
    B --> E["datetime<br>pd.to_datetime"]
    B --> F["category<br>astype category"]

🎉 Quick Reference Card

Task	Code
Create Series	`pd.Series([1,2,3])`
Create DataFrame	`pd.DataFrame({'a':[1,2]})`
Read CSV	`pd.read_csv('file.csv')`
Write CSV	`df.to_csv('out.csv')`
Select column	`df['column']`
Filter rows	`df[df['age'] > 18]`
Check missing	`df.isnull().sum()`
Fill missing	`df.fillna(0)`
Convert type	`df['col'].astype(int)`

🚀 You Did It!

You now understand the 7 fundamental pillars of Pandas:

✅ Series - Single columns of data
✅ DataFrame - Tables with rows and columns
✅ Index - Labels for fast lookup
✅ Read/Write - Getting data in and out
✅ Selection/Filtering - Finding specific data
✅ Missing Data - Handling empty values
✅ Type Conversion - Transforming data types

Next step: Practice with real data! Try loading a CSV file and explore it using what you learned.

Remember: Every data scientist started exactly where you are now. Keep practicing, keep exploring, and you’ll master Pandas in no time! 🐼✨
