🎯 Pandas Selection Methods: Finding Treasure in Your Data
Imagine your DataFrame is a giant toy box. Inside are rows (like shelves) and columns (like labeled bins). Selection methods are your special tools to reach in and grab exactly what you want!
🧸 The Toy Box Analogy
Think of a DataFrame like a giant toy organizer:
- Columns = labeled bins (Name, Age, Score)
- Rows = numbered shelves (0, 1, 2, 3…)
- Each cell = one toy in a specific bin on a specific shelf
Your job? Learn all the ways to grab toys!
📦 Selecting a Single Column
The simplest grab: Pick ONE bin from the toy box.
```python
import pandas as pd

# Our toy box
df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92]
})

# Grab the "Name" bin
names = df['Name']
print(names)
```
Output:
```
0    Ana
1    Ben
2    Cat
Name: Name, dtype: object
```
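💡 Two ways to grab one column:
- df['Name'] ← bracket notation (always works)
- df.Name ← dot notation (only for simple names)

⚠️ Warning: Dot notation fails if the column name contains spaces (or anything else that isn't allowed in a Python variable name), or if it matches an existing DataFrame method or attribute, such as count!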
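Here's a quick sketch of both pitfalls. The df3 DataFrame and its 'Total Score' and 'count' columns are made up just for this demo.

```python
import pandas as pd

# Hypothetical toy box with two "tricky" column names
df3 = pd.DataFrame({
    'Name': ['Ana', 'Ben'],
    'Total Score': [183, 176],   # contains a space
    'count': [3, 5],             # same name as the DataFrame.count() method
})

# Dot notation is fine for a simple name
print(df3.Name.tolist())             # ['Ana', 'Ben']

# df3.Total Score would be a SyntaxError; brackets handle any name
print(df3['Total Score'].tolist())   # [183, 176]

# df3.count is the built-in method, NOT the column
print(callable(df3.count))           # True
print(df3['count'].tolist())         # [3, 5]
```

Rule of thumb: dot notation is handy for quick exploring, but brackets are the safe choice in real code.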
📦📦 Selecting Multiple Columns
Grab several bins at once using a list!
```python
# Grab Name and Score bins
subset = df[['Name', 'Score']]
print(subset)
```
Output:
```
  Name  Score
0  Ana     95
1  Ben     88
2  Cat     92
```
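🎯 The trick: In df[[...]], the outer brackets do the grabbing and the inner brackets make a Python list of column names. Passing a list always gives you back a DataFrame, even when the list holds just one name.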
graph TD A[df] --> B["df['Name']"] A --> C["df[['Name', 'Score']]"] B --> D[Series - one column] C --> E[DataFrame - multiple columns]
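To see the single vs double bracket difference for yourself, here's a small check on the same toy box (the cols variable is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

one_series = df['Name']     # single brackets -> Series
one_frame = df[['Name']]    # a list with ONE name -> still a DataFrame

print(type(one_series).__name__)   # Series
print(type(one_frame).__name__)    # DataFrame
print(one_frame.shape)             # (3, 1)

# The inner list can be built ahead of time, and its order sets the column order
cols = ['Score', 'Name']
print(df[cols].columns.tolist())   # ['Score', 'Name']
```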
🔢 Row and Value Selection
Selecting rows is like picking shelves!
```python
# Slice rows 0 to 1 (not including 2)
first_two = df[0:2]
print(first_two)
```
Output:
```
  Name  Age  Score
0  Ana   10     95
1  Ben   11     88
```
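Notice that the plain brackets df[...] do different jobs depending on what you put inside them. This sketch (same toy box; the variable names are made up for the demo) shows three cases and one common trap:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

by_name = df['Age']          # a column name -> that column (Series)
by_slice = df[0:2]           # a slice -> rows by position, end excluded
by_mask = df[df['Age'] > 9]  # a True/False Series -> rows where it is True

print(by_name.tolist())           # [10, 11, 9]
print(by_slice['Name'].tolist())  # ['Ana', 'Ben']
print(by_mask['Name'].tolist())   # ['Ana', 'Ben']

# Trap: a single integer is looked up as a COLUMN name, not a row
try:
    df[0]
    got_key_error = False
except KeyError:
    got_key_error = True
    print('df[0] -> KeyError (there is no column called 0)')
```

Because one pair of brackets means three different things, loc and iloc exist to make your intent explicit.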
But wait—there’s a BETTER way…
🎯 loc vs iloc: The Twin Heroes
These are your power tools for precise selection!
🏷️ loc = Label-based (uses names)
```python
# Get row with label 0, column 'Name'
df.loc[0, 'Name']  # Returns: 'Ana'

# Get rows labeled 0 through 1 (end label included), plus two named columns
df.loc[0:1, ['Name', 'Age']]
```
🔢 iloc = Integer-based (uses positions)
```python
# Get row at position 0, column at position 0
df.iloc[0, 0]  # Returns: 'Ana'

# Get first 2 rows, first 2 columns
df.iloc[0:2, 0:2]
```
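With the default 0, 1, 2 index, labels and positions happen to match, which hides the difference. This sketch gives the rows hypothetical letter labels 'a', 'b', 'c' so loc and iloc clearly part ways:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ana', 'Ben', 'Cat'], 'Age': [10, 11, 9], 'Score': [95, 88, 92]},
    index=['a', 'b', 'c'],  # hypothetical row labels instead of 0, 1, 2
)

print(df.loc['b', 'Score'])   # 88  (row LABEL 'b', column LABEL 'Score')
print(df.iloc[1, 2])          # 88  (row POSITION 1, column POSITION 2)

# After sorting, labels travel with their rows but positions do not
sorted_df = df.sort_values('Age')    # row order becomes c, a, b
print(sorted_df.loc['a', 'Name'])    # Ana  (still the row labeled 'a')
print(sorted_df.iloc[0, 0])          # Cat  (whoever is first now)
```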
🆚 The Big Difference
| Feature | loc | iloc |
|---|---|---|
| Uses | Labels/Names | Positions/Numbers |
| Slicing | Inclusive (end label included) | Exclusive (like Python) |
| Example | df.loc[0:2] → rows 0, 1, AND 2 | df.iloc[0:2] → rows 0 and 1 only |
graph TD A[Need to select?] --> B{Know the label?} B -->|Yes| C[Use loc] B -->|No| D{Know position?} D -->|Yes| E[Use iloc] C --> F["df.loc[row_label, col_name]"] E --> G["df.iloc[row_pos, col_pos]"]
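A small sketch to confirm the slicing rule from the table on the same toy box. Column slices follow the same rule as row slices:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

# loc slices by label and INCLUDES the end label
print(df.loc[0:2, 'Name'].tolist())      # ['Ana', 'Ben', 'Cat']

# iloc slices by position and EXCLUDES the end position
print(df.iloc[0:2, 0].tolist())          # ['Ana', 'Ben']

# Same rule for columns: 'Name':'Age' includes Age, 0:1 stops before position 1
print(df.loc[0, 'Name':'Age'].tolist())  # ['Ana', 10]
print(df.iloc[0, 0:1].tolist())          # ['Ana']
```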
⚡ Scalar Access: at and iat
When you need just ONE single value—fast!
🏷️ at = Label-based (single value)
```python
# Get one specific cell by labels
df.at[0, 'Name']  # Returns: 'Ana'
```
🔢 iat = Integer-based (single value)
```python
# Get one specific cell by position
df.iat[0, 0]  # Returns: 'Ana'
```
🚀 Why use them? They are built for exactly one cell, so they are typically faster than loc/iloc when you only need to get or set a single value!
| Method | When to Use |
|---|---|
| at | Single value by label |
| iat | Single value by position |
| loc | Rows/columns by label |
| iloc | Rows/columns by position |
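at and iat can also write a single cell, not just read it. A minimal sketch on a fresh copy of the toy box:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

# Read one cell, by label and by position
print(df.at[1, 'Score'])   # 88
print(df.iat[1, 2])        # 88

# Write one cell in place
df.at[1, 'Score'] = 90     # row label 1, column 'Score'
df.iat[2, 1] = 10          # row position 2, column position 1 ('Age')

print(df['Score'].tolist())  # [95, 90, 92]
print(df['Age'].tolist())    # [10, 11, 10]
```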
🏷️ filter() Method: Select by Label Patterns
Find columns or rows whose NAMES match a pattern!
```python
# DataFrame with many columns
df2 = pd.DataFrame({
    'score_math': [90, 85],
    'score_eng': [88, 92],
    'name': ['Ana', 'Ben']
})

# Get columns containing "score"
df2.filter(like='score')
```
Output:
```
   score_math  score_eng
0          90         88
1          85         92
```
More filter tricks:
```python
# Columns whose names start with 's' (a regular-expression match)
df2.filter(regex='^s')

# axis=0 checks ROW labels instead; items= means exact labels, not a pattern
df2.filter(items=[0], axis=0)
```
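A few more filter() variations on df2, as a sketch. The last line highlights a common mix-up: filter() matches labels (column or row names), never the values inside the cells.

```python
import pandas as pd

df2 = pd.DataFrame({
    'score_math': [90, 85],
    'score_eng': [88, 92],
    'name': ['Ana', 'Ben'],
})

# like= : substring match on column names
print(df2.filter(like='score').columns.tolist())   # ['score_math', 'score_eng']

# regex= : regular-expression match (here: names ending in "_eng")
print(df2.filter(regex='_eng$').columns.tolist())  # ['score_eng']

# items= : exact names; the result follows the order you list them in
print(df2.filter(items=['name', 'score_math']).columns.tolist())  # ['name', 'score_math']

# axis=0 applies the matching to row labels instead of column names
print(df2.filter(items=[1], axis=0)['name'].tolist())  # ['Ben']

# 'Ana' is a cell VALUE, not a column name, so no columns come back
print(df2.filter(like='Ana').shape)   # (2, 0)
```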
🔍 query() Method: Filter with Words
Write conditions like you’re asking a question!
```python
# Find kids older than 9
df.query('Age > 9')
```
Output:
```
  Name  Age  Score
0  Ana   10     95
1  Ben   11     88
```
More query magic:
```python
# Multiple conditions
df.query('Age > 9 and Score >= 90')

# Using variables
min_age = 10
df.query('Age >= @min_age')
```
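💡 Why query() rocks:
- Reads like English!
- Shorter than the same filter written with brackets, df[(df['Age'] > 9) & (df['Score'] >= 90)]
- Use @ to refer to Python variables that live outside the DataFrame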
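To check that query() and bracket conditions really pick the same rows, here's a sketch. The renamed 'Test Score' column is hypothetical, added only to show the backtick syntax query() accepts for column names with spaces.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

# query() version
q = df.query('Age > 9 and Score >= 90')

# Bracket version: needs &, not "and", plus parentheses around each test
m = df[(df['Age'] > 9) & (df['Score'] >= 90)]
print(q.equals(m))           # True
print(q['Name'].tolist())    # ['Ana']

# @ pulls in a local Python variable
min_age = 10
print(df.query('Age >= @min_age')['Name'].tolist())    # ['Ana', 'Ben']

# Backticks let query() use column names that contain spaces
df4 = df.rename(columns={'Score': 'Test Score'})
print(df4.query('`Test Score` > 90')['Name'].tolist())  # ['Ana', 'Cat']
```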
🎭 where() Method: Keep or NaN
Keep values that match, turn others to NaN!
```python
# Keep scores >= 90, others become NaN
df['Score'].where(df['Score'] >= 90)
```
Output:
```
0    95.0
1     NaN
2    92.0
Name: Score, dtype: float64
```
With replacement value:
```python
# Replace non-matching with 0
df['Score'].where(df['Score'] >= 90, 0)
```
Output:
```
0    95
1     0
2    92
Name: Score, dtype: int64
```
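Why did the first output turn into 95.0 and 92.0? NaN is a float, so pandas converts the column to float64 to make room for it, while a replacement like 0 lets the integers stay integers. The sketch below checks that and also passes a whole Series as the replacement; retake is a made-up list of second-attempt scores.

```python
import pandas as pd

scores = pd.Series([95, 88, 92], name='Score')

kept = scores.where(scores >= 90)        # failing values -> NaN
filled = scores.where(scores >= 90, 0)   # failing values -> 0

print(kept.dtype)        # float64 (NaN forces floats)
print(filled.dtype)      # int64
print(filled.tolist())   # [95, 0, 92]

# The replacement can also be a Series, matched up row by row via the index
retake = pd.Series([70, 91, 80])         # hypothetical retake scores
print(scores.where(scores >= 90, retake).tolist())   # [95, 91, 92]
```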
🎭 mask() Method: The Opposite of where()
Hide values that match, keep the rest!
```python
# Hide scores >= 90 (make them NaN)
df['Score'].mask(df['Score'] >= 90)
```
Output:
```
0     NaN
1    88.0
2     NaN
Name: Score, dtype: float64
```
🆚 where() vs mask()
| Method | Keeps | Hides |
|---|---|---|
| where(condition) | Values where condition is True | False → NaN (or your replacement) |
| mask(condition) | Values where condition is False | True → NaN (or your replacement) |
💡 Memory trick:
- where = “WHERE this is true, keep it”
- mask = “MASK (hide) where this is true”
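The memory trick has an exact code version: mask(cond) gives the same result as where(~cond), where ~ flips every True and False. A quick sketch:

```python
import pandas as pd

scores = pd.Series([95, 88, 92], name='Score')
high = scores >= 90              # [True, False, True]

masked = scores.mask(high)       # hide where True
flipped = scores.where(~high)    # keep where NOT True

print(masked.equals(flipped))    # True

# Like where(), mask() takes a replacement: cap every score above 90 at 90
capped = scores.mask(scores > 90, 90)
print(capped.tolist())           # [90, 88, 90]
```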
✏️ Conditional Assignment
Change values based on conditions!
Method 1: loc with condition
```python
# Give bonus: if Score >= 90, add 5
df.loc[df['Score'] >= 90, 'Score'] += 5
print(df)
```
Output:
```
  Name  Age  Score
0  Ana   10    100
1  Ben   11     88
2  Cat    9     97
```
Method 2: where for assignment
```python
# Set low scores to 70
df['Score'] = df['Score'].where(
    df['Score'] >= 90, 70
)
```
Method 3: np.where for if-else
```python
import numpy as np

# Pass/Fail based on score
df['Status'] = np.where(
    df['Score'] >= 90,
    'Pass',
    'Fail'
)
```
graph TD A[Conditional Assignment] --> B[Change specific cells] B --> C["df.loc[condition, col] = value"] A --> D[Replace non-matching] D --> E["df[col].where#40;cond, replacement#41;"] A --> F[If-else new column] F --> G["np.where#40;cond, if_true, if_false#41;"]
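Here are all three assignment styles chained on a fresh toy box, as a sketch. The nested np.where and its 98-point 'A+' cut-off are hypothetical, just to show one way to get more than two outcomes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

# Method 1: loc[row condition, column] edits only the matching cells
df.loc[df['Score'] >= 90, 'Score'] += 5
print(df['Score'].tolist())    # [100, 88, 97]

# Method 2: where() keeps passing values and replaces the rest with 70
df['Score'] = df['Score'].where(df['Score'] >= 90, 70)
print(df['Score'].tolist())    # [100, 70, 97]

# Method 3: np.where builds a new column from an if/else
df['Status'] = np.where(df['Score'] >= 90, 'Pass', 'Fail')
print(df['Status'].tolist())   # ['Pass', 'Fail', 'Pass']

# Nesting np.where gives three outcomes (hypothetical cut-offs)
df['Grade'] = np.where(df['Score'] >= 98, 'A+',
                       np.where(df['Score'] >= 90, 'A', 'B'))
print(df['Grade'].tolist())    # ['A+', 'B', 'A']
```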
🏆 Quick Reference Summary
| Task | Method | Example |
|---|---|---|
| One column | df['col'] | df['Name'] |
| Multiple columns | df[['a','b']] | df[['Name','Age']] |
| By label | loc | df.loc[0, 'Name'] |
| By position | iloc | df.iloc[0, 0] |
| Fast single value | at/iat | df.at[0, 'Name'] |
| Label patterns | filter() | df.filter(like='score') |
| Text conditions | query() | df.query('Age > 10') |
| Keep matching | where() | df['Score'].where(cond) |
| Hide matching | mask() | df['Score'].mask(cond) |
| Change values | loc | df.loc[cond, 'col'] = val |
🎉 You Did It!
You now have 11 powerful ways to select and filter data in Pandas:
- ✅ Single column selection
- ✅ Multiple column selection
- ✅ Row and value selection
- ✅ loc (label-based)
- ✅ iloc (position-based)
- ✅ at/iat (fast scalar access)
- ✅ filter() for label patterns
- ✅ query() for readable conditions
- ✅ where() to keep matches
- ✅ mask() to hide matches
- ✅ Conditional assignment
You’re now a data selection ninja! 🥷