Selection Methods

🎯 Pandas Selection Methods: Finding Treasure in Your Data

Imagine your DataFrame is a giant toy box. Inside are rows (like shelves) and columns (like labeled bins). Selection methods are your special tools to reach in and grab exactly what you want!


🧸 The Toy Box Analogy

Think of a DataFrame like a giant toy organizer:

  • Columns = labeled bins (Name, Age, Score)
  • Rows = numbered shelves (0, 1, 2, 3…)
  • Each cell = one toy in a specific bin on a specific shelf

Your job? Learn all the ways to grab toys!


📦 Selecting a Single Column

The simplest grab: Pick ONE bin from the toy box.

import pandas as pd

# Our toy box
df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92]
})

# Grab the "Name" bin
names = df['Name']
print(names)

Output:

0    Ana
1    Ben
2    Cat
Name: Name, dtype: object

💡 Two ways to grab one column:

  • df['Name'] ← bracket notation (always works)
  • df.Name ← dot notation (only for simple names)

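⚠️ Warning: Dot notation fails if the column name has spaces, is not a valid Python identifier, or matches an existing DataFrame attribute or method (for example, a column called count). It also cannot create a new column. When in doubt, use brackets!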
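
Here is a small sketch of that warning in action. The frame `tricky` and its column names are made up for illustration; they are not part of the toy box above.

```python
import pandas as pd

# Hypothetical frame whose column names are awkward for dot notation
tricky = pd.DataFrame({
    'First Name': ['Ana', 'Ben'],  # contains a space
    'count': [3, 5],               # same name as the DataFrame.count method
})

# Bracket notation always returns the column
first_names = tricky['First Name']
counts = tricky['count']

# Dot notation finds the method first, so the column is never reached
dot_result = tricky.count

print(list(counts))          # [3, 5]
print(callable(dot_result))  # True: this is the method, not the column
```

`tricky.First Name` is not even valid Python, which is why bracket notation is the safe default.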


📦📦 Selecting Multiple Columns

Grab several bins at once using a list!

# Grab Name and Score bins
subset = df[['Name', 'Score']]
print(subset)

Output:

  Name  Score
0  Ana     95
1  Ben     88
2  Cat     92

🎯 The trick: Double brackets [[...]] = “give me a DataFrame with these columns”

What you get back:

  • df['Name'] → Series (one column)
  • df[['Name', 'Score']] → DataFrame (multiple columns)
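
A quick way to see the single-bracket versus double-bracket difference is to check the type and shape of each result. This sketch rebuilds the toy box so it runs on its own; the variable names `one_col` and `one_col_frame` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

one_col = df['Name']          # single brackets: a Series
one_col_frame = df[['Name']]  # double brackets: a DataFrame, even with one column

print(type(one_col).__name__)        # Series
print(type(one_col_frame).__name__)  # DataFrame
print(one_col.shape)                 # (3,)
print(one_col_frame.shape)           # (3, 1)
```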

🔢 Row and Value Selection

Selecting rows is like picking shelves!

# Slice rows 0 to 1 (not including 2)
first_two = df[0:2]
print(first_two)

Output:

  Name  Age  Score
0  Ana   10     95
1  Ben   11     88

But wait: plain slicing only picks whole rows by position. For precise control over rows AND columns, there's a BETTER way…


🎯 loc vs iloc: The Twin Heroes

These are your power tools for precise selection!

🏷️ loc = Label-based (uses names)

# Get row with label 0, column 'Name'
df.loc[0, 'Name']  # Returns: 'Ana'

# Get multiple rows and columns
df.loc[0:1, ['Name', 'Age']]

🔢 iloc = Integer-based (uses positions)

# Get row at position 0, column at position 0
df.iloc[0, 0]  # Returns: 'Ana'

# Get first 2 rows, first 2 columns
df.iloc[0:2, 0:2]

🆚 The Big Difference

Feature    loc                                 iloc
Uses       Labels/Names                        Positions/Numbers
Slicing    Inclusive                           Exclusive (like Python)
Example    df.loc[0:2] → rows 0, 1, AND 2      df.iloc[0:2] → rows 0 and 1 only

Which one should you use?

  • Know the label? Use loc: df.loc[row_label, col_name]
  • Know the position? Use iloc: df.iloc[row_pos, col_pos]
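
With the default 0, 1, 2 index, labels and positions happen to match, which hides the difference. The sketch below uses two made-up frames (`shuffled` and `lettered`, not from the toy box above) where labels and positions disagree.

```python
import pandas as pd

# Integer labels that are deliberately out of order
shuffled = pd.DataFrame({'Name': ['Ana', 'Ben', 'Cat']}, index=[2, 0, 1])

by_label = shuffled.loc[0, 'Name']   # the row whose LABEL is 0
by_position = shuffled.iloc[0, 0]    # the row at POSITION 0

print(by_label)     # Ben
print(by_position)  # Ana

# String labels make the slicing rule easy to see
lettered = pd.DataFrame({'Name': ['Ana', 'Ben', 'Cat']}, index=['a', 'b', 'c'])

label_slice = lettered.loc['a':'b']   # inclusive: rows a AND b
position_slice = lettered.iloc[0:1]   # exclusive: only position 0

print(len(label_slice))     # 2
print(len(position_slice))  # 1
```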

⚡ Scalar Access: at and iat

When you need just ONE single value—fast!

🏷️ at = Label-based (single value)

# Get one specific cell by labels
df.at[0, 'Name']  # Returns: 'Ana'

🔢 iat = Integer-based (single value)

# Get one specific cell by position
df.iat[0, 0]  # Returns: 'Ana'

🚀 Why use them? They’re faster than loc/iloc for single values!

Method    When to Use
at        Single value by label
iat       Single value by position
loc       Rows/columns by label
iloc      Rows/columns by position
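
The examples above only read values, but at and iat can also write a single cell. A minimal sketch, using a fresh copy of the toy box named `scores` (an illustrative name) so the original df is left alone:

```python
import pandas as pd

scores = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

# Read one cell by label and one by position
ben_score = scores.at[1, 'Score']  # row label 1, column 'Score'
cat_age = scores.iat[2, 1]         # row position 2, column position 1 ('Age')

# Write one cell the same way
scores.at[1, 'Score'] = 90
scores.iat[2, 1] = 10

print(ben_score, cat_age)                              # 88 9
print(scores.at[1, 'Score'], scores.at[2, 'Age'])      # 90 10
```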

🏷️ filter() Method: Select by Label Patterns

Find columns or rows whose NAMES match a pattern!

# DataFrame with many columns
df2 = pd.DataFrame({
    'score_math': [90, 85],
    'score_eng': [88, 92],
    'name': ['Ana', 'Ben']
})

# Get columns containing "score"
df2.filter(like='score')

Output:

   score_math  score_eng
0          90         88
1          85         92

More filter tricks:

# Columns starting with 's'
df2.filter(regex='^s')

# Keep rows by exact index label (items= takes exact labels, not patterns)
df2.filter(items=[0], axis=0)
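
Keep in mind that filter() only looks at labels, never at the values inside the cells. The sketch below runs the three options (items, like, regex) on the same df2; moving 'name' into the index is an illustrative assumption so there are text row labels to match.

```python
import pandas as pd

df2 = pd.DataFrame({
    'score_math': [90, 85],
    'score_eng': [88, 92],
    'name': ['Ana', 'Ben'],
})

# items: exact column labels, returned in the order you list them
exact = df2.filter(items=['name', 'score_eng'])

# regex: columns whose label ends with "math"
ends_math = df2.filter(regex='math$')

# like on axis=0: row labels containing "An" (after making names the index)
by_row_label = df2.set_index('name').filter(like='An', axis=0)

print(list(exact.columns))       # ['name', 'score_eng']
print(list(ends_math.columns))   # ['score_math']
print(list(by_row_label.index))  # ['Ana']
```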

🔍 query() Method: Filter with Words

Write conditions like you’re asking a question!

# Find kids older than 9
df.query('Age > 9')

Output:

  Name  Age  Score
0  Ana   10     95
1  Ben   11     88

More query magic:

# Multiple conditions
df.query('Age > 9 and Score >= 90')

# Using variables
min_age = 10
df.query('Age >= @min_age')

💡 Why query() rocks:

  • Reads like English!
  • Cleaner than bracket conditions
  • Use @ for external variables
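
To see what "cleaner than bracket conditions" means, here is the same filter written both ways. The second half assumes a column renamed to 'Test Score' (a made-up name) to show the backtick syntax that recent pandas versions accept for column names with spaces.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Age': [10, 11, 9],
    'Score': [95, 88, 92],
})

# Same filter, two spellings
via_query = df.query('Age > 9 and Score >= 90')
via_brackets = df[(df['Age'] > 9) & (df['Score'] >= 90)]

print(via_query.equals(via_brackets))  # True
print(list(via_query['Name']))         # ['Ana']

# Column names with spaces need backticks inside query()
wide = df.rename(columns={'Score': 'Test Score'})
high = wide.query('`Test Score` > 90')

print(list(high['Name']))  # ['Ana', 'Cat']
```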

🎭 where() Method: Keep or NaN

Keep values that match, turn others to NaN!

# Keep scores >= 90, others become NaN
df['Score'].where(df['Score'] >= 90)

Output:

0    95.0
1     NaN
2    92.0
Name: Score, dtype: float64

With replacement value:

# Replace non-matching with 0
df['Score'].where(df['Score'] >= 90, 0)

Output:

0    95
1     0
2    92
Name: Score, dtype: int64

🎭 mask() Method: The Opposite of where()

Hide values that match, keep the rest!

# Hide scores >= 90 (make them NaN)
df['Score'].mask(df['Score'] >= 90)

Output:

0     NaN
1    88.0
2     NaN
Name: Score, dtype: float64

🆚 where() vs mask()

Method              Keeps           Hides
where(condition)    True values     False → NaN
mask(condition)     False values    True → NaN

💡 Memory trick:

  • where = “WHERE this is true, keep it”
  • mask = “MASK (hide) where this is true”
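
Because the two methods are mirror images, where(cond) gives the same result as mask(~cond). A small sketch on a standalone Series (the name `scores` is illustrative):

```python
import pandas as pd

scores = pd.Series([95, 88, 92], name='Score')
cond = scores >= 90

kept = scores.where(cond)    # keep where cond is True, NaN elsewhere
hidden = scores.mask(~cond)  # hide where cond is False: the same thing

print(kept.equals(hidden))   # True
print(kept.isna().tolist())  # [False, True, False]
```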

✏️ Conditional Assignment

Change values based on conditions!

Method 1: loc with condition

# Give bonus: if Score >= 90, add 5
df.loc[df['Score'] >= 90, 'Score'] += 5
print(df)

Output:

  Name  Age  Score
0  Ana   10    100
1  Ben   11     88
2  Cat    9     97

Method 2: where for assignment

# Set low scores to 70
df['Score'] = df['Score'].where(
    df['Score'] >= 90, 70
)

Method 3: np.where for if-else

import numpy as np

# Pass/Fail based on score
df['Status'] = np.where(
    df['Score'] >= 90,
    'Pass',
    'Fail'
)

Which method fits which job:

  • Change specific cells → df.loc[condition, col] = value
  • Replace non-matching values → df[col].where(cond, replacement)
  • Build an if-else column → np.where(cond, if_true, if_false)
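
The three methods above all modify the same df one after another, so each result depends on the step before it. As a side-by-side sketch, this version applies each method to its own fresh copy (`bonus`, `floored`, and `graded` are illustrative names) so the effects can be compared directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cat'],
    'Score': [95, 88, 92],
})

# Method 1: loc with a condition changes only the matching cells
bonus = df.copy()
bonus.loc[bonus['Score'] >= 90, 'Score'] += 5

# Method 2: where keeps matches and replaces everything else
floored = df.copy()
floored['Score'] = floored['Score'].where(floored['Score'] >= 90, 70)

# Method 3: np.where builds a brand-new if-else column
graded = df.copy()
graded['Status'] = np.where(graded['Score'] >= 90, 'Pass', 'Fail')

print(list(bonus['Score']))    # [100, 88, 97]
print(list(floored['Score']))  # [95, 70, 92]
print(list(graded['Status']))  # ['Pass', 'Fail', 'Pass']
```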

🏆 Quick Reference Summary

Task                 Method           Example
One column           df['col']        df['Name']
Multiple columns     df[['a','b']]    df[['Name','Age']]
By label             loc              df.loc[0, 'Name']
By position          iloc             df.iloc[0, 0]
Fast single value    at/iat           df.at[0, 'Name']
Label patterns       filter()         df.filter(like='score')
Text conditions      query()          df.query('Age > 10')
Keep matching        where()          df['Score'].where(cond)
Hide matching        mask()           df['Score'].mask(cond)
Change values        loc              df.loc[cond, 'col'] = val

🎉 You Did It!

You now have 11 powerful ways to select and filter data in Pandas:

  1. ✅ Single column selection
  2. ✅ Multiple column selection
  3. ✅ Row and value selection
  4. ✅ loc (label-based)
  5. ✅ iloc (position-based)
  6. ✅ at/iat (fast scalar access)
  7. ✅ filter() for label patterns
  8. ✅ query() for readable conditions
  9. ✅ where() to keep matches
  10. ✅ mask() to hide matches
  11. ✅ Conditional assignment

You’re now a data selection ninja! 🥷
