Data Processing

🍳 R Data Processing: Your Kitchen Adventure!

Imagine you're a master chef in a magical kitchen. Your ingredients? Data! Your cooking tools? R functions! Let's learn how to slice, dice, mix, and serve beautiful data dishes.


🥄 The Kitchen Metaphor

Think of your data like a big basket of ingredients. Sometimes you need to:

  • Pick only the tomatoes (Subset)
  • Chop them into pieces (Transform)
  • Work inside the basket easily (with/within)
  • Mix two baskets together (Merge)
  • Find what's common or different (Set Operations)
  • Count how many of each (Contingency Tables)
  • Rearrange your table nicely (Table Manipulation)

Let's cook! 🍽️


1️⃣ Subset Function: Picking Your Ingredients

What is it?

subset() helps you pick only the data you want, like reaching into a fruit basket and grabbing only the apples!

The Magic Words

subset(data, condition)
subset(data, condition, select = columns)

🍎 Simple Example

# Our fruit basket
fruits <- data.frame(
  name = c("apple", "banana", "cherry"),
  color = c("red", "yellow", "red"),
  price = c(2, 1, 3)
)

# Pick only red fruits
red_fruits <- subset(fruits,
                     color == "red")

# Result: apple and cherry!

🎯 Pro Tips

  • Use select to pick specific columns:

# Get names of cheap fruits
subset(fruits,
       price < 2,
       select = name)
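
Here is a slightly bigger sketch of the same idea, combining two conditions and keeping more than one column. The fruits basket is repeated so the block runs on its own. The bracket-indexing line is only there for comparison and is not part of the original recipe.

```r
# Same fruit basket as before, repeated so this block runs on its own
fruits <- data.frame(
  name  = c("apple", "banana", "cherry"),
  color = c("red", "yellow", "red"),
  price = c(2, 1, 3)
)

# Two conditions joined with & (AND), keeping two columns
cheap_red <- subset(fruits,
                    color == "red" & price < 3,
                    select = c(name, price))
print(cheap_red)

# The same filter written with bracket indexing, for comparison
cheap_red_idx <- fruits[fruits$color == "red" & fruits$price < 3,
                        c("name", "price")]
print(cheap_red_idx)
```

subset() reads a little more like plain English, which is why it is handy for quick, interactive work.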

Why Kids Love It 🧒

It's like having a magic wand that says "Give me only the toys that are blue!" and poof, you get exactly what you asked for!


2️⃣ Transform Function: Cooking Your Data

What is it?

transform() lets you add new columns or change existing ones, like adding seasoning to your dish!

The Magic Words

transform(data, new_column = calculation)

🧮 Simple Example

# Our students
students <- data.frame(
  name = c("Amy", "Bob"),
  math = c(80, 90),
  science = c(70, 85)
)

# Add total score
students <- transform(students,
  total = math + science,
  average = (math + science) / 2
)

🎨 What Happens?

  name math science total average
1  Amy   80      70   150    75.0
2  Bob   90      85   175    87.5
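
One thing worth knowing: transform() works out every new column from the original columns, so a column created in a call cannot be reused later in that same call. The sketch below shows the surprise and one way around it. It assumes there is no variable called total already sitting in your workspace.

```r
students <- data.frame(
  name    = c("Amy", "Bob"),
  math    = c(80, 90),
  science = c(70, 85)
)

# transform() evaluates each expression against the ORIGINAL columns,
# so `average = total / 2` cannot see the `total` made in the same call.
result <- tryCatch(
  transform(students,
            total   = math + science,
            average = total / 2),
  error = function(e) "error: 'total' is not a column yet"
)
print(result)

# One way around it: add the columns in two steps
step1    <- transform(students, total = math + science)
students <- transform(step1, average = total / 2)
print(students)
```

This is why the example above writes (math + science) / 2 in full rather than total / 2.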

Why Kids Love It 🧒

It's like putting stickers on your notebook: you're adding new stuff without throwing anything away!


3️⃣ With and Within Functions: Working Inside the Box

The Problem

Typing data$column everywhere is tiring! Like saying "the red box's apple, the red box's banana, the red box's cherry..."

The Solution: with() and within()

graph TD
  A["Your Data Frame"] --> B{What do you want?}
  B -->|Just calculate something| C["with"]
  B -->|Change the data| D["within"]
  C --> E["Returns a result"]
  D --> F["Returns modified data"]

🎪 with() Example

# Calculate without $ signs
with(students, {
  total <- math + science
  print(mean(total))
})
# Prints 162.5; students itself is unchanged

🔧 within() Example

# Modify the data itself
# (uses the average column added by transform() earlier)
students <- within(students, {
  grade <- ifelse(average >= 80,
                  "A", "B")
})
# Now students HAS a grade column!

The Difference 🤔

Function   Does What?   Returns
with()     Calculates   Just the answer
within()   Modifies     Changed data frame
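
To see the difference side by side, here is a small self-contained sketch. The pass column and the cutoff of 160 are made up for illustration. Notice that within() hands back a changed copy, so the original stays the same unless you assign the result back.

```r
students <- data.frame(
  name    = c("Amy", "Bob"),
  math    = c(80, 90),
  science = c(70, 85)
)

# with(): evaluates an expression using the columns and returns the value
avg_math <- with(students, mean(math))
print(avg_math)

# within(): returns a modified COPY of the data frame.
# Later lines in the block can use columns made by earlier lines.
graded <- within(students, {
  total <- math + science
  pass  <- total >= 160   # hypothetical cutoff, for illustration only
})

print(names(students))  # the original still has only its three columns
print(graded)
```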

4️⃣ Merge Function: Mixing Two Baskets

What is it?

merge() is like having two puzzle pieces that snap together! It combines data from different tables.

The Magic Words

merge(table1, table2, by = "matching_column")

🧩 Simple Example

# Student names
names_df <- data.frame(
  id = c(1, 2, 3),
  name = c("Amy", "Bob", "Cat")
)

# Student scores
scores_df <- data.frame(
  id = c(1, 2, 3),
  score = c(95, 87, 92)
)

# Snap them together!
complete <- merge(names_df,
                  scores_df,
                  by = "id")

🎨 Result

  id name score
1  1  Amy    95
2  2  Bob    87
3  3  Cat    92

🔮 Different Types of Merge

graph TD
  A["Merge Types"] --> B["Inner: Only matching"]
  A --> C["Left: Keep all left"]
  A --> D["Right: Keep all right"]
  A --> E["Full: Keep everything"]

# x and y stand for any two data frames that share an id column

# Keep everyone from left table
merge(x, y, by="id", all.x=TRUE)

# Keep everyone from right table
merge(x, y, by="id", all.y=TRUE)

# Keep everyone from both
merge(x, y, by="id", all=TRUE)
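
The four merge types only look different when the two tables do not match perfectly. The sketch below uses made-up tables where Amy has no score and id 4 has no name, so you can watch rows appear and disappear.

```r
# Hypothetical tables: id 1 has no score, id 4 has no name
names_df  <- data.frame(id = c(1, 2, 3), name = c("Amy", "Bob", "Cat"))
scores_df <- data.frame(id = c(2, 3, 4), score = c(87, 92, 78))

inner <- merge(names_df, scores_df, by = "id")                # ids 2, 3
left  <- merge(names_df, scores_df, by = "id", all.x = TRUE)  # ids 1, 2, 3
right <- merge(names_df, scores_df, by = "id", all.y = TRUE)  # ids 2, 3, 4
full  <- merge(names_df, scores_df, by = "id", all = TRUE)    # ids 1 to 4

print(inner)
print(left)   # Amy's score is NA
print(right)  # id 4 has NA for name
print(full)
```

Missing pieces are filled with NA, R's way of saying "nothing here".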

5️⃣ Set Operations: Finding Friends & Strangers

What is it?

Like comparing two groups of friends:

  • Who's in BOTH groups? (intersect)
  • Who's in EITHER group? (union)
  • Who's ONLY in group A? (setdiff)

🎭 Simple Examples

group_a <- c("Amy", "Bob", "Cat")
group_b <- c("Bob", "Cat", "Dan")

# Friends in BOTH
intersect(group_a, group_b)
# "Bob" "Cat"

# ALL friends combined
union(group_a, group_b)
# "Amy" "Bob" "Cat" "Dan"

# Only in group A
setdiff(group_a, group_b)
# "Amy"

# Only in group B
setdiff(group_b, group_a)
# "Dan"

🎪 Visual Summary

Group A: 🔴 Amy | 🟡 Bob | 🟢 Cat
Group B:        | 🟡 Bob | 🟢 Cat | 🔵 Dan

intersect: 🟡🟢 (Bob, Cat)
union:     🔴🟡🟢🔵 (All four)
setdiff A-B: 🔴 (Just Amy)
setdiff B-A: 🔵 (Just Dan)
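
Two extra helpers fit nicely with these, %in% and setequal(). The sketch below also shows that set operations throw away duplicates. Bob is listed twice in group_a on purpose, which is a change from the example above.

```r
group_a <- c("Amy", "Bob", "Bob", "Cat")   # Bob appears twice on purpose
group_b <- c("Bob", "Cat", "Dan")

everyone <- union(group_a, group_b)      # duplicates are removed
both     <- intersect(group_a, group_b)
print(everyone)
print(both)

# Membership test: is each element of group_a also in group_b?
in_b <- group_a %in% group_b
print(in_b)

# Same members, ignoring order and duplicates?
same <- setequal(c("Amy", "Bob"), c("Bob", "Amy", "Amy"))
print(same)
```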

6️⃣ Contingency Tables: Counting Your Stickers

What is it?

table() counts how many of each thing you have, like organizing your sticker collection by color and shape!

🎨 Simple Example

# Our pets
pets <- data.frame(
  animal = c("cat", "dog", "cat",
             "dog", "cat"),
  color = c("white", "brown", "brown",
            "white", "white")
)

# Count by animal type
table(pets$animal)
# cat: 3, dog: 2

# Two-way table
table(pets$animal, pets$color)

🎯 Two-Way Result

     brown white
cat     1     2
dog     1     1

🔧 Add Margins (Totals)

my_table <- table(pets$animal,
                  pets$color)
addmargins(my_table)

Result:

      brown white Sum
cat      1     2   3
dog      1     1   2
Sum      2     3   5
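
The tables above have no labels telling you which side is animal and which is color. Here are two ways to get labeled tables: calling table() inside with(), or using xtabs() with a formula. Both are standard R functions; the pets data is repeated so the block runs on its own.

```r
pets <- data.frame(
  animal = c("cat", "dog", "cat", "dog", "cat"),
  color  = c("white", "brown", "brown", "white", "white")
)

# Inside with(), table() picks up the column names as labels
by_name <- with(pets, table(animal, color))
print(by_name)

# xtabs() builds the same counts from a formula
by_formula <- xtabs(~ animal + color, data = pets)
print(by_formula)
```

Labeled tables are easier to read once you have more than one, because you never have to guess which variable is on the rows.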

7️⃣ Table Manipulation: Arranging Your Display

Key Functions

Function         What It Does
prop.table()     Show proportions (each cell as a share of a total)
margin.table()   Get row/column totals
addmargins()     Add sum rows/columns
ftable()         Flatten multi-way tables

📊 Proportion Tables

my_table <- table(pets$animal,
                  pets$color)

# Overall proportions
prop.table(my_table)
# Each cell / total

# Row proportions
prop.table(my_table, 1)
# Each row adds to 1

# Column proportions
prop.table(my_table, 2)
# Each column adds to 1

🎪 Row Proportions Example

       brown white
cat    0.33  0.67   (1 brown, 2 white)
dog    0.50  0.50   (1 brown, 1 white)
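
prop.table() gives proportions between 0 and 1. If you would rather see real percentages, multiply by 100 and round. This is a small sketch using the same pets data, repeated so the block runs on its own.

```r
pets <- data.frame(
  animal = c("cat", "dog", "cat", "dog", "cat"),
  color  = c("white", "brown", "brown", "white", "white")
)
my_table <- table(pets$animal, pets$color)

# Row proportions: each row adds up to 1
row_props <- prop.table(my_table, 1)

# Turn them into percentages with one decimal place
row_pct <- round(100 * row_props, 1)
print(row_pct)
```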

🧊 Flatten Complex Tables

# 3-way table
# (survey is assumed to be a data frame with
#  gender, age, and vote columns)
t3 <- table(survey$gender,
            survey$age,
            survey$vote)

# Make it readable
ftable(t3)
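
The survey data is not defined in this lesson, so here is a runnable sketch with a tiny invented survey. Every value in it is made up purely for illustration.

```r
# Hypothetical survey data, invented for this example
survey <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M"),
  age    = c("young", "young", "old", "old", "young", "old"),
  vote   = c("yes", "no", "yes", "yes", "no", "no")
)

# Naming the arguments gives the table labeled dimensions
t3 <- table(gender = survey$gender,
            age    = survey$age,
            vote   = survey$vote)
print(t3)           # printed as one 2-way slice per vote value

# ftable() lays the same counts out as a single flat grid
flat <- ftable(t3)
print(flat)

# row.vars chooses which variable(s) go down the side
by_vote <- ftable(t3, row.vars = "vote")
print(by_vote)
```

By default ftable() puts the last variable across the top and all the others down the side.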

🎓 The Complete Recipe

graph TD
  A["📦 Raw Data"] --> B["🔍 subset"]
  B --> C["🔧 transform"]
  C --> D["📝 with/within"]
  D --> E{Need to combine?}
  E -->|Yes| F["🔗 merge"]
  E -->|No| G["Compare sets?"]
  G -->|Yes| H["⚙️ Set Operations"]
  G -->|No| I["Count things?"]
  F --> I
  H --> I
  I -->|Yes| J["📊 table"]
  J --> K["🎨 Table Manipulation"]
  K --> L["✨ Beautiful Results!"]
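
Here is one possible walk through the recipe from start to finish. The students and clubs tables, the math cutoff of 60, and the grade rule are all invented for illustration. Treat it as a sketch of how the steps chain together, not the only way to cook.

```r
# Hypothetical ingredients
students <- data.frame(
  id      = c(1, 2, 3, 4),
  name    = c("Amy", "Bob", "Cat", "Dan"),
  math    = c(80, 90, 55, 70),
  science = c(70, 85, 60, 95)
)
clubs <- data.frame(
  id   = c(1, 2, 4),
  club = c("chess", "art", "chess")
)

# 1. subset: keep students with a math score of at least 60
kept <- subset(students, math >= 60)

# 2. transform: add an average column
kept <- transform(kept, average = (math + science) / 2)

# 3. within: derive a grade from the average
kept <- within(kept, {
  grade <- ifelse(average >= 80, "A", "B")
})

# 4. merge: attach each student's club, keeping everyone in kept
joined <- merge(kept, clubs, by = "id", all.x = TRUE)

# 5. table: count grades per club
counts <- table(club = joined$club, grade = joined$grade)
print(counts)

# 6. prop.table: share of each grade within a club
shares <- prop.table(counts, 1)
print(shares)
```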

🚀 Quick Reference Card

Task            Function       Example
Filter rows     subset()       subset(df, x > 5)
Add columns     transform()    transform(df, y=x*2)
Work inside     with()         with(df, mean(x))
Modify inside   within()       within(df, y<-x*2)
Join tables     merge()        merge(a, b, by="id")
Common items    intersect()    intersect(v1, v2)
All items       union()        union(v1, v2)
Difference      setdiff()      setdiff(v1, v2)
Count           table()        table(df$col)
Proportions     prop.table()   prop.table(tbl)

🎉 You Did It!

You've just learned how to:

  • ✅ Pick exactly what you need (subset)
  • ✅ Transform and enhance data (transform)
  • ✅ Work efficiently inside data (with/within)
  • ✅ Combine data sources (merge)
  • ✅ Compare groups (set operations)
  • ✅ Count and summarize (contingency tables)
  • ✅ Display beautifully (table manipulation)

You're now a Data Processing Chef! 👨‍🍳

"Data processing in R is like cooking: once you know your tools, you can create anything!"
