🧰 The Tidyverse Toolbox: Your Swiss Army Knife for Data!
Imagine you have a magic toolbox. Each tool inside helps you do one special thing with your data. Today, we’re going to open this toolbox and learn about seven amazing tools!
Think of data like LEGO blocks. Sometimes you need to:
- Cut and shape text (like cutting paper) → stringr
- Do the same thing to many blocks at once → purrr map
- Combine all blocks into one → purrr reduce
- Read dates (like reading a calendar) → lubridate parsing
- Pull out pieces of dates → lubridate extraction
- Measure time between dates → lubridate duration
- Read files into R → readr
Let’s explore each tool!
🧵 stringr: The Text Tailor
What is it? stringr helps you work with words and sentences—just like a tailor works with fabric!
The Magic Scissors: str_sub()
Want to cut out part of a word? Use str_sub()!
library(stringr)
word <- "RAINBOW"
str_sub(word, 1, 4)
# "RAIN" — first 4 letters!
str_sub(word, -3, -1)
# "BOW" — last 3 letters!
Think of it like cutting a piece of ribbon. You tell R where to start and where to stop!
The Word Detector: str_detect()
Does your word contain something? Ask str_detect()!
str_detect("I love pizza", "pizza")
# TRUE — yes, pizza is there!
str_detect("I love pizza", "taco")
# FALSE — no tacos here!
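str_detect() also works on a whole vector of strings at once, which makes it handy for filtering. Here is a small sketch; the `foods` vector is made up for illustration.

```r
library(stringr)

# A made-up vector of strings to search through
foods <- c("pizza", "taco", "pizza roll")

# One TRUE/FALSE answer per string
has_pizza <- str_detect(foods, "pizza")

# Keep only the strings that matched
pizza_foods <- foods[has_pizza]

# str_subset() does the detect-and-keep step in one call
pizza_foods_again <- str_subset(foods, "pizza")
```

Both approaches keep "pizza" and "pizza roll" and drop "taco".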
Find and Replace: str_replace()
Found something? Want to swap it? Easy!
str_replace("I like cats", "cats", "dogs")
# "I like dogs"
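One thing to watch: str_replace() swaps only the first match in each string. To swap every match, stringr provides str_replace_all(). A quick sketch with a made-up sentence:

```r
library(stringr)

sentence <- "cats chase cats"

# Only the first "cats" is replaced
first_only <- str_replace(sentence, "cats", "dogs")

# Every "cats" is replaced
every_one <- str_replace_all(sentence, "cats", "dogs")
```

Here `first_only` is "dogs chase cats" and `every_one` is "dogs chase dogs".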
Split It Up: str_split()
Break a sentence into pieces!
str_split("apple-banana-cherry", "-")
# A list holding one vector: "apple" "banana" "cherry"
Like cutting a string of beads!
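Because str_split() hands back a list (one element per input string), you usually need one more step to reach the pieces. A minimal sketch of two common ways to do that:

```r
library(stringr)

fruit <- "apple-banana-cherry"

# str_split() returns a list with one character vector per input string
pieces <- str_split(fruit, "-")

# Take the first (and only) element to get a plain character vector
fruit_vec <- pieces[[1]]

# simplify = TRUE returns a character matrix instead of a list
fruit_mat <- str_split(fruit, "-", simplify = TRUE)
```

The matrix form has one row per input string, which can be convenient when every string splits into the same number of pieces.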
Other Handy stringr Tools
| Function | What It Does | Example |
|---|---|---|
| str_length() | Counts characters | str_length("hello") → 5 |
| str_to_upper() | MAKES ALL CAPS | str_to_upper("hi") → "HI" |
| str_to_lower() | makes all small | str_to_lower("HI") → "hi" |
| str_trim() | Removes extra spaces | str_trim(" hi ") → "hi" |
| str_c() | Glues strings together | str_c("a","b") → "ab" |
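These helpers are often used together. The sketch below cleans up a made-up vector of messy names; the data and variable names are only for illustration.

```r
library(stringr)

# Made-up input with stray spaces and mixed case
messy <- c("  Alice ", "BOB", " carol  ")

# Trim the spaces first, then lower-case everything
clean <- str_to_lower(str_trim(messy))

# Count the characters in each cleaned name
n_chars <- str_length(clean)

# Glue a prefix onto each name
tags <- str_c("user_", clean)
```

After this, `clean` holds "alice", "bob", "carol" and `tags` holds "user_alice", "user_bob", "user_carol".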
🗺️ purrr Map: The Copy Machine
What is it? Imagine you need to put a stamp on 100 letters. Would you do it one by one? No! You’d use a machine!
map() is that machine. It does the same thing to every item in a list!
Basic map()
library(purrr)
numbers <- list(1, 2, 3, 4)
# Add 10 to each number
map(numbers, ~ .x + 10)
# A list holding 11, 12, 13, 14
The ~ is shorthand for “a function that does this”, and .x stands for the current item!
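The ~ shortcut is just one way to hand map() a function. The sketch below shows three spellings that should give the same result; `add_ten` is a helper name invented for this example.

```r
library(purrr)

numbers <- list(1, 2, 3, 4)

# A named helper function (hypothetical, for illustration)
add_ten <- function(x) x + 10

# Three equivalent ways to add 10 to each item
with_formula  <- map(numbers, ~ .x + 10)
with_function <- map(numbers, add_ten)
with_lambda   <- map(numbers, function(x) x + 10)
```

The formula style is the shortest to type, while a named function is easier to reuse and test.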
map() Variants: Choose Your Output
| Function | Returns | Example |
|---|---|---|
| map() | List | map(1:3, ~ .x * 2) → list(2,4,6) |
| map_dbl() | Numbers | map_dbl(1:3, ~ .x * 2) → 2 4 6 |
| map_chr() | Text | map_chr(1:3, ~ paste0("item", .x)) → "item1" "item2" "item3" |
| map_lgl() | TRUE/FALSE | map_lgl(1:3, ~ .x > 2) → FALSE FALSE TRUE |
| map_int() | Integers | map_int(1:3, ~ as.integer(.x * 2)) → 2 4 6 |
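To see the difference between the variants, it helps to run a few on the same input and look at what comes back. A small sketch:

```r
library(purrr)

values <- 1:3

# map() always returns a list
as_list <- map(values, ~ .x * 2)

# The typed variants return plain vectors of one type
doubled     <- map_dbl(values, ~ .x * 2)
item_labels <- map_chr(values, ~ paste0("item", .x))
is_big      <- map_lgl(values, ~ .x > 2)
```

Picking a typed variant also acts as a safety check: if the function returns something that does not fit the promised type, purrr raises an error instead of silently continuing.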
map2(): Two Lists at Once!
What if you have two lists and want to combine them?
names <- list("Ana", "Bob", "Cat")
ages <- list(5, 6, 7)
map2(names, ages, ~ paste(.x, "is", .y))
# "Ana is 5" "Bob is 6" "Cat is 7"
Like a zipper joining two sides!
pmap(): Many Lists Together
Have 3+ lists? Use pmap()!
first <- list("A", "B")
middle <- list("X", "Y")
last <- list("1", "2")
pmap(list(first, middle, last),
     ~ paste(..1, ..2, ..3))
# "A X 1" "B Y 2"
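Like map(), both map2() and pmap() return lists. If you want a plain character vector instead, the typed variants map2_chr() and pmap_chr() are one option. A sketch, using an ordinary function with argument names chosen for this example:

```r
library(purrr)

first  <- c("A", "B")
middle <- c("X", "Y")
last   <- c("1", "2")

# pmap_chr() walks the three vectors in parallel and returns text
combined <- pmap_chr(
  list(first, middle, last),
  function(a, b, c) paste(a, b, c)
)

# map2_chr() is the two-input version
pairs <- map2_chr(first, last, ~ paste0(.x, .y))
```

Here `combined` is "A X 1" "B Y 2" and `pairs` is "A1" "B2".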
🎯 purrr Reduce: The Combiner
What is it? Imagine you have a pile of cards. You pick up two, combine them, then pick up the next, combine again… until you have ONE final card!
That’s reduce()!
Simple Example
numbers <- c(1, 2, 3, 4)
reduce(numbers, `+`)
# 10 (which is 1+2+3+4)
reduce(numbers, `*`)
# 24 (which is 1*2*3*4)
reduce() with Custom Function
words <- c("I", "love", "R")
reduce(words, ~ paste(.x, .y))
# "I love R"
Step by step:
- Start with “I”
- Combine with “love” → “I love”
- Combine with “R” → “I love R”
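The steps above can be written out as an ordinary loop, which may make it clearer what reduce() is doing behind the scenes. The sketch also shows the `.init` argument, which supplies a starting value; the "Yes," text is made up for illustration.

```r
library(purrr)

words <- c("I", "love", "R")

# The same combining done by hand, one step at a time
by_hand <- words[1]
for (word in words[-1]) {
  by_hand <- paste(by_hand, word)
}

# reduce() does the same thing in one call
with_reduce <- reduce(words, ~ paste(.x, .y))

# .init gives reduce() something to start from before the first item
with_init <- reduce(words, ~ paste(.x, .y), .init = "Yes,")
```

Both `by_hand` and `with_reduce` end up as "I love R", and `with_init` is "Yes, I love R".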
accumulate(): See Every Step
Want to see the journey, not just the destination?
accumulate(1:4, `+`)
# 1 3 6 10
Shows: 1, then 1+2=3, then 3+3=6, then 6+4=10!
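accumulate() works with any two-input function, not just `+`. As a sketch, here is a running total and a running best score over a made-up vector of scores:

```r
library(purrr)

# Made-up scores, in the order they happened
scores <- c(3, 1, 4, 1, 5)

# Running total: each value is the sum so far
running_total <- accumulate(scores, `+`)

# Running best: each value is the highest score so far
running_best <- accumulate(scores, max)
```

The running total is 3, 4, 8, 9, 14 and the running best is 3, 3, 4, 4, 5.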
📅 lubridate: The Calendar Wizard
Date Parsing: Reading Dates
Computers are picky about dates. lubridate helps them understand!
The Magic Rule: Use the function that matches your date format!
library(lubridate)
# Year-Month-Day format
ymd("2024-03-15")
# 2024-03-15
# Month-Day-Year format
mdy("03-15-2024")
# 2024-03-15
# Day-Month-Year format
dmy("15-03-2024")
# 2024-03-15
All give the same result! The function name tells R how to read it!
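The parsing functions care about the order of the pieces, not the separator between them. The sketch below feeds ymd() a few spellings of the same date, plus one date that does not exist, to show what a failed parse looks like.

```r
library(lubridate)

# Same date, three different spellings
d1 <- ymd("2024-03-15")
d2 <- ymd("2024/03/15")
d3 <- ymd("20240315")

# February 30th is not a real date, so parsing fails and returns NA
# (lubridate also prints a warning, silenced here)
bad <- suppressWarnings(ymd("2024-02-30"))
```

Checking for NA after parsing is a simple way to catch dates that were typed wrong or read with the wrong function.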
With Times Too!
ymd_hms("2024-03-15 14:30:00")
# 2024-03-15 14:30:00 UTC
mdy_hm("03-15-2024 2:30 PM")
# Works too!
| Function | Format | Example Input |
|---|---|---|
| ymd() | Year-Month-Day | "2024-03-15" |
| mdy() | Month-Day-Year | "03-15-2024" |
| dmy() | Day-Month-Year | "15-03-2024" |
| ymd_hms() | With time | "2024-03-15 14:30:00" |
🔍 Component Extraction: Taking Dates Apart
Once you have a date, you can pull out pieces like LEGO!
my_date <- ymd("2024-07-04")
year(my_date) # 2024
month(my_date) # 7
day(my_date) # 4
wday(my_date) # 5 (Thursday)
All the Pieces You Can Extract
| Function | Extracts | Example |
|---|---|---|
| year() | Year | 2024 |
| month() | Month number | 7 |
| day() | Day of month | 4 |
| wday() | Day of week | 5 (1=Sunday) |
| hour() | Hour | 14 |
| minute() | Minute | 30 |
| second() | Second | 45 |
| yday() | Day of year | 186 |
Get Names Instead of Numbers
month(my_date, label = TRUE)
# "Jul"
wday(my_date, label = TRUE)
# "Thu"
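The extraction functions work on whole vectors of dates, so they are handy for building new columns. A sketch with three made-up dates; `week_start = 7` is passed explicitly so that 1 means Sunday regardless of local settings.

```r
library(lubridate)

# Made-up dates: a Thursday, a Saturday, and a Wednesday
dates <- ymd(c("2024-07-04", "2024-07-06", "2024-12-25"))

# Day-of-week numbers, with Sunday counted as 1
day_numbers <- wday(dates, week_start = 7)

# Weekend days are Sunday (1) and Saturday (7)
is_weekend <- day_numbers %in% c(1, 7)

# Which quarter of the year each date falls in
date_quarters <- quarter(dates)
```

Here `day_numbers` is 5, 7, 4, so only the second date is flagged as a weekend.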
⏱️ Duration and Period: Measuring Time
What’s the difference?
- Duration: Exact seconds (like a stopwatch)
- Period: Calendar units (like a calendar)
Durations: Exact Time
dseconds(60) # 60 seconds
dminutes(5) # 300 seconds (5 × 60)
dhours(2) # 7200 seconds
ddays(1) # 86400 seconds
dweeks(1) # 604800 seconds
Periods: Calendar Time
seconds(60) # 60 seconds (prints as "60S")
minutes(5) # 5 minutes
hours(2) # 2 hours
days(1) # 1 day
weeks(1) # 1 week
months(1) # 1 month
years(1) # 1 year
Why Does It Matter?
Imagine it’s January 31st. Add 1 month:
jan31 <- ymd("2024-01-31")
jan31 + months(1) # Period
# NA (Feb 31 doesn't exist, so lubridate gives NA!)
jan31 %m+% months(1) # Period, rolled back to the last real day
# 2024-02-29
jan31 + ddays(30) # Duration
# 2024-03-01 (exactly 30 days)
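Month-end dates are where periods and durations differ the most. The sketch below puts three ways of "adding a month" to January 31st side by side, including lubridate's `%m+%` operator, which rolls back to the last valid day of the month.

```r
library(lubridate)

jan31 <- ymd("2024-01-31")

# Plain period addition lands on Feb 31, which does not exist, so the result is NA
plain_add <- jan31 + months(1)

# %m+% rolls back to the last real day of February (2024 is a leap year)
rolled_back <- jan31 %m+% months(1)

# A duration just counts 30 * 86400 seconds forward
exact_30 <- jan31 + ddays(30)
```

Which one is right depends on the question: "same day next month" is a period question, while "30 days later" is a duration question.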
Calculate Time Differences
start <- ymd("2024-01-01")
end <- ymd("2024-12-31")
end - start
# 365 days
interval(start, end) / months(1)
# About 11.97 (11 whole months plus 30 days)
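An interval can be measured in more than one unit. As a sketch, time_length() converts the span to a unit you name, and `%/%` counts only the whole periods that fit:

```r
library(lubridate)

start <- ymd("2024-01-01")
end   <- ymd("2024-12-31")

# An interval remembers both endpoints
span <- interval(start, end)

# Length of the span in days and in weeks
days_between  <- time_length(span, "day")
weeks_between <- time_length(span, "week")

# Whole months that fit inside the span (Jan 1 to Dec 1)
whole_months <- span %/% months(1)
```

Here `days_between` is 365 and `whole_months` is 11, because December 31st is one day short of a full twelfth month.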
📖 readr: The File Reader
What is it? readr helps you bring data files INTO R—like opening a book to read it!
read_csv(): Comma-Separated Files
library(readr)
data <- read_csv("my_data.csv")
That’s it! readr guesses each column’s type from the data and gives you back a tibble.
All the Readers
| Function | File Type | Separator |
|---|---|---|
| read_csv() | CSV | Comma (,) |
| read_csv2() | European CSV | Semicolon (;) |
| read_tsv() | TSV | Tab |
| read_delim() | Any | You choose! |
read_delim(): For Special Files
# Pipe-separated file
read_delim("data.txt", delim = "|")
# Colon-separated file
read_delim("data.txt", delim = ":")
Helpful Options
read_csv("data.csv",
  skip = 2, # Skip first 2 rows
  n_max = 100, # Read only 100 rows
  na = c("", "NA", "?") # Treat these as missing
)
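To try these options without needing a real file, the sketch below writes a tiny made-up file to a temporary location and reads it back. The file contents and column names are invented for illustration; `col_types` is added so readr does not have to guess.

```r
library(readr)

# Write a small made-up file: two junk lines, a header, then three rows
path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "exported by a hypothetical tool",
    "generated 2024-03-15",
    "name,score",
    "Ana,5",
    "Bob,?",
    "Cat,7"
  ),
  path
)

# Skip the two junk lines and treat "?" as missing
scores <- read_csv(
  path,
  skip = 2,
  na = c("", "NA", "?"),
  col_types = cols(name = col_character(), score = col_double())
)
```

The result has three rows, and Bob's score comes through as NA instead of the text "?".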
Writing Files Too!
write_csv(my_data, "output.csv")
write_tsv(my_data, "output.tsv")
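A quick way to check that writing worked is a round trip: write the data out, read it back, and compare. In this sketch `my_data` is a small made-up data frame and the file goes to a temporary path.

```r
library(readr)

# A small made-up data frame
my_data <- data.frame(
  name = c("Ana", "Bob"),
  age  = c(5, 6)
)

# Write to a temporary file so nothing is left behind
out_path <- tempfile(fileext = ".csv")
write_csv(my_data, out_path)

# Read it back: "c" = character column, "d" = double column
read_back <- read_csv(out_path, col_types = "cd")
```

If the round trip succeeds, `read_back` has the same names and ages that went in.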
🌟 Quick Reference Flow
graph TD
  A["Your Data"] --> B{What do you need?}
  B --> C["Work with TEXT"]
  B --> D["Apply function to MANY items"]
  B --> E["COMBINE items into one"]
  B --> F["Work with DATES"]
  B --> G["READ files"]
  C --> C1["stringr functions"]
  D --> D1["purrr map functions"]
  E --> E1["purrr reduce"]
  F --> F1{Which date task?}
  G --> G1["readr functions"]
  F1 --> F2["Parse: ymd, mdy, dmy"]
  F1 --> F3["Extract: year, month, day"]
  F1 --> F4["Duration: ddays, dmonths"]
💡 Remember!
- stringr = Text tools (str_*)
- map() = Do same thing to many items
- reduce() = Combine many into one
- lubridate parsing = Turn text into dates
- lubridate extraction = Pull parts from dates
- duration/period = Measure time differences
- readr = Read files into R
You now have seven tools from the Tidyverse toolbox! Each tool has its purpose. Pick the right one, and data magic happens! 🪄
