tidyr is a Tidyverse package that helps reshape data, split or combine columns, and handle missing values. It organizes messy data into tidy format.

What does pivot_longer do in tidyr?

pivot_longer takes wide data and makes it tall by converting column names into values. It creates more rows and fewer columns.

How do you handle missing values with tidyr?

tidyr offers drop_na to remove NA rows, fill to copy nearby values, replace_na to substitute defaults, and complete to show all combinations.

tidyr in R | Data Reshaping Guide

The Magical Drawer Organizer: Mastering tidyr

Your Data’s Best Friend

Imagine you have a messy toy box. Some toys are scattered everywhere, some are mixed together, and some are missing! tidyr is like a magical drawer organizer that helps you put everything in the right place.

🎯 What is tidyr?

tidyr is part of the Tidyverse family in R. It helps you:

Reshape your data (like folding clothes different ways)
Split information apart (like separating mixed crayons)
Join pieces together (like combining puzzle pieces)
Handle missing stuff (like finding lost toys)

🔄 PIVOT FUNCTIONS: The Shape-Shifters

The Bookshelf Analogy

Think of your data like books on a shelf. Sometimes you want them standing tall (long format), and sometimes you want them lying flat (wide format). Pivot functions help you change how your data “sits”!

pivot_longer(): Making Data Tall

What Does It Do?

pivot_longer() takes wide data and makes it tall. It’s like taking books lying flat and standing them up one by one.

When Do You Use It?

When you have column names that are actually values (like months: Jan, Feb, Mar as columns).

Simple Example

Before (Wide - books lying flat):

Student	Math	Science	Art
Emma	90	85	95
Liam	80	90	88

After (Long - books standing tall):

Student	Subject	Score
Emma	Math	90
Emma	Science	85
Emma	Art	95
Liam	Math	80
Liam	Science	90
Liam	Art	88

The Code

library(tidyr)

# Create wide data
grades <- tibble(
  Student = c("Emma", "Liam"),
  Math = c(90, 80),
  Science = c(85, 90),
  Art = c(95, 88)
)

# Make it tall!
tall_grades <- grades %>%
  pivot_longer(
    cols = c(Math, Science, Art),
    names_to = "Subject",
    values_to = "Score"
  )

Breaking It Down

Argument	What It Means	Like…
`cols`	Which columns to pivot	Which books to stand up
`names_to`	New column for old column names	Label for the book titles
`values_to`	New column for the values	Label for the book contents

pivot_wider(): Making Data Wide

What Does It Do?

pivot_wider() takes tall data and spreads it wide. It’s like taking standing books and laying them flat side by side.

When Do You Use It?

When you want each unique value to become its own column.

Simple Example

Before (Long):

City	Month	Temp
Tokyo	Jan	5
Tokyo	Feb	7
Paris	Jan	4
Paris	Feb	5

After (Wide):

City	Jan	Feb
Tokyo	5	7
Paris	4	5

The Code

# Create long data
temps <- tibble(
  City = c("Tokyo", "Tokyo",
           "Paris", "Paris"),
  Month = c("Jan", "Feb",
            "Jan", "Feb"),
  Temp = c(5, 7, 4, 5)
)

# Make it wide!
wide_temps <- temps %>%
  pivot_wider(
    names_from = Month,
    values_from = Temp
  )

Breaking It Down

Argument	What It Means	Like…
`names_from`	Column whose values become new column names	Book titles become shelf labels
`values_from`	Column whose values fill the new columns	Book contents fill the shelves

🔄 Quick Comparison

graph TD
    A["Wide Data"] -->|pivot_longer| B["Long Data"]
    B -->|pivot_wider| A
    style A fill:#ffd700,stroke:#333
    style B fill:#87ceeb,stroke:#333

Remember:

Longer = More rows, fewer columns
Wider = More columns, fewer rows

✂️ SEPARATE AND UNITE: The Splitters and Joiners

The Birthday Cake Analogy

Sometimes you have a whole cake (combined data), and you need to cut it into slices (separate). Other times, you have slices you want to put back together into a cake (unite)!

separate(): Cutting the Cake

What Does It Do?

separate() splits one column into multiple columns. Like cutting a cake into pieces!

Simple Example

Before:

id	name_age
1	Emma_12
2	Liam_10
3	Olivia_11

After:

id	name	age
1	Emma	12
2	Liam	10
3	Olivia	11

The Code

# Data with combined info
kids <- tibble(
  id = 1:3,
  name_age = c("Emma_12",
               "Liam_10",
               "Olivia_11")
)

# Separate into two columns
kids_split <- kids %>%
  separate(
    col = name_age,
    into = c("name", "age"),
    sep = "_"
  )

Key Arguments

Argument	What It Means
`col`	Column to split
`into`	Names for new columns
`sep`	What to split on (“_”, “-”, etc.)

separate_rows(): One-to-Many Split

What Does It Do?

When one cell has multiple values, separate_rows() creates multiple rows for each value!

Simple Example

Before:

student	hobbies
Emma	swim,read,dance
Liam	game,code

After:

student	hobbies
Emma	swim
Emma	read
Emma	dance
Liam	game
Liam	code

The Code

# Data with lists in cells
hobbies <- tibble(
  student = c("Emma", "Liam"),
  hobbies = c("swim,read,dance",
              "game,code")
)

# Separate into rows
hobbies_long <- hobbies %>%
  separate_rows(hobbies, sep = ",")

unite(): Joining the Pieces

What Does It Do?

unite() combines multiple columns into one column. The opposite of separate!

Simple Example

Before:

first	last	grade
Emma	Smith	A
Liam	Jones	B

After:

full_name	grade
Emma_Smith	A
Liam_Jones	B

The Code

# Data with separate columns
names <- tibble(
  first = c("Emma", "Liam"),
  last = c("Smith", "Jones"),
  grade = c("A", "B")
)

# Unite into one column
names_joined <- names %>%
  unite(
    col = "full_name",
    first, last,
    sep = "_"
  )

Key Arguments

Argument	What It Means
`col`	Name for new combined column
`...`	Columns to combine
`sep`	What to put between values

✂️ Separate vs Unite

graph LR
    A["One Column"] -->|separate| B["Multiple Columns"]
    B -->|unite| A
    style A fill:#ff9999,stroke:#333
    style B fill:#99ff99,stroke:#333

🔍 MISSING VALUE HANDLING: Finding the Lost Toys

The Treasure Hunt Analogy

Sometimes toys get lost under the bed (missing values). tidyr helps you:

Find the missing toys
Fill in the empty spots
Remove the unfixable gaps

Understanding NA

In R, NA means “Not Available” - a missing value.

# A vector with missing value
ages <- c(10, NA, 12, 8, NA)

It’s like a puzzle with missing pieces!

drop_na(): Removing Missing Values

What Does It Do?

drop_na() removes rows that have missing values. Like throwing away incomplete puzzle pieces.

Simple Example

Before:

name	age	score
Emma	10	95
Liam	NA	88
Olivia	11	NA
Noah	9	92

After (drop_na()):

name	age	score
Emma	10	95
Noah	9	92

The Code

# Data with missing values
students <- tibble(
  name = c("Emma", "Liam",
           "Olivia", "Noah"),
  age = c(10, NA, 11, 9),
  score = c(95, 88, NA, 92)
)

# Drop ALL rows with ANY NA
clean_data <- students %>%
  drop_na()

# Drop rows with NA only in 'age'
clean_age <- students %>%
  drop_na(age)

Be careful! drop_na() removes entire rows. Use wisely!

fill(): Filling the Gaps

What Does It Do?

fill() takes the previous (or next) value and copies it into empty spots. Like filling a coloring book by continuing the same color!

Simple Example

Before:

year	quarter	sales
2023	Q1	100
NA	Q2	120
NA	Q3	115
2024	Q1	130

After (fill year):

year	quarter	sales
2023	Q1	100
2023	Q2	120
2023	Q3	115
2024	Q1	130

The Code

# Data with gaps
sales <- tibble(
  year = c(2023, NA, NA, 2024),
  quarter = c("Q1", "Q2", "Q3", "Q1"),
  sales = c(100, 120, 115, 130)
)

# Fill down (default)
filled <- sales %>%
  fill(year)

# Fill up
filled_up <- sales %>%
  fill(year, .direction = "up")

Direction Options

Direction	What It Does
`"down"`	Copy from above (default)
`"up"`	Copy from below
`"downup"`	Down first, then up
`"updown"`	Up first, then down

replace_na(): Swap NA for Something Else

What Does It Do?

replace_na() replaces all NAs with a value you choose. Like putting a placeholder toy where the missing one was!

Simple Example

Before:

item	price
Apple	1.50
Banana	NA
Cherry	2.00

After (replace NA with 0):

item	price
Apple	1.50
Banana	0.00
Cherry	2.00

The Code

# Data with NA prices
fruits <- tibble(
  item = c("Apple", "Banana", "Cherry"),
  price = c(1.50, NA, 2.00)
)

# Replace NA with 0
fruits_fixed <- fruits %>%
  mutate(
    price = replace_na(price, 0)
  )

# Or replace multiple columns
fruits_fixed2 <- fruits %>%
  replace_na(list(price = 0))

complete(): Making Explicit Gaps

What Does It Do?

complete() makes sure all combinations exist. Missing ones get NA. Like making sure every puzzle piece slot exists, even if empty!

Simple Example

Before:

student	subject	score
Emma	Math	90
Emma	Art	95
Liam	Math	85

After (complete all combinations):

student	subject	score
Emma	Math	90
Emma	Art	95
Liam	Math	85
Liam	Art	NA

The Code

# Incomplete data
scores <- tibble(
  student = c("Emma", "Emma", "Liam"),
  subject = c("Math", "Art", "Math"),
  score = c(90, 95, 85)
)

# Complete all combinations
full_data <- scores %>%
  complete(student, subject)

# With a fill value
full_filled <- scores %>%
  complete(
    student, subject,
    fill = list(score = 0)
  )

🔍 Quick Reference

Function	What It Does	When to Use
`drop_na()`	Removes rows with NA	When you must have complete data
`fill()`	Copies previous/next value	When data should repeat
`replace_na()`	Swaps NA for a value	When you want a default
`complete()`	Creates missing combinations	When you need all possibilities

🎯 Missing Values Flow

graph TD
    A["Data with NAs"] --> B{What do you need?}
    B -->|Remove them| C["drop_na"]
    B -->|Copy nearby values| D["fill"]
    B -->|Use default value| E["replace_na"]
    B -->|Show all combinations| F["complete"]
    style A fill:#ffcccc,stroke:#333
    style C fill:#99ff99,stroke:#333
    style D fill:#99ff99,stroke:#333
    style E fill:#99ff99,stroke:#333
    style F fill:#99ff99,stroke:#333

🏆 PUTTING IT ALL TOGETHER

A Real-World Example

Let’s clean up messy survey data!

library(tidyr)
library(dplyr)

# Messy survey data
survey <- tibble(
  response_id = c(1, 2, 3),
  name_date = c("Emma_2023-01-15",
                "Liam_2023-01-16",
                "Olivia_2023-01-17"),
  Q1 = c(5, NA, 4),
  Q2 = c(4, 3, NA),
  Q3 = c(5, 4, 5)
)

# Clean it up!
clean_survey <- survey %>%
  # Separate name and date
  separate(name_date,
           c("name", "date"),
           sep = "_") %>%
  # Make questions long
  pivot_longer(
    cols = starts_with("Q"),
    names_to = "question",
    values_to = "rating"
  ) %>%
  # Fill missing with median
  mutate(
    rating = replace_na(rating, 4)
  )

🎉 You Did It!

You now know how to:

Reshape data with pivot_longer and pivot_wider
Split and join columns with separate and unite
Handle missing values with drop_na, fill, replace_na, and complete

Remember the analogies:

📚 Pivots = Books standing or lying on a shelf
🎂 Separate/Unite = Cutting and rejoining cake
🔍 Missing Values = Finding lost toys

Happy data cleaning! 🚀

tidyr

Unable to load concept

Coming Soon...

The Magical Drawer Organizer: Mastering tidyr

Your Data’s Best Friend

🎯 What is tidyr?

🔄 PIVOT FUNCTIONS: The Shape-Shifters

The Bookshelf Analogy

pivot_longer(): Making Data Tall

What Does It Do?

When Do You Use It?

Simple Example

The Code

Breaking It Down

pivot_wider(): Making Data Wide

What Does It Do?

When Do You Use It?

Simple Example

The Code

Breaking It Down

🔄 Quick Comparison

✂️ SEPARATE AND UNITE: The Splitters and Joiners

The Birthday Cake Analogy

separate(): Cutting the Cake

What Does It Do?

Simple Example

The Code

Key Arguments

separate_rows(): One-to-Many Split

What Does It Do?

Simple Example

The Code

unite(): Joining the Pieces

What Does It Do?

Simple Example

The Code

Key Arguments

✂️ Separate vs Unite

🔍 MISSING VALUE HANDLING: Finding the Lost Toys

The Treasure Hunt Analogy

Understanding NA

drop_na(): Removing Missing Values

What Does It Do?

Simple Example

The Code

fill(): Filling the Gaps

What Does It Do?

Simple Example

The Code

Direction Options

replace_na(): Swap NA for Something Else

What Does It Do?

Simple Example

The Code

complete(): Making Explicit Gaps

What Does It Do?

Simple Example

The Code

🔍 Quick Reference

🎯 Missing Values Flow

🏆 PUTTING IT ALL TOGETHER

A Real-World Example

🎉 You Did It!

Story - Premium Content

Stay Tuned!

Story - Premium Content

Interactive - Premium Content

Interactive - Premium Content

Stay Tuned!

Cheatsheet - Premium Content

Cheatsheet - Premium Content

Stay Tuned!

Quiz - Premium Content

Quiz - Premium Content

Stay Tuned!

Flashcard - Premium Content

Flashcard - Premium Content

Stay Tuned!

Sign in Required

Report an Issue