External Data

Back

Loading concept...

🌐 Tidyverse Extended: Bringing Outside Data Into R

The Story of the Data Explorer

Imagine you’re an explorer with a magical backpack. Inside R, you have amazing tools to sort, clean, and understand treasures. But what if your treasures are locked in different boxes outside—Excel files, websites, databases? Today, you’ll learn to open any box and bring data home!


🗂️ Excel File Operations

What Are Excel Files?

Excel files are like fancy notebooks with rows and columns. People use them everywhere—schools, offices, stores. They have names ending in .xlsx or .xls.

How to Read Excel Files

We use a special helper called readxl:

# First, get the helper
install.packages("readxl")
library(readxl)

# Read your Excel file
my_data <- read_excel("students.xlsx")

# See what's inside
head(my_data)

Reading Specific Sheets

Excel files can have multiple pages (sheets). Like chapters in a book!

# Read the second sheet
grades <- read_excel(
  "school.xlsx",
  sheet = "Grades"
)

# Or use sheet number
grades <- read_excel(
  "school.xlsx",
  sheet = 2
)

Writing to Excel

To save your work back to Excel, use writexl:

install.packages("writexl")
library(writexl)

# Save your data
write_xlsx(my_data, "results.xlsx")

Simple Example:

  • You have a class list in Excel
  • R reads it, adds test scores
  • R saves a new Excel file with everything!

📋 JSON Operations

What is JSON?

JSON is like a labeled container system. Everything has a name and a value. Websites love JSON because it’s easy to read and send.

Think of it like this:
📦 Name: "Emma"
📦 Age: 10
📦 Hobbies: ["reading", "soccer"]

Reading JSON

install.packages("jsonlite")
library(jsonlite)

# From a file
pets <- fromJSON("pets.json")

# From a website
weather <- fromJSON(
  "https://api.weather.com/today"
)

Writing JSON

Turn your R data into JSON:

# Create simple data
my_info <- list(
  name = "Alex",
  age = 12,
  likes = c("pizza", "games")
)

# Convert to JSON text
json_text <- toJSON(my_info, pretty = TRUE)
print(json_text)

# Save to file
write_json(my_info, "alex.json")

Real Life Example:

  • Weather apps send JSON data
  • R reads it and shows you: “Today: 22°C, Sunny!”

🔤 Encoding Handling

What is Encoding?

Encoding is like a secret code for letters. Different countries use different codes. Sometimes words look like “café” but appear as “café” if the code is wrong!

Common Encodings

Encoding Use Case
UTF-8 Works everywhere (best choice!)
Latin-1 Old European files
Windows-1252 Old Windows files

Fixing Encoding Problems

# Read with correct encoding
data <- read_csv(
  "french_menu.csv",
  locale = locale(encoding = "UTF-8")
)

# Or try Latin-1 for old files
data <- read_csv(
  "old_file.csv",
  locale = locale(encoding = "Latin1")
)

Checking and Converting

# Check what encoding a file uses
guess_encoding("mystery_file.csv")

# Convert text encoding
library(stringi)
fixed_text <- stri_encode(
  weird_text,
  from = "Latin1",
  to = "UTF-8"
)

Think of it like:

  • French uses é, German uses ü, Japanese uses 日本
  • UTF-8 is a universal translator that understands ALL!

🗄️ Database Connections

What is a Database?

A database is like a giant organized warehouse for data. Instead of thousands of Excel files, everything lives in one smart system.

Connecting to Databases

install.packages("DBI")
install.packages("RSQLite")  # For SQLite
library(DBI)

# Connect to a database
con <- dbConnect(
  RSQLite::SQLite(),
  "my_database.sqlite"
)

Other Database Types

# PostgreSQL
install.packages("RPostgres")
con <- dbConnect(
  RPostgres::Postgres(),
  dbname = "shop",
  host = "localhost",
  user = "admin",
  password = "secret123"
)

# MySQL
install.packages("RMySQL")
con <- dbConnect(
  RMySQL::MySQL(),
  dbname = "school",
  host = "server.com",
  user = "teacher",
  password = "pass123"
)

Basic Operations

# List all tables
dbListTables(con)

# Read entire table
students <- dbReadTable(con, "students")

# Always disconnect when done!
dbDisconnect(con)

Like a Library:

  • Connect = Get your library card
  • Tables = Different book sections
  • Disconnect = Return your card

🔍 SQL Queries in R

What is SQL?

SQL (say “sequel”) is the language databases understand. It’s like giving instructions to a librarian: “Find all books by Dr. Seuss published after 2000.”

Basic Queries

# Get all data from a table
result <- dbGetQuery(con,
  "SELECT * FROM students"
)

# Get specific columns
names <- dbGetQuery(con,
  "SELECT name, age FROM students"
)

# Filter with WHERE
teens <- dbGetQuery(con,
  "SELECT * FROM students
   WHERE age >= 13"
)

More Query Examples

# Sort results
sorted <- dbGetQuery(con,
  "SELECT name, score
   FROM students
   ORDER BY score DESC"
)

# Count records
count <- dbGetQuery(con,
  "SELECT COUNT(*) as total
   FROM students"
)

# Group and summarize
by_class <- dbGetQuery(con,
  "SELECT class,
          AVG(score) as avg_score
   FROM students
   GROUP BY class"
)

Using dplyr with Databases

The coolest part? Use dplyr commands on databases!

library(dplyr)
library(dbplyr)

# Connect to table
students_db <- tbl(con, "students")

# Use familiar dplyr!
smart_kids <- students_db %>%
  filter(score > 90) %>%
  select(name, score) %>%
  arrange(desc(score)) %>%
  collect()  # Bring to R

⚡ data.table Basics

What is data.table?

data.table is like R on a racing bike. When you have millions of rows, data.table zooms through them super fast!

Creating a data.table

install.packages("data.table")
library(data.table)

# Convert existing data
DT <- as.data.table(my_data)

# Read files fast
DT <- fread("huge_file.csv")  # Super fast!

The Magic Syntax

data.table uses: DT[i, j, by]

  • i = Which rows? (filter)
  • j = What to do? (select/calculate)
  • by = Group by what?
# Create example data
DT <- data.table(
  name = c("Ana", "Bob", "Ana", "Bob"),
  item = c("apple", "banana", "orange", "apple"),
  price = c(1.5, 0.8, 2.0, 1.5)
)

Filtering (the i part)

# Get Ana's purchases
DT[name == "Ana"]

# Get items over $1
DT[price > 1]

Selecting & Calculating (the j part)

# Select columns
DT[, .(name, price)]

# Calculate
DT[, .(total = sum(price))]

# Create new column
DT[, expensive := price > 1]

Grouping (the by part)

# Total per person
DT[, .(total = sum(price)), by = name]

# Count per person
DT[, .N, by = name]

Why data.table is Amazing

Task data.frame data.table
Read 1GB file 60 seconds 5 seconds
Group by 30 seconds 1 second
Memory used 8 GB 2 GB

Think of it like:

  • Regular R = Walking
  • data.table = Sports car

🗺️ Putting It All Together

graph TD A["Outside World"] --> B{Data Source} B --> C["Excel Files"] B --> D["JSON/APIs"] B --> E["Databases"] C --> F["readxl"] D --> G["jsonlite"] E --> H["DBI + SQL"] F --> I["R Data Frame"] G --> I H --> I I --> J["Analyze with Tidyverse"] I --> K["Speed up with data.table"]

🌟 Quick Reference

Task Package Main Function
Read Excel readxl read_excel()
Write Excel writexl write_xlsx()
Read JSON jsonlite fromJSON()
Write JSON jsonlite toJSON()
Fix encoding readr locale(encoding=)
Connect DB DBI dbConnect()
Query DB DBI dbGetQuery()
Fast data data.table fread(), DT[i,j,by]

🎉 You Did It!

Now you can:

  • ✅ Read and write Excel files
  • ✅ Work with JSON data from anywhere
  • ✅ Handle tricky text encodings
  • ✅ Connect to real databases
  • ✅ Write SQL queries in R
  • ✅ Use data.table for lightning speed

You’re no longer limited to data inside R. The whole world of data is now yours to explore! 🚀

Loading story...

Story - Premium Content

Please sign in to view this story and start learning.

Upgrade to Premium to unlock full access to all stories.

Stay Tuned!

Story is coming soon.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.