Advanced Data Sources

Loading concept...

🚀 Advanced Data Sources in Pandas

The Data Explorer’s Treasure Map

Imagine you’re a treasure hunter. Your treasure? Data! But data doesn’t just sit in one place. It’s hidden in databases, compressed files, websites, and more. Today, you’ll learn how to find and grab data from 9 secret locations!

Think of Pandas as your magic backpack. It can open ANY treasure chest, no matter what lock it has.


🗄️ Reading from SQL Databases

What’s a SQL Database?

Think of a database like a super-organized filing cabinet. Each drawer (table) has folders (rows) with information. SQL is the language to ask the cabinet for what you need.

The Magic Spell

import pandas as pd
import sqlite3

# Open the filing cabinet
conn = sqlite3.connect('my_database.db')

# Ask for all student records
df = pd.read_sql('SELECT * FROM students', conn)

# Close the cabinet when done
conn.close()

What Just Happened?

  1. We connected to the database (opened the cabinet)
  2. We asked for data using SQL words
  3. Pandas put it in a nice table for us
  4. We closed the connection (good manners!)

💾 Writing to SQL Databases

Saving Your Treasure Back

Found new treasure? You can put it back in the filing cabinet too!

import pandas as pd
import sqlite3

# Your new data
new_students = pd.DataFrame({
    'name': ['Alice', 'Bob'],
    'age': [10, 11]
})

# Open cabinet and save
conn = sqlite3.connect('my_database.db')
new_students.to_sql('students', conn,
                     if_exists='append',
                     index=False)
conn.close()

The if_exists Options

Option What It Does
'fail' Stop if table exists
'replace' Delete old, add new
'append' Add to existing

📦 Reading Parquet Files

What’s Parquet?

Imagine squishing a huge sponge into a tiny box. Parquet does this to data! It makes files smaller AND faster to read.

The Magic Spell

import pandas as pd

# Read the compressed treasure
df = pd.read_parquet('data.parquet')

print(df.head())

Why Use Parquet?

  • 🚀 10x faster than CSV for big files
  • 📦 Files are smaller (saves space!)
  • 🎯 Reads only what you need

📤 Writing Parquet Files

Compress Your Treasure!

import pandas as pd

# Your data
df = pd.DataFrame({
    'product': ['Apple', 'Banana', 'Cherry'],
    'price': [1.20, 0.50, 3.00]
})

# Save as compressed parquet
df.to_parquet('products.parquet')

Pro Tip: Compression Options

# Extra squished!
df.to_parquet('products.parquet',
              compression='gzip')

🥒 Reading Pickle Files

What’s a Pickle?

No, not the green food! In Python, pickle means “save exactly as is.” Like freezing a moment in time.

The Magic Spell

import pandas as pd

# Unfreeze your data
df = pd.read_pickle('frozen_data.pkl')

print(df.head())

When to Use Pickle?

  • ✅ Saving complex data with special types
  • ✅ Quick save/load during work
  • ⚠️ Don’t share pickles from strangers!

🥒 Writing Pickle Files

Freeze Your Data!

import pandas as pd

# Your precious data
df = pd.DataFrame({
    'date': pd.to_datetime(['2024-01-01']),
    'value': [100]
})

# Freeze it for later
df.to_pickle('my_data.pkl')

Pickle Preserves Everything

Unlike CSV, pickle keeps:

  • Date formats
  • Number types
  • Index names
  • All the details!

🌐 Reading from URL

Data from the Internet!

Why download files when Pandas can read directly from the web?

The Magic Spell

import pandas as pd

# Read directly from website
url = 'https://example.com/data.csv'
df = pd.read_csv(url)

print(df.head())

Works With Many Formats!

# CSV from web
df = pd.read_csv('https://..../file.csv')

# Excel from web
df = pd.read_excel('https://..../file.xlsx')

# JSON from web
df = pd.read_json('https://..../data.json')

🌐 Reading HTML Tables

Grab Tables from Websites!

Ever seen a table on a website and wanted that data? Pandas can grab it!

The Magic Spell

import pandas as pd

# Get ALL tables from a webpage
url = 'https://en.wikipedia.org/wiki/...'
tables = pd.read_html(url)

# tables is a LIST of DataFrames
first_table = tables[0]
print(first_table.head())

What You Get

  • Returns a list of all tables found
  • Each table becomes a DataFrame
  • Use tables[0], tables[1], etc.

📚 Chunked Reading of Large Files

The Problem

What if your file is HUGE? Like trying to drink an ocean with one gulp!

The Solution: Take Small Sips!

import pandas as pd

# Read in small pieces (chunks)
chunks = pd.read_csv('huge_file.csv',
                      chunksize=10000)

# Process each piece
for chunk in chunks:
    # Do something with 10,000 rows
    print(f"Processing {len(chunk)} rows")

Why Chunks?

graph TD A[Huge File: 10 Million Rows] --> B[Chunk 1: 10,000 rows] A --> C[Chunk 2: 10,000 rows] A --> D[Chunk 3: 10,000 rows] A --> E[... 997 more chunks] B --> F[Process] C --> F D --> F E --> F F --> G[Combined Result]

Real Example: Counting Rows

total_rows = 0

for chunk in pd.read_csv('big.csv',
                          chunksize=5000):
    total_rows += len(chunk)

print(f"Total: {total_rows} rows")

🗺️ The Complete Treasure Map

graph LR A[Your Data] --> B{Where is it?} B -->|Database| C[read_sql] B -->|Parquet| D[read_parquet] B -->|Pickle| E[read_pickle] B -->|Website| F[read_csv/URL] B -->|HTML Table| G[read_html] B -->|Huge File| H[chunksize] I[Your DataFrame] --> J{Save where?} J -->|Database| K[to_sql] J -->|Parquet| L[to_parquet] J -->|Pickle| M[to_pickle]

🎯 Quick Reference Card

Source Read Write
SQL Database read_sql() to_sql()
Parquet File read_parquet() to_parquet()
Pickle File read_pickle() to_pickle()
URL read_csv(url)
HTML Tables read_html()
Large Files chunksize=N

🏆 You’re Now a Data Explorer!

You’ve learned 9 powerful skills:

  1. ✅ Reading SQL databases
  2. ✅ Writing to SQL databases
  3. ✅ Reading Parquet files
  4. ✅ Writing Parquet files
  5. ✅ Reading pickle files
  6. ✅ Writing pickle files
  7. ✅ Reading from URLs
  8. ✅ Reading HTML tables
  9. ✅ Chunked reading for big files

No matter where data hides, you can find it! 🎉

Loading story...

No Story Available

This concept doesn't have a story yet.

Story Preview

Story - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

Interactive Preview

Interactive - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Interactive Content

This concept doesn't have interactive content yet.

Cheatsheet Preview

Cheatsheet - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Cheatsheet Available

This concept doesn't have a cheatsheet yet.

Quiz Preview

Quiz - Premium Content

Please sign in to view this concept and start learning.

Upgrade to Premium to unlock full access to all content.

No Quiz Available

This concept doesn't have a quiz yet.