Regular Expressions

🔍 Regular Expressions: The Secret Code Finder

Imagine you have a magical magnifying glass that can find ANY pattern in a mountain of text. That’s regex!


🎭 The Story: Meet Detective Regex

You’re a detective. Your job? Finding specific patterns in huge piles of letters, emails, and documents.

Without regex, you’d read every single word. Boring!

With regex, you write a magic search spell andβ€”BOOMβ€”every match lights up instantly.

Let’s learn this superpower!


📚 What We’ll Learn

  1. Regex Basics
  2. Pattern Matching Functions
  3. Match Objects and Groups
  4. Metacharacters
  5. Quantifiers and Anchors
  6. Greedy vs Non-Greedy
  7. Regex Flags

1️⃣ Regex Basics

What is Regex?

Regex = Regular Expression = A pattern you write to find text.

Think of it like a treasure map. The pattern is your map. The text is the jungle. Regex finds the treasure!

Your First Regex in Python

import re

text = "I love cats and dogs"
pattern = "cats"

result = re.search(pattern, text)
print(result)  # Found it! <re.Match object; span=(7, 11), match='cats'>

What happened?

  • We imported the re module (Python’s regex tool)
  • We wrote a simple pattern: "cats"
  • re.search() found “cats” in our text

The r Prefix (Raw Strings)

Always use r before your pattern:

pattern = r"\d+"  # Good!
pattern = "\d+"   # Risky!

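Why? The r tells Python: “Don’t mess with my backslashes!” Without it, Python may convert sequences such as \b or \n into other characters before the regex engine ever sees them.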
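
To see what the r prefix protects against, here is a small sketch using \b. In a normal Python string, \b is a backspace character; in a raw string it stays as a backslash plus b, which regex reads as a word boundary. The sample text and variable names are made up for illustration.

```python
import re

text = "cat category"

# Normal string: each "\b" becomes a single backspace character.
plain = "\bcat\b"
# Raw string: each r"\b" stays as two characters, backslash + b.
raw = r"\bcat\b"

print(len(plain))  # 5
print(len(raw))    # 7

plain_hits = re.findall(plain, text)  # looks for literal backspaces
raw_hits = re.findall(raw, text)      # looks for the whole word "cat"

print(plain_hits)  # []
print(raw_hits)    # ['cat']
```

Same letters on the screen, two very different patterns. That is why the raw string is the safe habit.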


2️⃣ Pattern Matching Functions

Python gives us 4 main tools:

re.search() - Find First Match

import re

text = "Call me at 555-1234"
match = re.search(r"\d+", text)

if match:
    print(match.group())  # 555

Finds the first run of digits in the text. The hyphen is not a digit, so the match stops at 555.

re.match() - Check the Beginning

text = "Hello World"

# This works (starts with Hello)
re.match(r"Hello", text)  # ✓ returns a Match object

# This fails (World is not at start)
re.match(r"World", text)  # ✗ returns None

match() only looks at the beginning!
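
A quick side-by-side may help here. This sketch runs the same pattern through re.match() and re.search() on the text from above; the variable names are just for illustration.

```python
import re

text = "Hello World"

# match() insists the pattern starts at position 0.
at_start = re.match(r"World", text)
# search() scans forward until it finds the pattern.
anywhere = re.search(r"World", text)

print(at_start)         # None
print(anywhere.span())  # (6, 11)
```

Because a failed match returns None, it is worth checking the result with an if before calling .group() or .span() on it.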

re.findall() - Find ALL Matches

text = "I have 2 cats and 3 dogs"
numbers = re.findall(r"\d", text)

print(numbers)  # ['2', '3']

Returns a list of all matches!
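
Notice that \d on its own grabs one digit at a time. A minimal sketch, with a made-up sentence, showing how adding + changes what lands in the list:

```python
import re

text = "Order 12 apples and 305 pears"

# \d matches a single digit, so each digit becomes its own entry.
single_digits = re.findall(r"\d", text)
# \d+ matches a run of digits, so each number stays in one piece.
whole_numbers = re.findall(r"\d+", text)

print(single_digits)  # ['1', '2', '3', '0', '5']
print(whole_numbers)  # ['12', '305']
```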

re.sub() - Find and Replace

text = "I hate Mondays"
new_text = re.sub(r"hate", "love", text)

print(new_text)  # I love Mondays

Like Find-Replace in your text editor!
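
By default re.sub() replaces every match, not just the first. The sketch below, using an invented sentence, also shows the optional count argument, which limits how many replacements are made.

```python
import re

text = "I hate Mondays. I hate traffic."

# No count given: every "hate" is replaced.
all_replaced = re.sub(r"hate", "love", text)

# count=1: only the first "hate" is replaced.
first_only = re.sub(r"hate", "love", text, count=1)

print(all_replaced)  # I love Mondays. I love traffic.
print(first_only)    # I love Mondays. I hate traffic.
```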


3️⃣ Match Objects and Groups

What’s a Match Object?

When regex finds something, it creates a Match Object: a little package of info.

text = "My email is bob@mail.com"
match = re.search(r"\w+@\w+\.\w+", text)

if match:
    print(match.group())  # bob@mail.com
    print(match.start())  # 12 (where it starts)
    print(match.end())    # 24 (where it ends)
    print(match.span())   # (12, 24)
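
The numbers from span() are ordinary string indexes, so they can be used to slice the original text. A small sketch, reusing the email example (the names start and end are just illustrative):

```python
import re

text = "My email is bob@mail.com"
match = re.search(r"\w+@\w+\.\w+", text)

if match:
    start, end = match.span()
    matched_part = text[start:end]  # same text that match.group() returns
    before = text[:start]           # everything in front of the match

    print(matched_part)  # bob@mail.com
    print(before)        # My email is
```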

Groups: Capture Parts

Use parentheses () to capture pieces:

text = "Born on 2005-03-15"
pattern = r"(\d{4})-(\d{2})-(\d{2})"

match = re.search(pattern, text)

if match:
    print(match.group(0))  # 2005-03-15 (full)
    print(match.group(1))  # 2005 (year)
    print(match.group(2))  # 03 (month)
    print(match.group(3))  # 15 (day)

Think of it like boxes inside boxes!

┌────────────────────────┐
│   Full Match (group 0) │
│  ┌─────┐ ┌────┐ ┌────┐ │
│  │2005 │ │ 03 │ │ 15 │ │
│  │ (1) │ │(2) │ │(3) │ │
│  └─────┘ └────┘ └────┘ │
└────────────────────────┘

Named Groups

Give your groups names for clarity:

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})"
match = re.search(pattern, "Date: 2024-08")

print(match.group('year'))   # 2024
print(match.group('month'))  # 08
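
Match objects can also hand back all the groups at once. This sketch extends the date pattern with a day group (an addition for illustration) and uses groups() and groupdict():

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, "Born on 2005-03-15")

if match:
    all_groups = match.groups()      # every group, in order, as a tuple
    named_groups = match.groupdict() # named groups as a dictionary

    print(all_groups)    # ('2005', '03', '15')
    print(named_groups)  # {'year': '2005', 'month': '03', 'day': '15'}
```

The dictionary form is handy when the pieces get passed on to other code, since the names travel with the values.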

4️⃣ Metacharacters

Metacharacters are magic symbols with special powers:

The Dot . - Match Any Character

re.findall(r"c.t", "cat cot cut")
# ['cat', 'cot', 'cut']

The . matches any single character except a newline!

Character Classes []

# Match a, e, i, o, or u
re.findall(r"[aeiou]", "hello")
# ['e', 'o']

# Match any digit
re.findall(r"[0-9]", "abc123")
# ['1', '2', '3']

Negation [^]

# Match anything EXCEPT vowels
re.findall(r"[^aeiou]", "hello")
# ['h', 'l', 'l']

Shorthand Classes

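Symbol   Meaning          Same As
\d       Any digit        [0-9]
\D       Not a digit      [^0-9]
\w       Word character   [a-zA-Z0-9_]
\W       Not word char    [^a-zA-Z0-9_]
\s       Whitespace       [ \t\n\r\f\v]
\S       Not whitespace   [^ \t\n\r\f\v]

The "Same As" column is exact for plain ASCII text. By default, Python’s \d, \w, and \s also match Unicode digits, letters, and whitespace.

text = "Call 555-1234 now!"

re.findall(r"\d", text)  # ['5', '5', '5', '1', '2', '3', '4']
re.findall(r"\w+", text) # ['Call', '555', '1234', 'now']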

The Pipe | - OR

re.findall(r"cat|dog", "I have a cat and dog")
# ['cat', 'dog']
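
Since metacharacters have special powers, matching one literally means switching the power off with a backslash. A minimal sketch with an invented sample string; re.escape() is a standard helper that adds the backslashes for you.

```python
import re

text = "Version 3.14 and 3x14"

# Unescaped dot: matches any character, so "3x14" sneaks in.
loose = re.findall(r"3.14", text)

# Escaped dot: matches only a real "."
strict = re.findall(r"3\.14", text)

# re.escape() builds the escaped pattern from plain text.
escaped = re.escape("3.14")

print(loose)    # ['3.14', '3x14']
print(strict)   # ['3.14']
print(escaped)  # 3\.14
```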

5️⃣ Quantifiers and Anchors

Quantifiers: How Many?

Symbol   Meaning     Example
*        0 or more   a* → “”, “a”, “aaa”
+        1 or more   a+ → “a”, “aaa”
?        0 or 1      a? → “”, “a”
{n}      Exactly n   a{3} → “aaa”
{n,}     n or more   a{2,} → “aa”, “aaa”
{n,m}    n to m      a{2,4} → “aa”, “aaa”

text = "goood morning gooooood day"

re.findall(r"go+d", text)
# ['goood', 'gooooood']

re.findall(r"go{2,4}d", text)
# ['goood'] (only 2-4 o's)
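
A few more quantifiers in action. The sample strings here are made up; the sketch only shows how *, +, ? and {n,} change which pieces of text qualify.

```python
import re

# ? makes the "u" optional, so both spellings match.
spellings = re.findall(r"colou?r", "color colour colouur")

# * allows zero o's; + needs at least one.
star = re.findall(r"go*d", "gd god good")
plus = re.findall(r"go+d", "gd god good")

# {2,} needs two or more a's in a row.
runs = re.findall(r"a{2,}", "a aa aaaa")

print(spellings)  # ['color', 'colour']
print(star)       # ['gd', 'god', 'good']
print(plus)       # ['god', 'good']
print(runs)       # ['aa', 'aaaa']
```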

Anchors: Where to Look?

Symbol   Meaning
^        Start of string
$        End of string
\b       Word boundary

text = "hello world"

re.search(r"^hello", text)  # ✓ Matches
re.search(r"^world", text)  # ✗ No match

re.search(r"world$", text)  # ✓ Matches
re.search(r"hello$", text)  # ✗ No match

Word Boundaries:

text = "cat category caterpillar"

re.findall(r"\bcat\b", text)
# ['cat'] - only the standalone word!

re.findall(r"cat", text)
# ['cat', 'cat', 'cat'] - all occurrences
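
Word boundaries matter most when replacing text. A small sketch using the same sentence with re.sub(); the replacement word is arbitrary.

```python
import re

text = "cat category caterpillar"

# Without \b, "cat" is replaced inside longer words too.
sloppy = re.sub(r"cat", "dog", text)

# With \b on both sides, only the standalone word changes.
careful = re.sub(r"\bcat\b", "dog", text)

print(sloppy)   # dog dogegory dogerpillar
print(careful)  # dog category caterpillar
```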

6️⃣ Greedy vs Non-Greedy

The Hungry Monster (Greedy)

By default, regex is GREEDY. It wants as much as possible!

text = "<h1>Title</h1><p>Text</p>"

# Greedy (default)
re.findall(r"<.*>", text)
# ['<h1>Title</h1><p>Text</p>']
# Ate EVERYTHING between first < and last >

The Polite Monster (Non-Greedy)

Add ? after a quantifier to make it lazy:

# Non-greedy
re.findall(r"<.*?>", text)
# ['<h1>', '</h1>', '<p>', '</p>']
# Takes minimum needed!

Visual Comparison

Text: <b>bold</b>

Greedy  <.*>  : <─────────>
                <b>bold</b>

Lazy    <.*?> : <─>    <──>
                <b>    </b>

All Non-Greedy Versions

Greedy   Non-Greedy
*        *?
+        +?
?        ??
{n,m}    {n,m}?
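
The same idea applies to +. Here is a sketch that pulls quoted phrases out of an invented sentence, once with greedy .+ and once with lazy .+?:

```python
import re

text = 'He said "hi" and she said "bye"'

# Greedy: stretches from the first quote to the very last one.
greedy = re.findall(r'".+"', text)

# Lazy: stops at the nearest closing quote.
lazy = re.findall(r'".+?"', text)

print(greedy)  # ['"hi" and she said "bye"']
print(lazy)    # ['"hi"', '"bye"']
```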

7️⃣ Regex Flags

Flags change how your pattern works:

re.IGNORECASE (or re.I)

text = "Hello HELLO hello"

re.findall(r"hello", text)
# ['hello']

re.findall(r"hello", text, re.I)
# ['Hello', 'HELLO', 'hello']

re.MULTILINE (or re.M)

Makes ^ and $ work on each line:

text = """Line 1
Line 2
Line 3"""

re.findall(r"^Line", text)
# ['Line'] - only first line

re.findall(r"^Line", text, re.M)
# ['Line', 'Line', 'Line'] - all lines!

re.DOTALL (or re.S)

Makes . match newlines too:

text = "Hello\nWorld"

re.search(r"Hello.World", text)    # ✗ No match
re.search(r"Hello.World", text, re.S)  # ✓ Match!

re.VERBOSE (or re.X)

Write readable patterns with comments:

pattern = r"""
    \d{3}    # Area code
    -        # Separator
    \d{4}    # Phone number
"""

re.search(pattern, "555-1234", re.X)

Combining Flags

Use the | operator:

re.findall(r"hello", text, re.I | re.M)
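
To see two flags working together, here is a sketch with a made-up three-line text. It compares re.I alone against re.I | re.M.

```python
import re

text = """hello world
HELLO again
say hello"""

# re.I ignores case; re.M lets ^ match at the start of every line.
starts = re.findall(r"^hello", text, re.I | re.M)

# With only re.I, ^ still means the start of the whole string.
first_only = re.findall(r"^hello", text, re.I)

print(starts)      # ['hello', 'HELLO']
print(first_only)  # ['hello']
```

The third line never shows up because its "hello" is not at the start of the line.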

🏁 Quick Reference Flow

graph TD A["Start"] --> B{What do you need?} B --> C["Find first match"] C --> D["re.search"] B --> E["Check start only"] E --> F["re.match"] B --> G["Find all matches"] G --> H["re.findall"] B --> I["Replace text"] I --> J["re.sub"]

🎯 Real-World Examples

Validate an Email

pattern = r"^[\w.-]+@[\w.-]+\.\w+$"

re.match(pattern, "user@email.com")  # ✓
re.match(pattern, "bad-email")       # ✗
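
One way to reuse this check is to wrap it in a small function. The sketch below is only an illustration: looks_like_email and EMAIL_PATTERN are hypothetical names, and the pattern is the simple one from this lesson, not a complete email validator.

```python
import re

# Simple lesson pattern: name, @, domain, dot, ending (anchored at both ends).
EMAIL_PATTERN = r"^[\w.-]+@[\w.-]+\.\w+$"

def looks_like_email(candidate):
    """Return True if candidate fits the simple pattern above."""
    return re.match(EMAIL_PATTERN, candidate) is not None

print(looks_like_email("user@email.com"))  # True
print(looks_like_email("bad-email"))       # False
print(looks_like_email("user@email"))      # False
```

The last call fails because the pattern requires a dot somewhere after the @.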

Extract Phone Numbers

text = "Call 555-123-4567 or 999-876-5432"
pattern = r"\d{3}-\d{3}-\d{4}"

re.findall(pattern, text)
# ['555-123-4567', '999-876-5432']
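
If the pieces of each number are needed separately, groups can be added to the same pattern. In this sketch, note that findall() returns one tuple per match once the pattern contains groups; the names area, prefix, and line are just labels chosen for illustration.

```python
import re

text = "Call 555-123-4567 or 999-876-5432"
pattern = r"(\d{3})-(\d{3})-(\d{4})"

# With groups in the pattern, each match becomes a tuple of its groups.
parts = re.findall(pattern, text)
print(parts)  # [('555', '123', '4567'), ('999', '876', '5432')]

# Unpack each tuple to use the pieces on their own.
for area, prefix, line in parts:
    print(area, prefix, line)
```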

Clean Extra Spaces

text = "Too   many    spaces"
clean = re.sub(r"\s+", " ", text)

print(clean)  # "Too many spaces"

🌟 You Did It!

You now have regex superpowers!

Remember:

  • 🔍 search() finds first
  • 📋 findall() finds all
  • 🔄 sub() replaces
  • 📦 Groups () capture parts
  • ⚡ Flags change behavior

Practice makes perfect. Try building patterns for:

  • URLs
  • Dates
  • Usernames
  • Hashtags

Happy pattern hunting! 🎉
