How does the KMP algorithm work?

KMP builds a failure table (LPS array) that lets it skip ahead when a mismatch occurs. It never goes backward in the text.

When should I use Rabin-Karp vs KMP?

Use KMP for finding one pattern with guaranteed O(n+m) time. Use Rabin-Karp for finding multiple patterns at once.

String Matching Algorithms | Python DSA Guide

Q: What is string matching?

String matching finds where a small pattern appears inside larger text. It's like finding a specific word in a book quickly.

🔍 String Matching: Finding Needles in Haystacks

The Detective Story

Imagine you’re a detective with a magnifying glass looking for a specific word hidden in a giant book. You could check every single letter, one by one… but that would take forever! What if you had magical shortcuts that helped you skip ahead when you knew the word couldn’t possibly be there?

That’s exactly what string matching algorithms do. They’re like clever detectives who find patterns in text — super fast!

🧵 What is String Matching?

String matching means finding where a small piece of text (called the pattern) appears inside a bigger piece of text (called the text).

Simple Example:

Text: “I love eating bananas and apples”
Pattern: “banana”
Result: Found at position 14!

Think of it like finding your friend in a crowd. You know what they look like (the pattern), and you scan through people (the text) until you spot them.

🐌 The Slow Way (Brute Force)

Before we learn the magic tricks, let’s see the simple way:

def simple_search(text, pattern):
    n = len(text)
    m = len(pattern)
    for i in range(n - m + 1):
        match = True
        for j in range(m):
            if text[i + j] != pattern[j]:
                match = False
                break
        if match:
            return i
    return -1

Problem: If the text has 1 million letters and pattern has 1000 letters, we might check 1 billion comparisons! 😰

🚀 KMP Algorithm: The Smart Skipper

The Story

Imagine you’re looking for the word “ABABC” in a long text.

You start matching: A-B-A-B… but then the next letter is wrong! The slow detective would go back to the start and try again from the next position.

But the KMP detective is smarter! They think: “Wait, I just saw A-B at the end of what I matched. That could be the BEGINNING of my pattern! I don’t need to start completely over!”

How KMP Works

Step 1: Build a “Failure Table” (also called LPS array)

This table tells us: “If I fail at position X, where should I jump back to?”

For pattern “ABABC”:

Pattern:  A  B  A  B  C
Index:    0  1  2  3  4
LPS:      0  0  1  2  0

What does this mean?

At position 3 (second B), LPS = 2
This means: the first 2 characters (A-B) match the last 2 we just saw!
So if we fail after matching “ABAB”, we can pretend we already matched “AB”!

Building the LPS Table

def build_lps(pattern):
    m = len(pattern)
    lps = [0] * m
    length = 0  # Length of previous
                # longest prefix suffix
    i = 1

    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps

The KMP Search

def kmp_search(text, pattern):
    n, m = len(text), len(pattern)
    lps = build_lps(pattern)

    i = 0  # Text pointer
    j = 0  # Pattern pointer

    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                return i - j  # Found!
        else:
            if j != 0:
                j = lps[j - 1]  # Smart skip!
            else:
                i += 1
    return -1

Why KMP is Magic

graph TD
    A["Start Matching"] --> B{Characters Match?}
    B -->|Yes| C["Move Both Pointers"]
    C --> D{Pattern Complete?}
    D -->|Yes| E["🎉 Found It!"]
    D -->|No| B
    B -->|No| F{Pattern Pointer = 0?}
    F -->|Yes| G["Move Text Pointer"]
    F -->|No| H["Jump Using LPS Table"]
    H --> B
    G --> B

KMP never goes backward in the text! It only moves forward, making it very fast.

Time: O(n + m) — blazing fast! ⚡

🎲 Rabin-Karp: The Fingerprint Detective

The Story

Imagine every word has a unique fingerprint (a number). Instead of comparing letter by letter, you just compare fingerprints!

If the fingerprints don’t match → definitely not the same word. If they match → probably the same word (check to be sure).

The Hash Function

We turn letters into numbers using a clever formula:

# Simple hash: treat string as a number
# "ABC" → A×100 + B×10 + C×1

def calculate_hash(s, base=256, mod=101):
    h = 0
    for char in s:
        h = (h * base + ord(char)) % mod
    return h

The Rolling Hash Magic

Here’s the real trick! When we slide our window by one character, we don’t recalculate from scratch:

Old window: "ABC" → hash = 65×256² + 66×256 + 67
New window: "BCD"

Rolling update:
1. Remove 'A': subtract 65×256²
2. Shift left: multiply by 256
3. Add 'D': add 68

def rabin_karp(text, pattern):
    n, m = len(text), len(pattern)
    base, mod = 256, 101

    # Calculate h = base^(m-1) % mod
    h = pow(base, m - 1, mod)

    # Calculate initial hashes
    p_hash = 0  # Pattern hash
    t_hash = 0  # Text window hash

    for i in range(m):
        p_hash = (base * p_hash +
                  ord(pattern[i])) % mod
        t_hash = (base * t_hash +
                  ord(text[i])) % mod

    # Slide the window
    for i in range(n - m + 1):
        if p_hash == t_hash:
            # Hash match! Verify actual match
            if text[i:i+m] == pattern:
                return i

        if i < n - m:
            # Roll the hash forward
            t_hash = (base * (t_hash -
                     ord(text[i]) * h) +
                     ord(text[i + m])) % mod
            if t_hash < 0:
                t_hash += mod

    return -1

Why Use Rabin-Karp?

graph TD
    A["Calculate Pattern Hash"] --> B["Calculate First Window Hash"]
    B --> C{Hashes Match?}
    C -->|No| D["Roll Hash Forward"]
    D --> E{More Windows?}
    E -->|Yes| C
    E -->|No| F["Not Found 😢"]
    C -->|Yes| G["Verify Characters"]
    G -->|Match| H["🎉 Found It!"]
    G -->|No Match| D

Best for: Finding MULTIPLE patterns at once!

Time: O(n + m) average, O(nm) worst case

🔗 Longest Common Prefix (LCP)

The Story

You have two words. How many letters at the START are the same?

“flower” and “flow” → “flow” (4 letters)
“dog” and “racecar” → “” (0 letters — nothing in common!)

Simple LCP for Two Strings

def lcp_two(s1, s2):
    i = 0
    while (i < len(s1) and
           i < len(s2) and
           s1[i] == s2[i]):
        i += 1
    return s1[:i]

# Example
print(lcp_two("flower", "flow"))
# Output: "flow"

LCP for Many Strings

What if you have a whole list of words?

def lcp_array(strings):
    if not strings:
        return ""

    # Start with first string
    prefix = strings[0]

    # Compare with each other string
    for s in strings[1:]:
        # Shrink prefix until it matches
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""

    return prefix

# Example
words = ["flower", "flow", "flight"]
print(lcp_array(words))
# Output: "fl"

Smarter LCP: Divide and Conquer

def lcp_divide_conquer(strings):
    if not strings:
        return ""

    def lcp_two(s1, s2):
        i = 0
        while (i < len(s1) and
               i < len(s2) and
               s1[i] == s2[i]):
            i += 1
        return s1[:i]

    def helper(left, right):
        if left == right:
            return strings[left]
        mid = (left + right) // 2
        lcp_left = helper(left, mid)
        lcp_right = helper(mid + 1, right)
        return lcp_two(lcp_left, lcp_right)

    return helper(0, len(strings) - 1)

LCP Visualization

graph TD
    A["flower, flow, flight"] --> B["flower, flow"]
    A --> C["flight"]
    B --> D["flower"]
    B --> E["flow"]
    D --> F["LCP: flow"]
    E --> F
    F --> G["LCP: fl"]
    C --> G
    G --> H["Final: fl"]

🎯 When to Use Each Algorithm?

Situation	Best Choice	Why
Find one pattern once	KMP	Guaranteed fast
Find multiple patterns	Rabin-Karp	Hash all at once
Check pattern start	LCP	Direct comparison
Pattern has repetition	KMP	LPS table helps
Need simple code	Rabin-Karp	Easy to understand

🌟 Real World Examples

KMP in Action

Find all “error” in log files — KMP scans millions of lines quickly
DNA sequence matching — Find gene patterns in chromosomes

Rabin-Karp in Action

Plagiarism detection — Check if document chunks appear elsewhere
Spam filtering — Match known spam phrases

LCP in Action

Autocomplete — All suggestions share common prefix with what you typed
File path matching — Find common directory root

🧠 Key Takeaways

KMP = Build a smart failure table, never go backward
Rabin-Karp = Use hash fingerprints for quick rejection
LCP = Find matching beginnings of strings

All three are about being clever — not checking everything, but knowing what to skip!

💪 You’ve Got This!

String matching might sound scary, but remember:

KMP is just a detective who remembers what they’ve seen
Rabin-Karp is a detective who uses fingerprints
LCP is just comparing beginnings

You’re now equipped with the tools to find ANY needle in ANY haystack! 🎉

String Matching

Unable to load concept

Coming Soon...

🔍 String Matching: Finding Needles in Haystacks

The Detective Story

🧵 What is String Matching?

🐌 The Slow Way (Brute Force)

🚀 KMP Algorithm: The Smart Skipper

The Story

How KMP Works

Building the LPS Table

The KMP Search

Why KMP is Magic

🎲 Rabin-Karp: The Fingerprint Detective

The Story

The Hash Function

The Rolling Hash Magic

Why Use Rabin-Karp?

🔗 Longest Common Prefix (LCP)

The Story

Simple LCP for Two Strings

LCP for Many Strings

Smarter LCP: Divide and Conquer

LCP Visualization

🎯 When to Use Each Algorithm?

🌟 Real World Examples

KMP in Action

Rabin-Karp in Action

LCP in Action

🧠 Key Takeaways

💪 You’ve Got This!

Story - Premium Content

Stay Tuned!

Story - Premium Content

Interactive - Premium Content

Interactive - Premium Content

Stay Tuned!

Cheatsheet - Premium Content

Cheatsheet - Premium Content

Stay Tuned!

Quiz - Premium Content

Quiz - Premium Content

Stay Tuned!

Flashcard - Premium Content

Flashcard - Premium Content

Stay Tuned!

Sign in Required

Report an Issue