🎯 Structured Output in LangChain
The Magic Post Office Analogy
Imagine you’re running a magic post office. People send you messy, jumbled letters, and your job is to sort them into neat, labeled boxes. That’s exactly what Structured Output does in LangChain!
When an AI responds, it often gives you a big blob of text. But what if you need specific pieces of information in specific places? That’s where structured output comes in — it’s like training your AI to fill out a form instead of writing a free-form essay.
🔄 Output Parsing Strategies
What’s an Output Parser?
Think of it like a translator at the post office. The AI speaks in sentences, but your code speaks in data. The parser translates between them!
from langchain.output_parsers import (
    ResponseSchema,
    StructuredOutputParser,
)

# Define what boxes we need
parser = StructuredOutputParser.from_response_schemas([
    ResponseSchema(name="title", description="Book title"),
    ResponseSchema(name="author", description="Author name"),
])
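To see the translator in action, the parser's format instructions get pasted into your prompt, and parser.parse(...) later sorts the model's reply into a dict with those keys. A minimal sketch (the prompt wording is just an example):

from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "Recommend a classic novel.\n{format_instructions}"
)

# The instructions spell out exactly which boxes the model must fill
print(prompt.format(format_instructions=parser.get_format_instructions()))

# parser.parse(model_reply) then returns {"title": ..., "author": ...}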
Three Main Strategies
| Strategy | When to Use | Like… |
|---|---|---|
| Schema-based | You know exactly what you want | A form with blank fields |
| Regex-based | Simple patterns | Finding phone numbers |
| Function-based | Complex extraction | A smart assistant |
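The schema-based strategy is what the rest of this post focuses on, but as a taste of the regex-based one, here is a tiny sketch in plain Python (the phone-number pattern is deliberately simplified):

import re

def find_phone_numbers(text: str) -> list[str]:
    # Matches simple US-style numbers like 555-123-4567
    return re.findall(r"\d{3}-\d{3}-\d{4}", text)

print(find_phone_numbers("Call 555-123-4567 or 555-987-6543"))
# ['555-123-4567', '555-987-6543']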
📦 Structured Output Patterns
Pattern 1: The Pydantic Model
This is the gold standard. You create a blueprint, and the AI fills it in!
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="Movie name")
    rating: int = Field(description="Score 1-10")
    summary: str = Field(description="One sentence")
Why it works: It’s like giving someone a Mad Libs book. They can only put words where the blanks are!
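Behind the scenes, a parser turns that blueprint into instructions the model can follow and then rebuilds the object from the reply. A quick sketch with PydanticOutputParser (the JSON reply here is made up):

from langchain.output_parsers import PydanticOutputParser

review_parser = PydanticOutputParser(pydantic_object=MovieReview)

# Paste this into your prompt so the model knows where the blanks are
print(review_parser.get_format_instructions())

# The model's JSON reply comes back as a real MovieReview object
review = review_parser.parse(
    '{"title": "Arrival", "rating": 9, "summary": "A linguist decodes an alien language."}'
)
print(review.rating)  # 9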
Pattern 2: Dictionary Schema
Simpler, but less strict:
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
}
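There's no Pydantic validation baked in here, so if you want to check a reply against this dict yourself, one option is the jsonschema package (an extra dependency; shown purely as a sketch, with made-up model output):

import json
from jsonschema import validate  # pip install jsonschema

reply = '{"name": "Ada Lovelace", "age": 36}'  # pretend this came from the model
data = json.loads(reply)

# Raises ValidationError if the reply doesn't match the schema above
validate(instance=data, schema=schema)
print(data["name"])  # "Ada Lovelace"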
✨ The with_structured_output Method
This is the easiest way to get structured output. It’s like putting a mold on your AI’s mouth — whatever comes out fits perfectly!
from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Create the magic!
llm = ChatOpenAI(model="gpt-4")
structured_llm = llm.with_structured_output(Person)

# Now it ALWAYS returns a Person
result = structured_llm.invoke("Tell me about Marie Curie")
print(result.name)  # "Marie Curie"
print(result.age)   # 66
Why This is Amazing
graph TD
    A[Messy AI Response] --> B[with_structured_output]
    B --> C[Clean Pydantic Object]
    C --> D[Easy to Use in Code!]
Before: “Marie Curie was born in 1867 and lived to be 66 years old…”
After: Person(name="Marie Curie", age=66)
🔧 JSON Mode
Sometimes you just want the AI to speak JSON. That’s it. No fancy objects, just clean JSON.
import json

llm = ChatOpenAI(
    model="gpt-4",
    model_kwargs={"response_format": {"type": "json_object"}},
)

# JSON mode needs the word "JSON" somewhere in the prompt
response = llm.invoke("List 3 fruits as JSON")

# The reply is still a chat message; its content is a JSON string
data = json.loads(response.content)
# {"fruits": ["apple", "banana", "cherry"]}
When to Use JSON Mode
| ✅ Use JSON Mode | ❌ Don’t Use |
|---|---|
| Simple data extraction | Complex nested objects |
| Quick prototyping | Strict validation needed |
| Flexible schemas | Type safety required |
🔧 Output Fixing Parser
The Problem: Sometimes the AI makes tiny mistakes. A missing quote. A wrong bracket. Your whole program crashes!
The Solution: The Output Fixing Parser — it’s like having a proofreader who can fix typos automatically.
from langchain.output_parsers import (
    OutputFixingParser,
    PydanticOutputParser,
)

# Original parser
base_parser = PydanticOutputParser(pydantic_object=Person)

# Wrap it with auto-fix powers!
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(),
)

# Now it can fix small errors
bad_output = '{"name": "Alice", age: 25}'  # missing quotes around "age"!
fixed = fixing_parser.parse(bad_output)
# Works anyway! Returns a Person object
How It Works
graph TD
    A[Broken Output] --> B{Can Parse?}
    B -->|Yes| C[Return Result]
    B -->|No| D[Ask AI to Fix]
    D --> E[Try Again]
    E --> C
🔄 Retry Parser
What if the AI just… gets it completely wrong? Not a typo, but a fundamental misunderstanding?
The Retry Parser doesn’t just fix — it asks again with better instructions!
from langchain.output_parsers import RetryWithErrorOutputParser

retry_parser = RetryWithErrorOutputParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(),
)

# If parsing fails, it will:
# 1. Show the AI what went wrong
# 2. Ask it to try again
# 3. Parse the new response
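Unlike the fixing parser, the retry parser needs to see the original prompt so it can re-ask the model properly. A sketch of what that call looks like (the prompt text and the bad completion are just examples):

from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "Answer the question.\n{format_instructions}\nQuestion: {query}"
)
prompt_value = prompt.format_prompt(
    format_instructions=base_parser.get_format_instructions(),
    query="Tell me about Marie Curie",
)

# A reply that's valid JSON but structurally wrong (missing the "age" field)
bad_completion = '{"name": "Marie Curie"}'

person = retry_parser.parse_with_prompt(bad_completion, prompt_value)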
The Difference
| Parser | What It Does | Best For |
|---|---|---|
| Fixing | Corrects small syntax errors | Typos, missing quotes |
| Retry | Re-asks with error feedback | Wrong structure entirely |
Think of it this way:
- Fixing Parser = A spell checker
- Retry Parser = A teacher saying “Try that answer again”
🛠️ Custom Output Parsers
Sometimes the built-in parsers don’t fit your needs. Time to build your own!
Creating a Custom Parser
from langchain.schema import BaseOutputParser

class EmotionParser(BaseOutputParser):
    """Extracts emotion from text."""

    def parse(self, text: str) -> str:
        # Your custom logic here
        text_lower = text.lower()
        if "happy" in text_lower:
            return "😊 Happy"
        elif "sad" in text_lower:
            return "😢 Sad"
        else:
            return "😐 Neutral"

    def get_format_instructions(self) -> str:
        return "Express an emotion."
Using Your Custom Parser
parser = EmotionParser()
result = parser.parse("I'm so happy today!")
print(result)  # "😊 Happy"
Real-World Example: Email Parser
class EmailParser(BaseOutputParser):
    """Splits a reply into a subject (first line) and body (the rest)."""

    def parse(self, text: str) -> dict:
        lines = text.strip().split("\n")
        return {
            "subject": lines[0] if lines else "",
            "body": "\n".join(lines[1:]),
        }
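For example, with a made-up reply:

email_text = "Meeting moved to 3pm\nHi team,\nSee you in room 2B."
parsed = EmailParser().parse(email_text)
print(parsed["subject"])  # "Meeting moved to 3pm"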
🎯 Putting It All Together
Here’s the complete flow:
graph TD
    A[User Question] --> B[LLM]
    B --> C{Output Type?}
    C -->|Simple JSON| D[JSON Mode]
    C -->|Strict Schema| E[with_structured_output]
    C -->|Custom Logic| F[Custom Parser]
    D --> G[Parse Result]
    E --> G
    F --> G
    G --> H{Valid?}
    H -->|No, Minor Error| I[Output Fixing Parser]
    H -->|No, Major Error| J[Retry Parser]
    H -->|Yes| K[Use in App!]
    I --> G
    J --> B
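Here is one rough way to wire that flow in code, reusing the pieces defined earlier (Person, fixing_parser, and ChatOpenAI come from the examples above; this is a sketch, not the only way to combine them):

structured_llm = ChatOpenAI(model="gpt-4").with_structured_output(Person)

try:
    person = structured_llm.invoke("Tell me about Marie Curie")
except Exception:
    # Structured output failed: fall back to a plain call plus the fixing parser
    raw = ChatOpenAI(model="gpt-4").invoke("Tell me about Marie Curie").content
    person = fixing_parser.parse(raw)

print(person)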
🚀 Quick Reference
| Need | Solution | Code |
|---|---|---|
| Strict typing | with_structured_output(Model) | Pydantic model |
| Just JSON | JSON mode | response_format |
| Fix typos | OutputFixingParser | Wraps existing parser |
| Complete retry | RetryWithErrorOutputParser | Re-invokes LLM |
| Special logic | Custom parser | Extend BaseOutputParser |
💡 Key Takeaways
- Structured output = AI filling out forms instead of writing essays
- with_structured_output is your best friend for type safety
- JSON mode is quick and dirty when you just need JSON
- Fixing parser handles typos automatically
- Retry parser handles complete misunderstandings
- Custom parsers let you build anything you need
You’re now ready to make your AI responses predictable, parseable, and powerful! 🎉