What is short-term memory in AI?

Short-term memory is like a whiteboard AI uses during a conversation. It holds recent info temporarily but gets erased when you start fresh.

What is the context window in AI?

The context window is like a screen size limit. AI can only see a limited amount of info at once - older messages scroll off.

What is token budget in AI?

Token budget is the spending limit for AI reading and writing. Every word costs tokens, so AI must spend wisely within its budget.

Memory Fundamentals | Agentic AI Guide

🧠 Memory Fundamentals in Agentic AI

The Story of the Brilliant Robot Secretary

Imagine you have a super smart robot secretary named Aria. Aria helps you with everything—answering questions, writing emails, remembering your favorite pizza toppings, and even helping you plan your birthday party!

But here’s the thing: Aria has different kinds of memory, just like you do. Let’s explore how Aria remembers things!

🎯 The Big Picture

Think of Aria’s brain like a busy office desk. There’s stuff right in front of her (what she’s working on NOW), sticky notes on the wall (quick reminders), filing cabinets (long-term storage), and a notepad for scribbling ideas.

graph LR
    A[🧠 Aria's Memory System] --> B["📋 Short-term Memory"]
    A --> C["📚 Long-term Memory"]
    A --> D["⚡ Working Memory"]
    A --> E["📝 Scratch Pad"]
    A --> F["💬 Conversation History"]
    B --> G["Context Window"]
    G --> H["Token Budget"]
    H --> I["Compression Strategies"]

📋 Short-term Memory

What Is It?

Short-term memory is like a small whiteboard that Aria uses during your current conversation. She can only write so much on it before it gets full!

Simple Example

You: “Hey Aria, my cat’s name is Whiskers.”

Aria remembers this while you’re chatting. But tomorrow? She might forget unless she writes it down somewhere permanent.

Real Life

When you ask a chatbot something, it remembers what you said a few messages ago. But if you start a brand new conversation, it starts fresh—like erasing the whiteboard!

Key Point: Short-term memory is temporary. It works great during a conversation but doesn’t last forever.
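
Here is a tiny sketch of the whiteboard idea in Python. The class name `ShortTermMemory` and its methods are made up for this illustration; real chatbots may organize this quite differently.

```python
class ShortTermMemory:
    """Hypothetical whiteboard: holds notes for one conversation only."""

    def __init__(self):
        self.notes = []

    def remember(self, fact):
        # Write a new note on the whiteboard.
        self.notes.append(fact)

    def recall(self):
        # Return a copy so callers cannot change the whiteboard by accident.
        return list(self.notes)

    def start_new_conversation(self):
        # Starting fresh erases everything on the whiteboard.
        self.notes.clear()


whiteboard = ShortTermMemory()
whiteboard.remember("Cat's name is Whiskers")
during_chat = whiteboard.recall()

whiteboard.start_new_conversation()
after_reset = whiteboard.recall()

print(during_chat)   # ["Cat's name is Whiskers"]
print(after_reset)   # []
```

Nothing here is saved to disk, so the notes disappear as soon as the conversation is reset.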

📚 Long-term Memory

What Is It?

Long-term memory is like a big filing cabinet where Aria stores important information forever (or at least for a very long time!).

Simple Example

Aria learns that you:

Love pepperoni pizza 🍕
Have a dog named Max 🐕
Hate waking up early 😴

She saves these facts in her filing cabinet. Next week, when you chat again, she still knows!

Real Life

Some AI assistants can remember your preferences across many conversations. It's similar to how Spotify remembers you love rock music, or how Netflix knows you enjoy comedy movies.

Key Point: Long-term memory is persistent. It survives across conversations and sessions.
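
One simple way to picture the filing cabinet is a file on disk that outlives any single chat. The sketch below assumes a plain JSON file as the cabinet; the class `LongTermMemory` and the file name `aria_facts.json` are invented for this example, and real assistants often use a database instead.

```python
import json
import os
import tempfile


class LongTermMemory:
    """Hypothetical filing cabinet backed by a JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # An empty cabinet is fine: no file yet means no facts yet.
        if not os.path.exists(self.path):
            return {}
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def save_fact(self, key, value):
        # Read what is already stored, add the new fact, write it all back.
        facts = self.load()
        facts[key] = value
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(facts, f)


# A temporary folder keeps this demo from touching your real files.
cabinet_path = os.path.join(tempfile.mkdtemp(), "aria_facts.json")

# First conversation: Aria files away two facts.
LongTermMemory(cabinet_path).save_fact("favorite_pizza", "pepperoni")
LongTermMemory(cabinet_path).save_fact("dog_name", "Max")

# Next week: a brand new object still finds the same facts on disk.
facts_next_week = LongTermMemory(cabinet_path).load()
print(facts_next_week)
```

The facts live in the file rather than in the object, so they survive a fresh start.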

⚡ Working Memory

What Is It?

Working memory is like the space on your desk where you’re actively solving a problem. It’s not just remembering—it’s thinking and processing at the same time!

Simple Example

You ask: “What’s 15 + 27?”

Aria’s working memory:

Holds “15” and “27”
Performs the addition
Gives you “42”

She’s juggling numbers AND calculating—all at once!

Real Life

When you’re doing mental math, you’re using working memory. You hold the numbers in your head while you work out the answer.

Key Point: Working memory is for active thinking—holding AND processing information together.

📝 Scratch Pad

What Is It?

The scratch pad is like a piece of scrap paper where Aria jots down quick notes while solving complex problems.

Simple Example

You ask: “Plan a 3-course dinner for vegetarians.”

Aria’s scratch pad:

🥗 Appetizer ideas: soup, salad, bruschetta
🍝 Main course: pasta, curry, stir-fry
🍰 Dessert: cake, fruit, pudding

She scribbles ideas, crosses things out, and organizes before giving you the final answer.

Real Life

When you’re brainstorming on paper, you’re using a scratch pad. It’s messy, temporary, but super helpful for working through problems!

Key Point: The scratch pad is for temporary notes during problem-solving.
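
The dinner example can be sketched as a function that keeps its rough notes in a local list. The function name `plan_dinner` and the "pick the first idea" rule are placeholders for this illustration, not how a real agent chooses.

```python
def plan_dinner(options):
    """Pick one dish per course, jotting notes on a scratch pad along the way."""
    scratch_pad = []  # temporary notes, only needed while working
    menu = {}
    for course, ideas in options.items():
        scratch_pad.append(f"{course} ideas: {', '.join(ideas)}")
        choice = ideas[0]  # placeholder rule: take the first idea
        scratch_pad.append(f"{course}: picked {choice}")
        menu[course] = choice
    # The notes are returned here only so we can peek at them.
    return menu, scratch_pad


options = {
    "Appetizer": ["soup", "salad", "bruschetta"],
    "Main course": ["pasta", "curry", "stir-fry"],
    "Dessert": ["cake", "fruit", "pudding"],
}
menu, notes = plan_dinner(options)
print(menu)
for line in notes:
    print(line)
```

Only `menu` is the final answer. The notes are the messy middle that usually gets thrown away.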

💬 Conversation History

What Is It?

Conversation history is like a chat transcript—a record of everything you and Aria have said to each other.

Simple Example

You: What's the weather today?
Aria: It's sunny, 25°C!
You: Should I bring a jacket?
Aria: No need—it's warm all day!

Aria looks at this history to understand that your jacket question is about today’s weather, so she answers correctly.

Real Life

When you scroll up in WhatsApp or iMessage, you’re looking at conversation history. Chatbots use this too!

Key Point: Conversation history provides context—it helps Aria understand what you’re talking about.
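
A common way to keep a transcript is a list of messages that gets sent along with each new question. The sketch below assumes a simple list of dictionaries; the helper names `add_message` and `build_prompt` are invented here.

```python
history = []  # the transcript so far, oldest message first


def add_message(speaker, text):
    history.append({"speaker": speaker, "text": text})


def build_prompt(new_question):
    # Put the whole transcript in front of the new question,
    # so the question can be understood in context.
    lines = [f"{m['speaker']}: {m['text']}" for m in history]
    lines.append(f"You: {new_question}")
    return "\n".join(lines)


add_message("You", "What's the weather today?")
add_message("Aria", "It's sunny, 25°C!")

prompt = build_prompt("Should I bring a jacket?")
print(prompt)
```

Because the weather lines travel with the jacket question, the answer can take the sunny forecast into account.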

🪟 Context Window Management

What Is It?

The context window is like the screen size of Aria’s brain. She can only “see” a limited amount of information at once!

Simple Example

Imagine your phone screen can only show 10 text messages at a time. If your chat has 50 messages, you can only see the latest 10.

Aria works the same way! If your conversation is too long, older parts “scroll off” her screen.

Real Life

Ever had a chatbot “forget” something you said 20 messages ago? That’s the context window at work. The old stuff got pushed out!

graph TD
    A["Message 1"] --> B["Message 2"]
    B --> C["Message 3"]
    C --> D["..."]
    D --> E["Message 10"]
    E --> F["Message 11 arrives"]
    F --> G["❌ Message 1 falls off!"]

Key Point: Context window is the limit on how much Aria can see at once.
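
The phone-screen picture maps nicely onto a fixed-size queue. This sketch counts whole messages to keep things simple, which is an assumption made for the example; real context windows are measured in tokens, not messages.

```python
from collections import deque

WINDOW_SIZE = 10  # the "screen" shows at most 10 messages

# A deque with maxlen drops the oldest item when a new one is added to a full queue.
window = deque(maxlen=WINDOW_SIZE)

for n in range(1, 12):  # messages 1 through 11
    window.append(f"Message {n}")

visible = list(window)
print(len(visible))  # 10
print(visible[0])    # Message 2
print(visible[-1])   # Message 11
```

Message 1 is gone as soon as Message 11 arrives, just like the oldest text scrolling off the screen.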

💰 Token Budget Management

What Is It?

Tokens are like coins Aria uses to read and write. Every word costs tokens, and long or unusual words can cost more than one!

Simple Example

Your Question: “What is AI?” = ~4 tokens

Aria’s Answer: “AI stands for Artificial Intelligence. It’s technology that can learn and make decisions!” = ~15 tokens

Aria has a budget—maybe 4,000 tokens total. She needs to spend wisely!

Real Life

Text	Approximate Tokens
“Hello”	1 token
“How are you?”	4 tokens
A short paragraph	50-100 tokens
A full page	500-700 tokens

Key Point: Token budget is the spending limit on Aria’s reading and writing.
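
To get a feel for budgeting, here is a rough estimator. It assumes about 4 characters per token, a common rule of thumb for English text and not an exact count. Real tokenizers give different numbers, so its estimates will not match the approximate figures above exactly. The names `estimate_tokens` and `remaining_budget` are made up for this sketch.

```python
import math

TOKEN_BUDGET = 4000   # example budget from the text above
CHARS_PER_TOKEN = 4   # rough rule of thumb, an assumption for this sketch


def estimate_tokens(text):
    # Very rough guess: every 4 characters costs about one token.
    return max(1, math.ceil(len(text) / CHARS_PER_TOKEN))


def remaining_budget(messages, budget=TOKEN_BUDGET):
    # Subtract the estimated cost of every message from the budget.
    used = sum(estimate_tokens(m) for m in messages)
    return budget - used


question = "What is AI?"
answer = ("AI stands for Artificial Intelligence. "
          "It's technology that can learn and make decisions!")

left = remaining_budget([question, answer])
print(estimate_tokens(question))
print(left)
```

If `left` ever drops near zero, it is time to shorten the conversation before asking anything else.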

🗜️ Context Compression Strategies

What Is It?

When Aria’s context window gets full, she uses compression—smart ways to keep important info while removing less important stuff.

Simple Example

Original conversation (too long):

You: I want pizza
Aria: What toppings?
You: Pepperoni and mushrooms
Aria: What size?
You: Large
Aria: Delivery or pickup?
You: Delivery to 123 Main St

Compressed version:

Order: Large pepperoni + mushroom pizza
Delivery: 123 Main St

Same important info, way less space!

Compression Strategies

Strategy	How It Works	Example
Summarization	Turn long text into short summary	10 messages → 1 paragraph
Key Extraction	Keep only important facts	Names, dates, decisions
Forgetting Old Stuff	Drop earliest messages	Remove message 1 when adding message 11
Smart Chunking	Group related info together	All pizza preferences in one note

Real Life

When you take notes in class, you don’t write every word the teacher says. You summarize the key points. That’s compression!

Key Point: Compression strategies help Aria fit more important information in limited space.
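
The "Forgetting Old Stuff" strategy can be sketched in a few lines. This example counts one token per word, which is a simplification chosen for the demo, and the function `fit_to_budget` is a hypothetical helper.

```python
def count_tokens(text):
    # Stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def fit_to_budget(messages, budget):
    """Drop the earliest messages until the rest fits in the budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message goes first
    return kept


chat = [
    "You: I want pizza",
    "Aria: What toppings?",
    "You: Pepperoni and mushrooms",
    "Aria: What size?",
    "You: Large",
    "Aria: Delivery or pickup?",
    "You: Delivery to 123 Main St",
]

kept = fit_to_budget(chat, budget=15)
for line in kept:
    print(line)
```

Notice what got lost: the toppings were in the early messages, so they are gone. That is why summarization and key extraction are often a better fit when old messages still hold important facts.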

🎬 Putting It All Together

Let’s see how all these memory types work together when you chat with Aria:

graph TD
    A["You ask a question"] --> B["Conversation History<br/>What we talked about"]
    B --> C["Context Window<br/>What Aria can see"]
    C --> D["Token Budget<br/>How much space left?"]
    D --> E{Budget OK?}
    E -->|Yes| F["Working Memory<br/>Think & process"]
    E -->|No| G["Compression<br/>Squeeze info smaller"]
    G --> F
    F --> H["Scratch Pad<br/>Work out the answer"]
    H --> I["Short-term Memory<br/>Remember for now"]
    I --> J["Long-term Memory<br/>Save for later?"]
    J --> K["Aria responds! 🎉"]

🌟 Quick Summary

Memory Type	Like…	Duration	Purpose
Short-term	Whiteboard	This conversation	Hold recent info
Long-term	Filing cabinet	Forever	Store important facts
Working	Desk space	Right now	Think + process
Scratch Pad	Scrap paper	During task	Jot notes while working
Conversation History	Chat transcript	Session	Provide context
Context Window	Screen size	Per request	Limit what’s visible
Token Budget	Coin purse	Per request	Limit reading/writing
Compression	Note-taking	When needed	Fit more in less space

🚀 Why This Matters

Understanding memory helps you:

Write better prompts - Include important context, skip fluff
Know limitations - Chatbots forget! Remind them of key info
Work smarter - Use long conversations wisely
Appreciate AI - It’s juggling a lot in limited space!

🎯 Remember This!

AI memory is like your brain—it has limits!

Short stuff goes on the whiteboard. Important stuff goes in the filing cabinet. Active thinking happens on the desk. And when things get crowded, we compress!

You now understand how Agentic AI remembers, thinks, and manages its mental space. That’s pretty amazing! 🌟
