# Memory: How LLMs Remember

LLMs don't have persistent memory by default. Every conversation starts fresh. So how do we build systems that remember?

This page explores the memory challenge in AI systems and strategies to solve it.

# The Memory Problem

Remember the context window from the fundamentals? That's all the memory an LLM has during a single request. Once you hit the limit, something has to go.
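
A quick way to see when you are approaching that limit: a rough heuristic is that one token is about four characters of English text. Here is a minimal sketch of budget-checking a conversation (the 4-characters-per-token ratio and the 8,000-token limit are assumptions; a real system would count with the model's own tokenizer):

```ts
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Rough heuristic: ~4 characters per token for English text.
// Real systems should count with the model's actual tokenizer.
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

function fitsInContext(messages: Message[], limit = 8000): boolean {
  return estimateTokens(messages) <= limit;
}
```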

Watch what happens as a conversation grows. Try adding messages and see how different memory strategies handle the overflow:

*[Interactive demo: Memory Visualization — pick a strategy, add messages, and watch token usage fill a 50-token budget.]*

Without a memory strategy, long conversations simply break. The model can't see the full history, so it loses context about what was discussed earlier.

# Memory Strategies

There are several approaches to handling memory in AI applications:

  • Sliding Window — Drop the oldest messages to stay within limits
  • Summarization — Compress old messages into a summary (a sketch follows the sliding-window example below)
  • Selective Memory — Keep only the most relevant past messages
  • External Memory — Store memories in a database and retrieve them as needed (see Long-Term Memory below)

The sliding window, for example, takes only a few lines:

```js
// Sliding window: keep the system prompt plus the most recent messages
const maxMessages = 10;

function trimConversation(messages) {
  if (messages.length > maxMessages) {
    // Keep system message + recent messages
    return [
      messages[0], // System prompt
      ...messages.slice(-maxMessages + 1)
    ];
  }
  return messages;
}
```
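
Summarization takes a different tack: instead of dropping old messages outright, it folds them into a compact summary. Here is a sketch using the AI SDK's `generateText`, reusing the `Message` type from above (the model id, the cutoff of four recent messages, and the prompt wording are all assumptions):

```ts
import { generateText } from 'ai';

// Fold everything except the system prompt and the most recent
// messages into a single summary message.
async function summarizeConversation(messages: Message[], keepRecent = 4) {
  if (messages.length <= keepRecent + 1) return messages;

  const [system, ...rest] = messages;
  const old = rest.slice(0, -keepRecent);
  const recent = rest.slice(-keepRecent);

  const { text } = await generateText({
    model: 'openai/gpt-4o-mini', // assumed model id
    prompt:
      'Summarize this conversation, keeping names, decisions, and open questions:\n\n' +
      old.map((m) => `${m.role}: ${m.content}`).join('\n'),
  });

  return [
    system,
    { role: 'system' as const, content: `Summary of earlier conversation: ${text}` },
    ...recent,
  ];
}
```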

# Long-Term Memory

For truly persistent memory, we need to store information outside the conversation. This is where embeddings and vector databases come in.

The idea is simple: convert memories into vectors, store them, and retrieve the most relevant ones when needed. Try querying some stored memories:

*[Interactive demo: Long-Term Memory Retrieval — query stored memories such as "User prefers dark mode", "User is building React apps", "User's name is Alex", "User works at a startup", and "User likes TypeScript" and watch their relevance scores update.]*

Long-term memory uses embeddings to find relevant past information based on your query. More relevant memories get higher scores.
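
Under the hood, that relevance score is usually a cosine similarity between the query's embedding and each stored memory's embedding. A minimal sketch of the computation (vector databases do this for you at scale):

```ts
// Cosine similarity between two embedding vectors.
// 1 = same direction (highly relevant), 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```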

This is the foundation of systems that can remember user preferences, past conversations, and learned information across sessions.

```ts
import { embed } from 'ai';
import { Index } from '@upstash/vector';

// Reads UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN from the environment
const index = new Index();

// Store a memory
async function remember(content: string) {
  const { embedding } = await embed({
    model: 'openai/text-embedding-3-small',
    value: content
  });

  await index.upsert({
    id: crypto.randomUUID(),
    vector: embedding,
    metadata: { content }
  });
}

// Recall relevant memories
async function recall(query: string, topK = 5) {
  const { embedding } = await embed({
    model: 'openai/text-embedding-3-small',
    value: query
  });

  // includeMetadata returns the stored text alongside each match
  return index.query({ vector: embedding, topK, includeMetadata: true });
}
```

# Choosing a Strategy

The right memory strategy depends on your use case:

  • Chatbots — Sliding window works well for casual conversations
  • Personal Assistants — Long-term memory for user preferences
  • Customer Support — Summarization to preserve ticket context
  • Research Agents — External memory for discovered facts

Many production systems combine multiple strategies. A sliding window for recent context, plus long-term memory for important facts, creates a robust memory system.
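
As a rough sketch of that combination, reusing `trimConversation`, `recall`, and the `Message` type from the examples above (the prompt layout is an assumption):

```ts
// Sliding window for recent context + long-term memory for durable facts.
async function buildContext(messages: Message[], userQuery: string) {
  const recent = trimConversation(messages); // recent context
  const results = await recall(userQuery, 3); // relevant stored facts

  const memoryNote: Message = {
    role: 'system',
    content:
      'Relevant facts from long-term memory:\n' +
      results.map((r) => `- ${String(r.metadata?.content ?? '')}`).join('\n'),
  };

  // System prompt first, retrieved facts next, then the recent conversation.
  return [recent[0], memoryNote, ...recent.slice(1)];
}
```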