# Memory: How LLMs Remember
LLMs don't have persistent memory by default. Every conversation starts fresh. So how do we build systems that remember?
This page explores the memory challenge in AI systems and strategies to solve it.
# The Memory Problem
Remember the context window from the fundamentals? That's all the memory an LLM has during a single request. Once you hit the limit, something has to go.
Watch what happens as a conversation grows: as messages pile up, each memory strategy handles the overflow differently.
Click "Add Message" to simulate a conversation
Without a memory strategy, long conversations simply break. The model can't see the full history, so it loses context about what was discussed earlier.
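To see why, it helps to put rough numbers on it. Below is a minimal sketch that estimates token usage with an assumed four-characters-per-token heuristic and a hypothetical 8,000-token limit; a real application should count tokens with the model's own tokenizer:

```ts
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Rough heuristic: ~4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Does the conversation still fit in a hypothetical 8k-token window?
function fitsInContext(messages: Message[], contextLimit = 8000): boolean {
  const total = messages.reduce(
    (sum, message) => sum + estimateTokens(message.content),
    0
  );
  return total <= contextLimit;
}
```

Once `fitsInContext` returns false, something has to be dropped, compressed, or moved out of the conversation. That's what the strategies below are for.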
# Memory Strategies
There are several approaches to handling memory in AI applications:
- Sliding Window — Drop the oldest messages to stay within limits
- Summarization — Compress old messages into a summary
- Selective Memory — Keep only the most relevant past messages
- External Memory — Store memories in a database and retrieve as needed
```ts
// Sliding window example
const maxMessages = 10;

function trimConversation(messages) {
  if (messages.length > maxMessages) {
    // Keep system message + recent messages
    return [
      messages[0], // System prompt
      ...messages.slice(-maxMessages + 1)
    ];
  }
  return messages;
}
```
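Summarization trades a little latency for a much longer effective memory. Here's a hedged sketch using the AI SDK's `generateText`; the gateway-style model id (matching the string style used elsewhere on this page) and the `keepRecent` threshold are assumptions, not recommendations:

```ts
import { generateText } from 'ai';

// Summarization sketch: compress the oldest messages into one summary
// message and keep the recent tail verbatim.
async function summarizeConversation(
  messages: Array<{ role: string; content: string }>,
  keepRecent = 10
) {
  if (messages.length <= keepRecent + 1) return messages;

  const [system, ...rest] = messages;
  const old = rest.slice(0, -keepRecent);
  const recent = rest.slice(-keepRecent);

  const { text } = await generateText({
    model: 'openai/gpt-4o-mini', // assumed model id; any chat model works
    prompt:
      'Summarize this conversation, preserving key facts and decisions:\n\n' +
      old.map((m) => `${m.role}: ${m.content}`).join('\n')
  });

  return [
    system,
    { role: 'system', content: `Summary of earlier conversation: ${text}` },
    ...recent
  ];
}
```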
# Long-Term Memory

For truly persistent memory, we need to store information outside the conversation. This is where embeddings and vector databases come in.
The idea is simple: convert memories into vectors, store them, and retrieve the most relevant ones when needed. Consider a handful of stored memories:
- User prefers dark mode
- User is building React apps
- User's name is Alex
- User works at a startup
- User likes TypeScript
Long-term memory uses embeddings to find relevant past information based on your query. More relevant memories get higher scores.
This is the foundation of systems that can remember user preferences, past conversations, and learned information across sessions.
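Under the hood, "more relevant" usually means a higher cosine similarity between the query's embedding and each stored embedding. A minimal sketch of that math:

```ts
// Cosine similarity: 1 means same direction, 0 means unrelated.
// Vector databases rank stored memories by a score like this.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In practice a vector database does this ranking for you, as in the full store-and-recall flow below.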
```ts
import { embed } from 'ai';
import { Index } from '@upstash/vector';

const index = new Index();

// Store a memory
async function remember(content: string) {
  const { embedding } = await embed({
    model: 'openai/text-embedding-3-small',
    value: content
  });

  await index.upsert({
    id: crypto.randomUUID(),
    vector: embedding,
    metadata: { content }
  });
}

// Recall relevant memories
async function recall(query: string, topK = 5) {
  const { embedding } = await embed({
    model: 'openai/text-embedding-3-small',
    value: query
  });

  // includeMetadata returns the stored content alongside the scores
  return index.query({ vector: embedding, topK, includeMetadata: true });
}
```
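To actually use recalled memories, inject them into the prompt before calling the model. A sketch, assuming the `remember`/`recall` helpers above (the `buildSystemPrompt` name is illustrative):

```ts
// Turn recalled memories into a system prompt for the next model call.
async function buildSystemPrompt(userMessage: string): Promise<string> {
  const matches = await recall(userMessage);

  const memories = matches
    .map((match) => `- ${match.metadata?.content ?? ''}`)
    .join('\n');

  return [
    'You are a helpful assistant.',
    'Relevant things you remember about the user:',
    memories
  ].join('\n');
}
```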
# Choosing a Strategy

The right memory strategy depends on your use case:
- Chatbots — Sliding window works well for casual conversations
- Personal Assistants — Long-term memory for user preferences
- Customer Support — Summarization to preserve ticket context
- Research Agents — External memory for discovered facts
Many production systems combine multiple strategies. A sliding window for recent context, plus long-term memory for important facts, creates a robust memory system.
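As a sketch of that combination, reusing the `trimConversation` and `buildSystemPrompt` helpers from earlier (names and thresholds are illustrative):

```ts
type ChatMessage = { role: string; content: string };

// Sliding window for recent turns + long-term memory for durable facts.
// Assumes messages[0] is the original system prompt, as in trimConversation.
async function prepareContext(
  messages: ChatMessage[],
  latestUserMessage: string
): Promise<ChatMessage[]> {
  const systemPrompt = await buildSystemPrompt(latestUserMessage);
  const trimmed = trimConversation(messages);

  return [
    { role: 'system', content: systemPrompt },
    ...trimmed.slice(1) // recent turns, with the memory-aware system prompt swapped in
  ];
}
```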