If you're a web developer, you've probably heard terms like LLM, tokens, context window, and agents thrown around. Maybe you've even used ChatGPT or built a simple chatbot. But how do these things actually work under the hood?
In this article, we're going to demystify these concepts. We'll start from the basics and work our way up to understanding how modern AI applications are built. And we'll do it with lots of interactive examples you can play with.
This article includes interactive visualizations you can click and play with.
#What are Tokens?
When you send text to an LLM, it doesn't see letters or words the way we do. Instead, it breaks your text into pieces called tokens.
A token might be a whole word, part of a word, or even just a single character. Common words like "the" or "is" are usually single tokens, while uncommon words get split into smaller pieces.
Try typing something below to see how text gets broken into tokens:
Notice how common words stay whole, while longer words get split. The space character (␣) often attaches to the next word.
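If you want to inspect the split yourself in code, a tokenizer library will show you exactly which pieces a string becomes. Here's a minimal sketch, assuming the js-tiktoken package and its cl100k_base encoding (different models use different encodings):

```ts
// Minimal tokenization sketch (assumes the js-tiktoken package is installed)
import { getEncoding } from 'js-tiktoken';

const enc = getEncoding('cl100k_base'); // encoding used by many recent OpenAI models

const text = 'The unbelievable otter';
const tokenIds = enc.encode(text);

console.log(tokenIds.length); // number of tokens, not characters or words

// Decode each token id individually to see how the text was split into pieces
for (const id of tokenIds) {
  console.log(JSON.stringify(enc.decode([id])));
}
```

The same counting trick is handy for estimating how big a request is before you send it.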
Why does this matter? Because LLMs have limits on how many tokens they can process at once. This limit is called the context window.
#The Context Window
Think of the context window as the model's short-term memory. Everything the model can "see" when generating a response must fit within this window.
Modern models have context windows ranging from 4,000 to over 1,000,000 tokens. But here's the key insight: everything counts toward this limit, your system prompt, the conversation history, and the model's response.
Try increasing the history size. Notice how the response space shrinks? When there's no room left, older messages must be dropped.
When the context window fills up, older messages typically get dropped. This is why chatbots can "forget" things you told them earlier in a long conversation.
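In application code, that usually means trimming the history yourself before each request. Here's a minimal sketch of the idea; countTokens is a rough stand-in (a real app would use the model's actual tokenizer), and the message shape is simplified:

```ts
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Rough stand-in: ~4 characters per token is a common rule of thumb.
// A real app would count tokens with the model's actual tokenizer.
const countTokens = (text: string) => Math.ceil(text.length / 4);

function fitToContext(messages: Message[], maxTokens: number): Message[] {
  const [system, ...history] = messages;
  const kept: Message[] = [];
  let used = countTokens(system.content);

  // Walk backwards so the most recent messages survive
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].content);
    if (used + cost > maxTokens) break; // everything older gets dropped
    kept.unshift(history[i]);
    used += cost;
  }
  return [system, ...kept];
}
```

This is exactly the "forgetting" you see in long chats: the system prompt stays, and the oldest turns are the first to go.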
#Tools: Giving LLMs Superpowers
On their own, LLMs can only generate text. They can't check the weather, look up your calendar, or query a database. But we can give them tools.
A tool is just a function that the model can choose to call. You describe what the tool does, and the model decides when to use it based on the user's request.
Click a question to see how the model uses tools:
Here's what the code looks like with the AI SDK:
```ts
import { generateText, tool } from 'ai';
import { z } from 'zod';

const result = await generateText({
  model: 'openai/gpt-5.2',
  prompt: 'What is the weather in Tokyo?',
  tools: {
    getWeather: tool({
      description: 'Get the weather for a location',
      parameters: z.object({
        location: z.string()
      }),
      execute: async ({ location }) => {
        // Call your weather API here
        return { temp: 22, condition: 'sunny' };
      }
    })
  }
});
```
#The Agentic Loop
Now we get to the exciting part: agents. An agent is an LLM that uses tools in a loop to accomplish tasks.
The agentic loop works like this:
- You give the agent a task
- The model decides what action to take
- If it calls a tool, we execute it and feed the result back
- The model decides the next action
- Repeat until the task is complete
Click "Run Agent" to see the agentic loop in action
This is powerful because the agent can break down complex tasks into steps, use multiple tools, and adapt based on results. It's like having a junior developer who can think through problems and use APIs.
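Stripped of SDK details, the loop itself is only a few lines. Here's a minimal hand-rolled sketch; callModel and the search tool are hypothetical stand-ins for a real LLM call and a real API:

```ts
// A hand-rolled agentic loop. callModel and the search tool are stubs standing
// in for a real model call and a real external API.
type Action =
  | { kind: 'tool'; name: string; args: Record<string, string> }
  | { kind: 'final'; text: string };

const tools: Record<string, (args: Record<string, string>) => Promise<string>> = {
  search: async ({ query }) => `results for "${query}"`, // stubbed tool
};

// Stand-in for an LLM call: looks at the transcript and decides the next action.
async function callModel(transcript: string[]): Promise<Action> {
  return transcript.some((line) => line.startsWith('tool result:'))
    ? { kind: 'final', text: 'Here is what I found…' }
    : { kind: 'tool', name: 'search', args: { query: 'Next.js features' } };
}

async function runAgent(task: string, maxSteps = 10): Promise<string> {
  const transcript = [`task: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const action = await callModel(transcript);            // 1. model decides
    if (action.kind === 'final') return action.text;       // 2. done? stop
    const result = await tools[action.name](action.args);  // 3. execute the tool
    transcript.push(`tool result: ${result}`);              // 4. feed the result back
  }
  return 'Stopped: step limit reached';
}
```

In practice you rarely wire this up by hand. The example below runs the same loop through the SDK's agent abstraction, with a step limit as the safety valve: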
```ts
import { ToolLoopAgent, stepCountIs, tool } from 'ai';

const researchAgent = new ToolLoopAgent({
  model: 'anthropic/claude-opus-4.5',
  tools: {
    search: tool({ /* ... */ }),
    readPage: tool({ /* ... */ }),
    saveNote: tool({ /* ... */ })
  },
  stopWhen: stepCountIs(10) // Safety limit
});

const result = await researchAgent.generate({
  prompt: 'Research the latest Next.js features'
});
```
#Streaming: Real-time Responses
When you chat with an AI, you'll notice the response appears word by word rather than all at once. This is called streaming, and it's important for user experience.
Without streaming, users would stare at a loading spinner for several seconds. With streaming, they see progress immediately and can start reading while the response is still generating.
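Under the hood, streaming is just reading the HTTP response body as chunks arrive instead of waiting for the whole payload. Here's a minimal browser-side sketch, assuming a hypothetical /api/chat endpoint that streams plain text:

```ts
// Read a streamed response chunk by chunk. '/api/chat' is a hypothetical endpoint.
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt: 'Explain React hooks' }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // Each chunk arrives as soon as the server sends it; append it to the UI here.
  console.log(decoder.decode(value, { stream: true }));
}
```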
Click the buttons below to start each version.
Notice how streaming shows content immediately while blocking makes you wait. Try both to feel the difference in user experience!
The AI SDK makes streaming easy with the streamText function:
```ts
import { streamText } from 'ai';

const result = streamText({
  model: 'openai/gpt-5.2',
  prompt: 'Explain React hooks'
});

// In React, use the useChat hook:
const { messages, input, handleSubmit } = useChat();
```
#Putting It All Together
Let's recap what we've learned:
- Tokens are how models see text: not as words, but as chunks
- The context window is the model's memory limit
- Tools give models the ability to take actions in the real world
- Agents use tools in a loop to accomplish complex tasks
- Streaming makes responses feel fast and responsive
With the AI SDK, you can build all of this in TypeScript. It provides a unified API that works with OpenAI, Anthropic, Google, and many other providers, just by changing the model string.
```ts
// Switch providers with a single line change:
model: 'openai/gpt-5.2'
model: 'anthropic/claude-opus-4.5'
model: 'google/gemini-3'
```
This is just the beginning. Explore the other topics to dive deeper into memory, embeddings, RAG, orchestration, and evaluations.