If you're a web developer, you've probably heard terms like LLM, tokens, context window, and agents thrown around. Maybe you've even used ChatGPT or built a simple chatbot. But how do these things actually work under the hood?
In this article, we're going to demystify these concepts. We'll start from the basics and work our way up to understanding how modern AI applications are built. And we'll do it with lots of interactive examples you can play with.
This article includes interactive visualizations you can click and play with.
#What are Tokens?
When you send text to an LLM, it doesn't see letters or words the way we do. Instead, it breaks your text into pieces called tokens.
A token might be a whole word, part of a word, or even just a single character. Common words like "the" or "is" are usually single tokens, while uncommon words get split into smaller pieces.
Try typing something below to see how text gets broken into tokens:
Notice how common words stay whole, while longer words get split. The space character (␣) often attaches to the next word.
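If you want to inspect the split yourself in code, a tokenizer library will show you exactly which pieces a string becomes. Here's a minimal sketch, assuming the js-tiktoken package and its cl100k_base encoding (different models use different encodings):

```ts
// Minimal tokenization sketch (assumes the js-tiktoken package is installed)
import { getEncoding } from 'js-tiktoken';

const enc = getEncoding('cl100k_base'); // encoding used by many recent OpenAI models

const text = 'The unbelievable otter';
const tokenIds = enc.encode(text);

console.log(tokenIds.length); // number of tokens, not characters or words

// Decode each token id individually to see how the text was split into pieces
for (const id of tokenIds) {
  console.log(JSON.stringify(enc.decode([id])));
}
```

The same counting trick is handy for estimating how big a request is before you send it.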
Why does this matter? Because LLMs have limits on how many tokens they can process at once. This limit is called the context window.
#The Context Window
Think of the context window as the model's short-term memory. Everything the model can "see" when generating a response must fit within this window.
Modern models have context windows ranging from 4,000 to over 1,000,000 tokens. But here's the key insight: everything counts toward this limit, your system prompt, the conversation history, and the model's response.
Try increasing the history size. Notice how the response space shrinks? When there's no room left, older messages must be dropped.
When the context window fills up, older messages typically get dropped. This is why chatbots can "forget" things you told them earlier in a long conversation.
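In application code, that usually means trimming the history yourself before each request. Here's a minimal sketch of the idea; countTokens is a rough stand-in (a real app would use the model's actual tokenizer), and the message shape is simplified:

```ts
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Rough stand-in: ~4 characters per token is a common rule of thumb.
// A real app would count tokens with the model's actual tokenizer.
const countTokens = (text: string) => Math.ceil(text.length / 4);

function fitToContext(messages: Message[], maxTokens: number): Message[] {
  const [system, ...history] = messages;
  const kept: Message[] = [];
  let used = countTokens(system.content);

  // Walk backwards so the most recent messages survive
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].content);
    if (used + cost > maxTokens) break; // everything older gets dropped
    kept.unshift(history[i]);
    used += cost;
  }
  return [system, ...kept];
}
```

This is exactly the "forgetting" you see in long chats: the system prompt stays, and the oldest turns are the first to go.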
#Tools: Giving LLMs Superpowers
On their own, LLMs can only generate text. They can't check the weather, look up your calendar, or query a database. But we can give them tools.
A tool is just a function that the model can choose to call. You describe what the tool does, and the model decides when to use it based on the user's request.
Click a question to see how the model uses tools:
Here's what the code looks like with the AI SDK:
```ts
import { generateText, tool } from 'ai';
import { z } from 'zod';

const result = await generateText({
  model: 'openai/gpt-5.2',
  prompt: 'What is the weather in Tokyo?',
  tools: {
    getWeather: tool({
      description: 'Get the weather for a location',
      parameters: z.object({
        location: z.string()
      }),
      execute: async ({ location }) => {
        // Call your weather API here
        return { temp: 22, condition: 'sunny' };
      }
    })
  }
});
```
#The Agentic Loop
Now we get to the exciting part: agents. An agent is an LLM that uses tools in a loop to accomplish tasks.
The agentic loop works like this:
- You give the agent a task
- The model decides what action to take
- If it calls a tool, we execute it and feed the result back
- The model decides the next action
- Repeat until the task is complete
Click "Run Agent" to see the agentic loop in action
This is powerful because the agent can break down complex tasks into steps, use multiple tools, and adapt based on results. It's like having a junior developer who can think through problems and use APIs.
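Stripped of SDK details, the loop itself is only a few lines. Here's a minimal hand-rolled sketch; callModel and the search tool are hypothetical stand-ins for a real LLM call and a real API:

```ts
// A hand-rolled agentic loop. callModel and the search tool are stubs standing
// in for a real model call and a real external API.
type Action =
  | { kind: 'tool'; name: string; args: Record<string, string> }
  | { kind: 'final'; text: string };

const tools: Record<string, (args: Record<string, string>) => Promise<string>> = {
  search: async ({ query }) => `results for "${query}"`, // stubbed tool
};

// Stand-in for an LLM call: looks at the transcript and decides the next action.
async function callModel(transcript: string[]): Promise<Action> {
  return transcript.some((line) => line.startsWith('tool result:'))
    ? { kind: 'final', text: 'Here is what I found…' }
    : { kind: 'tool', name: 'search', args: { query: 'Next.js features' } };
}

async function runAgent(task: string, maxSteps = 10): Promise<string> {
  const transcript = [`task: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const action = await callModel(transcript);            // 1. model decides
    if (action.kind === 'final') return action.text;       // 2. done? stop
    const result = await tools[action.name](action.args);  // 3. execute the tool
    transcript.push(`tool result: ${result}`);              // 4. feed the result back
  }
  return 'Stopped: step limit reached';
}
```

In practice you rarely wire this up by hand. The example below runs the same loop through the SDK's agent abstraction, with a step limit as the safety valve: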
```ts
import { ToolLoopAgent, stepCountIs, tool } from 'ai';

const researchAgent = new ToolLoopAgent({
  model: 'anthropic/claude-opus-4.5',
  tools: {
    search: tool({ /* ... */ }),
    readPage: tool({ /* ... */ }),
    saveNote: tool({ /* ... */ })
  },
  stopWhen: stepCountIs(10) // Safety limit
});

const result = await researchAgent.generate({
  prompt: 'Research the latest Next.js features'
});
```
#Streaming: Real-time Responses
When you chat with an AI, you'll notice the response appears word by word rather than all at once. This is called streaming, and it's important for user experience.
Without streaming, users would stare at a loading spinner for several seconds. With streaming, they see progress immediately and can start reading while the response is still generating.
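Under the hood, streaming is just reading the HTTP response body as chunks arrive instead of waiting for the whole payload. Here's a minimal browser-side sketch, assuming a hypothetical /api/chat endpoint that streams plain text:

```ts
// Read a streamed response chunk by chunk. '/api/chat' is a hypothetical endpoint.
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt: 'Explain React hooks' }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // Each chunk arrives as soon as the server sends it; append it to the UI here.
  console.log(decoder.decode(value, { stream: true }));
}
```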
Click the buttons below to start each version.
Notice how streaming shows content immediately while blocking makes you wait. Try both to feel the difference in user experience!
The AI SDK makes streaming easy with the streamText function:
```ts
import { streamText } from 'ai';

const result = streamText({
  model: 'openai/gpt-5.2',
  prompt: 'Explain React hooks'
});

// In React, use the useChat hook:
const { messages, input, handleSubmit } = useChat();
```
#Putting It All Together
Let's recap what we've learned:
- Tokens are how models see text: not as words, but as chunks
- The context window is the model's memory limit
- Tools give models the ability to take actions in the real world
- Agents use tools in a loop to accomplish complex tasks
- Streaming makes responses feel fast and responsive
With the AI SDK, you can build all of this in TypeScript. It provides a unified API that works with OpenAI, Anthropic, Google, and many other providers, just by changing the model string.
```ts
// Switch providers with a single line change:
model: 'openai/gpt-5.2'
model: 'anthropic/claude-opus-4.5'
model: 'google/gemini-3'
```
This is just the beginning. Explore the other topics to dive deeper into memory, embeddings, RAG, orchestration, and evaluations.