# Embeddings: Meaning as Numbers
How do we teach computers to understand that "happy" and "joyful" are similar, but "happy" and "sad" are opposites? The answer is embeddings.
They're the secret sauce behind semantic search, recommendations, and memory systems.
# What Are Embeddings?
An embedding is a list of numbers (a vector) that represents the "meaning" of a piece of text. Similar meanings produce similar vectors.
Imagine a space where every word has a position. Words with similar meanings sit close together: "king" lands near "queen", and "dog" near "cat".
In reality, embeddings have hundreds or thousands of dimensions, not just two. But the principle is the same—distance equals similarity.
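Under the hood, "distance" is usually measured with cosine similarity, which compares the angle between two vectors. Here's a minimal sketch of the math (the AI SDK also exports a ready-made `cosineSimilarity` helper from `'ai'`):

```ts
// Cosine similarity: 1 = same direction (similar meaning),
// 0 = unrelated, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```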
# Creating Embeddings
With the AI SDK, creating embeddings is straightforward:
```ts
import { embed, embedMany } from 'ai';

// Single text
const { embedding } = await embed({
  model: 'openai/text-embedding-3-small',
  value: 'The quick brown fox',
});

// Multiple texts (more efficient: one API call)
const { embeddings } = await embedMany({
  model: 'openai/text-embedding-3-small',
  values: [
    'Hello world',
    'Goodbye world',
    'Hello there',
  ],
});
```

The returned `embedding` is an array of numbers (1,536 dimensions for OpenAI's `text-embedding-3-small`, 3,072 for `text-embedding-3-large`). You can then store these vectors in a vector database for fast similarity search.
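For small collections you don't even need a vector database yet; a brute-force scan over stored embeddings works fine. A sketch, assuming the model string above and the AI SDK's built-in `cosineSimilarity`:

```ts
import { embed, embedMany, cosineSimilarity } from 'ai';

// Build a tiny in-memory "index": each document paired with its embedding.
const docs = ['Hello world', 'Goodbye world', 'Hello there'];
const { embeddings } = await embedMany({
  model: 'openai/text-embedding-3-small',
  values: docs,
});
const index = docs.map((text, i) => ({ text, embedding: embeddings[i] }));

// Search: embed the query, then rank every document by similarity.
const { embedding: query } = await embed({
  model: 'openai/text-embedding-3-small',
  value: 'Hi world',
});

const results = index
  .map((doc) => ({ ...doc, score: cosineSimilarity(query, doc.embedding) }))
  .sort((a, b) => b.score - a.score);

console.log(results[0].text); // the closest match by meaning
```

A dedicated vector database earns its keep once the collection grows, since it replaces this linear scan with an approximate nearest-neighbor index.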
# Where Embeddings Shine
- Semantic Search — Find documents by meaning, not just keywords
- Recommendations — "People who liked X also liked Y"
- Deduplication — Find near-duplicate content (see the sketch after this list)
- Clustering — Group similar items automatically
- RAG — Retrieve relevant context for LLM prompts
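As an example of deduplication, you can flag any pair of texts whose embeddings exceed a similarity threshold. A sketch; the `0.95` threshold is an arbitrary assumption you'd tune for your own data:

```ts
import { embedMany, cosineSimilarity } from 'ai';

// Hypothetical cutoff: pairs above it are treated as near-duplicates.
const DUPLICATE_THRESHOLD = 0.95;

const texts = [
  'How do I reset my password?',
  'How can I reset my password?',
  'What are your business hours?',
];

const { embeddings } = await embedMany({
  model: 'openai/text-embedding-3-small',
  values: texts,
});

// Compare every pair; O(n²), which is fine for small batches.
for (let i = 0; i < texts.length; i++) {
  for (let j = i + 1; j < texts.length; j++) {
    const score = cosineSimilarity(embeddings[i], embeddings[j]);
    if (score > DUPLICATE_THRESHOLD) {
      console.log(`Near-duplicates (${score.toFixed(3)}): "${texts[i]}" / "${texts[j]}"`);
    }
  }
}
```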
Speaking of RAG, that's exactly what we'll cover next—using embeddings to give LLMs access to your data.