# Embeddings: Meaning as Numbers

How do we teach computers to understand that "happy" and "joyful" are similar, but "happy" and "sad" are opposites? The answer is embeddings.

Embeddings are the secret sauce behind semantic search, recommendations, and memory systems.

# What Are Embeddings?

An embedding is a list of numbers (a vector) that represents the "meaning" of text. Similar meanings produce similar vectors.

Imagine a space where every word has a position, and words with similar meanings sit close together.

[Interactive 2D embedding-space visualization: "king" sits near "queen" and "man" near "woman"; "dog", "cat", "puppy", and "kitten" form one cluster; "happy" and "joyful" sit together, while "sad" and "unhappy" cluster on the opposite side.]

In reality, embeddings have hundreds or thousands of dimensions, not just two. But the principle is the same—distance equals similarity.
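In practice, "distance" is usually measured with cosine similarity, which compares the angle between two vectors: a result near 1 means very similar, near -1 means opposite. A minimal sketch (the vectors here are hand-picked toy values for illustration, not real embeddings):

```ts
// Cosine similarity: dot product divided by the product of the
// vectors' lengths. Ranges from -1 (opposite) to 1 (identical direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 2D "embeddings" for illustration only
const happy = [0.9, 0.8];
const joyful = [0.85, 0.82];
const sad = [-0.9, -0.7];

console.log(cosineSimilarity(happy, joyful)); // close to 1
console.log(cosineSimilarity(happy, sad));    // close to -1
```

The AI SDK also exports a `cosineSimilarity` helper from `'ai'`, so in real code you rarely need to write this yourself.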

# Creating Embeddings

With the AI SDK, creating embeddings is straightforward:

```ts
import { embed, embedMany } from 'ai';

// Single text
const { embedding } = await embed({
  model: 'openai/text-embedding-3-small',
  value: 'The quick brown fox',
});

// Multiple texts (batched in one request, more efficient)
const { embeddings } = await embedMany({
  model: 'openai/text-embedding-3-small',
  values: [
    'Hello world',
    'Goodbye world',
    'Hello there',
  ],
});
```

The returned embedding is an array of numbers (1536 dimensions for OpenAI's text-embedding-3-small, 3072 for text-embedding-3-large). You can then store these vectors in a vector database for fast similarity search.
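Before reaching for a dedicated vector database, brute-force search is a useful mental model, and it works fine for small collections: compare the query embedding against every stored one and keep the best matches. A sketch, assuming the embeddings have already been computed and loaded into memory:

```ts
interface StoredDoc {
  text: string;
  embedding: number[];
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Rank every stored document against the query embedding and
// return the k most similar ones.
function topK(query: number[], docs: StoredDoc[], k: number): StoredDoc[] {
  return [...docs]
    .sort(
      (a, b) =>
        cosineSimilarity(query, b.embedding) -
        cosineSimilarity(query, a.embedding)
    )
    .slice(0, k);
}
```

A vector database does the same comparison, but with approximate indexes (HNSW, IVF) so it stays fast at millions of vectors.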

# Where Embeddings Shine

  • Semantic Search — Find documents by meaning, not just keywords
  • Recommendations — "People who liked X also liked Y"
  • Deduplication — Find near-duplicate content
  • Clustering — Group similar items automatically
  • RAG — Retrieve relevant context for LLM prompts
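Most of these use cases reduce to the same primitive: embed, compare, then rank or threshold. Deduplication, for instance, is just a pairwise similarity check against a cutoff. A sketch with toy vectors (the 0.95 threshold is an illustrative value, not a recommendation):

```ts
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Flag every pair of items whose embeddings exceed the similarity cutoff.
function findNearDuplicates(
  items: { id: string; embedding: number[] }[],
  threshold = 0.95
): [string, string][] {
  const pairs: [string, string][] = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      const sim = cosineSimilarity(items[i].embedding, items[j].embedding);
      if (sim >= threshold) {
        pairs.push([items[i].id, items[j].id]);
      }
    }
  }
  return pairs;
}
```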

Speaking of RAG, that's exactly what we'll cover next—using embeddings to give LLMs access to your data.