# Embeddings: Meaning as Numbers

How do we teach computers to understand that "happy" and "joyful" are similar, but "happy" and "sad" are opposites? The answer is embeddings.

Embeddings are the secret sauce behind semantic search, recommendations, and memory systems.

# What Are Embeddings?

An embedding is a list of numbers (a vector) that represents the "meaning" of text. Similar meanings produce similar vectors.

Imagine a space where every word has a position, and words with similar meanings sit close together.

[Interactive 2D embedding-space visualization: "king" sits near "queen" and "man" near "woman"; "dog", "cat", "puppy", and "kitten" form one cluster; "happy" and "joyful" sit together, while "sad" and "unhappy" cluster on the opposite side.]

In reality, embeddings have hundreds or thousands of dimensions, not just two. But the principle is the same—distance equals similarity.
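In practice, "distance" is usually measured with cosine similarity, which compares the angle between two vectors: a result near 1 means very similar, near -1 means opposite. A minimal sketch (the vectors here are hand-picked toy values for illustration, not real embeddings):

```ts
// Cosine similarity: dot product divided by the product of the
// vectors' lengths. Ranges from -1 (opposite) to 1 (identical direction).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 2D "embeddings" for illustration only
const happy = [0.9, 0.8];
const joyful = [0.85, 0.82];
const sad = [-0.9, -0.7];

console.log(cosineSimilarity(happy, joyful)); // close to 1
console.log(cosineSimilarity(happy, sad));    // close to -1
```

The AI SDK also exports a `cosineSimilarity` helper from `'ai'`, so in real code you rarely need to write this yourself.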

# Creating Embeddings

With the AI SDK, creating embeddings is straightforward:

```ts
import { embed, embedMany } from 'ai';

// Single text
const { embedding } = await embed({
  model: 'openai/text-embedding-3-small',
  value: 'The quick brown fox',
});

// Multiple texts (batched in one request, more efficient)
const { embeddings } = await embedMany({
  model: 'openai/text-embedding-3-small',
  values: [
    'Hello world',
    'Goodbye world',
    'Hello there',
  ],
});
```

The returned embedding is an array of numbers (1536 dimensions for OpenAI's text-embedding-3-small, 3072 for text-embedding-3-large). You can then store these vectors in a vector database for fast similarity search.
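Before reaching for a dedicated vector database, brute-force search is a useful mental model, and it works fine for small collections: compare the query embedding against every stored one and keep the best matches. A sketch, assuming the embeddings have already been computed and loaded into memory:

```ts
interface StoredDoc {
  text: string;
  embedding: number[];
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Rank every stored document against the query embedding and
// return the k most similar ones.
function topK(query: number[], docs: StoredDoc[], k: number): StoredDoc[] {
  return [...docs]
    .sort(
      (a, b) =>
        cosineSimilarity(query, b.embedding) -
        cosineSimilarity(query, a.embedding)
    )
    .slice(0, k);
}
```

A vector database does the same comparison, but with approximate indexes (HNSW, IVF) so it stays fast at millions of vectors.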

# Where Embeddings Shine

  • Semantic Search — Find documents by meaning, not just keywords
  • Recommendations — "People who liked X also liked Y"
  • Deduplication — Find near-duplicate content
  • Clustering — Group similar items automatically
  • RAG — Retrieve relevant context for LLM prompts
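Most of these use cases reduce to the same primitive: embed, compare, then rank or threshold. Deduplication, for instance, is just a pairwise similarity check against a cutoff. A sketch with toy vectors (the 0.95 threshold is an illustrative value, not a recommendation):

```ts
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Flag every pair of items whose embeddings exceed the similarity cutoff.
function findNearDuplicates(
  items: { id: string; embedding: number[] }[],
  threshold = 0.95
): [string, string][] {
  const pairs: [string, string][] = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      const sim = cosineSimilarity(items[i].embedding, items[j].embedding);
      if (sim >= threshold) {
        pairs.push([items[i].id, items[j].id]);
      }
    }
  }
  return pairs;
}
```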

Speaking of RAG, that's exactly what we'll cover next—using embeddings to give LLMs access to your data.