Embeddings: How Vectors Get Meaning
Embeddings are the foundation of semantic understanding in AI. They turn words, sentences, and documents into numbers that capture meaning. This post explains how they work, why they matter, and how to use them.
The Intent & Action series
- Search is Dead. Long Live Action. — From retrieval to outcomes
- Understanding Embeddings (this post) — How vectors capture meaning
- Intent Approaches — Seven ways to understand queries
- Building Hybrid Intent — Keyword + semantic in practice
- Search Performance — What 30ms vs 3s actually costs
What Are Embeddings?
An embedding is a vector representation of meaning. It converts text (or images, audio, etc.) into a list of numbers that captures semantic content.
"artificial intelligence" → [0.23, -0.14, 0.82, 0.01, -0.45, ...]
These numbers aren't random. They're learned from massive text corpora, and they encode relationships between concepts.
Key insight: Similar meanings produce similar vectors.
"artificial intelligence" → [0.23, -0.14, 0.82, 0.01, ...]
"machine learning" → [0.25, -0.12, 0.79, 0.03, ...] ← close
"cat photos" → [-0.41, 0.67, -0.22, 0.45, ...] ← distant
The first two vectors are near each other in vector space because the concepts are related. The third is far away because it's unrelated.
Why Vectors?
Why convert text to numbers? Because math works on numbers.
With text: How similar are "car" and "automobile"?
- String comparison: 0% match (different characters)
- No mathematical answer
With vectors: How similar are [0.3, 0.7, -0.2] and [0.35, 0.68, -0.18]?
- Cosine similarity: 0.99 (very similar)
- Mathematical answer
Embeddings let us do math on meaning:
- Find similar content (nearest neighbors)
- Cluster related concepts (k-means)
- Visualize relationships (dimensionality reduction)
- Combine meanings (vector arithmetic)
How Embeddings Work
The Training Process
Embedding models learn from context. Given massive text data, they learn that:
- Words appearing in similar contexts have similar meanings
- "The car drove fast" and "The automobile drove fast" → car ≈ automobile
- "King is to queen as man is to woman" → learned relationships
Modern models use transformers that process entire sentences, capturing:
- Word meaning in context ("bank" as river vs. financial)
- Sentence-level semantics
- Nuanced relationships
The Output
A trained model converts text to a fixed-size vector:
embed("The quick brown fox") → [0.12, -0.34, 0.56, ..., 0.78] // 384 numbers
embed("A fast auburn fox") → [0.14, -0.32, 0.54, ..., 0.76] // similar
embed("Database optimization") → [-0.45, 0.23, -0.12, ..., 0.34] // different
The vector size (dimensionality) is fixed per model:
- all-MiniLM-L6-v2: 384 dimensions
- all-mpnet-base-v2: 768 dimensions
- text-embedding-ada-002: 1536 dimensions
More dimensions can capture more nuance, but require more compute/storage.
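The trade-off is easy to quantify: at float32 precision each dimension costs 4 bytes, so storage grows linearly with both corpus size and dimensionality. A quick sketch (the corpus size of 100k vectors is illustrative):

```javascript
// Rough storage cost per corpus at float32 precision (4 bytes per dimension).
function storageBytes(numVectors, dims, bytesPerValue = 4) {
  return numVectors * dims * bytesPerValue;
}

storageBytes(100_000, 384); // 153_600_000 bytes (~154 MB)
storageBytes(100_000, 1536); // 614_400_000 bytes (~614 MB)
```

The same 4x dimension jump also multiplies every similarity computation, so "more dimensions" costs you at query time as well as on disk.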
Visualizing Embeddings
Embeddings exist in high-dimensional space (384+ dimensions), but we can project them to 2D for visualization.
quadrantChart
title Semantic Clustering in 2D Space
x-axis Technology --> Food
y-axis General --> Specific
quadrant-1 Tech Concepts
quadrant-2 General Knowledge
quadrant-3 Food Topics
quadrant-4 Practical Skills
machine learning: [0.2, 0.7]
AI: [0.15, 0.6]
deep learning: [0.25, 0.75]
neural networks: [0.2, 0.8]
cooking recipes: [0.8, 0.6]
food preparation: [0.75, 0.5]
kitchen tips: [0.85, 0.55]
Related concepts cluster together. This is why semantic search works—you find content by proximity in meaning space.
Measuring Similarity
The standard measure is cosine similarity—the cosine of the angle between vectors.
function cosineSimilarity(a, b) {
let dot = 0,
normA = 0,
normB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
normA += a[i] * a[i];
normB += b[i] * b[i];
}
return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
Interpreting scores:
- 1.0 = identical direction (same meaning)
- 0.0 = perpendicular (unrelated)
- -1.0 = opposite direction (rare with text)
In practice:
- > 0.8 = highly similar
- 0.5 - 0.8 = related
- < 0.5 = loosely related or unrelated
Vector Arithmetic
One fascinating property: embeddings support semantic arithmetic.
king - man + woman ≈ queen
The model learned that "king" and "queen" have the same relationship as "man" and "woman".
Paris - France + Germany ≈ Berlin
Capital city relationships are encoded in the vector space.
This enables analogical reasoning and relationship discovery.
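The analogy computation is just element-wise arithmetic plus a nearest-neighbor lookup. A sketch with a toy vocabulary; the three-dimensional vectors below are made up for illustration, while real models use hundreds of dimensions:

```javascript
// Toy analogy solver: find the word nearest to (a - b + c) by cosine similarity.
// The vocabulary and its vectors are invented for illustration.
function sub(a, b) { return a.map((v, i) => v - b[i]); }
function add(a, b) { return a.map((v, i) => v + b[i]); }

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function analogy(vocab, a, b, c) {
  const target = add(sub(vocab[a], vocab[b]), vocab[c]);
  return Object.entries(vocab)
    .filter(([word]) => word !== a && word !== b && word !== c)
    .map(([word, vec]) => [word, cosine(target, vec)])
    .sort((x, y) => y[1] - x[1])[0][0]; // highest-scoring remaining word
}

const vocab = {
  king: [0.9, 0.8, 0.1],
  queen: [0.9, 0.1, 0.8],
  man: [0.2, 0.9, 0.1],
  woman: [0.2, 0.1, 0.9],
};

analogy(vocab, "king", "man", "woman"); // → "queen"
```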
Embedding Models
Evolution
Word2Vec (2013): Word-level embeddings. "bank" has one vector regardless of context.
GloVe (2014): Global statistics for better word representations.
ELMo (2018): Contextual embeddings. "bank" differs in "river bank" vs "bank account".
BERT (2018): Transformer-based, bidirectional context. Major quality improvement.
Sentence Transformers (2019): Optimized for sentence/paragraph embeddings, not just words.
Modern Models (2022+): Instruction-tuned, multilingual, domain-specific variants.
Current Options
| Model | Dimensions | Use Case | Access |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast, good quality | Open (transformers.js) |
| all-mpnet-base-v2 | 768 | Better quality | Open |
| text-embedding-ada-002 | 1536 | Production, high quality | OpenAI API |
| text-embedding-3-small | 1536 | Production, cheaper | OpenAI API |
| text-embedding-3-large | 3072 | Best quality | OpenAI API |
| voyage-2 | 1024 | Retrieval-optimized | Voyage API |
For client-side: all-MiniLM-L6-v2 balances size and quality.
For server-side: Use larger models or APIs.
Generating Embeddings
Browser (transformers.js)
import { pipeline } from "@xenova/transformers";
// Load model (downloads ~30MB first time)
const extractor = await pipeline(
"feature-extraction",
"Xenova/all-MiniLM-L6-v2",
);
// Generate embedding
const output = await extractor("How do embeddings work?", {
pooling: "mean",
normalize: true,
});
const embedding = Array.from(output.data);
// [0.023, -0.145, 0.821, ..., 0.034] // 384 numbers
Node.js
import { pipeline } from "@xenova/transformers";
const extractor = await pipeline(
"feature-extraction",
"Xenova/all-MiniLM-L6-v2",
);
async function embed(text) {
const output = await extractor(text, {
pooling: "mean",
normalize: true,
});
return Array.from(output.data);
}
const vec = await embed("Understanding embeddings");
OpenAI API
import OpenAI from "openai";
const openai = new OpenAI();
async function embed(text) {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text,
});
return response.data[0].embedding;
}
When to Generate Embeddings
Embeddings can be generated at different points in your system's lifecycle. Each approach has trade-offs.
Build Time
Generate embeddings during site/app build. Store as static files.
flowchart TD
A[Content<br/>Markdown, DB]:::primary -->|build script| B[embeddings.json]:::secondary
B -->|deploy| C[Static asset<br/>served to clients]:::accent
Example: This blog generates embeddings at build time via scripts/generate-embeddings.js.
Pros:
- Zero runtime compute for content embeddings
- Works with static hosting (no server needed)
- Embeddings available instantly on page load
Cons:
- Content changes require rebuild + redeploy
- Build time increases with content volume
- Can't embed user-generated content
Best for: Static sites, documentation, blogs, known content sets.
Ingest Time
Generate embeddings when content is added/updated. Store in database or vector store.
flowchart TD
A[New content submitted]:::primary -->|webhook/queue| B[Embedding service]:::secondary
B --> C[Vector database<br/>Pinecone, Weaviate, etc.]:::accent
Pros:
- Content immediately searchable after ingest
- Handles dynamic/user-generated content
- Scales with queue workers
Cons:
- Requires server infrastructure
- Ingest latency (seconds to process)
- Vector DB costs at scale
Best for: CMS platforms, knowledge bases, support systems.
Query Time
Generate embeddings on-demand when a query arrives.
flowchart LR
A[User query]:::primary -->|runtime| B[Embed query]:::secondary
B --> C[Compare to stored embeddings]:::secondary
C --> D[Return results]:::accent
Pros:
- No pre-computation needed
- Always uses latest model
- Works for any ad-hoc text
Cons:
- Adds latency to every query
- Requires model in memory or API call
- Can't pre-compute content embeddings this way
Best for: Query embedding (you almost always embed queries at runtime).
Hybrid Approaches
Most production systems combine approaches:
| Content Type | When to Embed |
|---|---|
| Static content (docs, posts) | Build time |
| Dynamic content (user posts, tickets) | Ingest time |
| User queries | Query time |
Example architecture:
flowchart TB
subgraph BUILD["BUILD TIME"]
B1[Static content]:::primary --> B2[Embed]:::secondary --> B3[JSON/Vector DB]:::accent
end
subgraph INGEST["INGEST TIME"]
I1[New content webhook]:::primary --> I2[Embed]:::secondary --> I3[Update Vector DB]:::accent
end
subgraph QUERY["QUERY TIME"]
Q1[User query]:::primary --> Q2[Embed]:::secondary --> Q3[Search Vector DB]:::secondary --> Q4[Results]:::accent
end
BUILD --> INGEST
INGEST --> QUERY
Practical Considerations
Chunking
Long documents should be split into chunks before embedding:
// Split on sentence boundaries; maxLength is in characters (a rough token proxy)
function chunkText(text, maxLength = 500) {
const sentences = text.split(/[.!?]+/);
const chunks = [];
let current = "";
for (const sentence of sentences) {
if ((current + sentence).length > maxLength) {
if (current) chunks.push(current.trim());
current = sentence;
} else {
current += sentence + ". ";
}
}
if (current) chunks.push(current.trim());
return chunks;
}
Why chunk?
- Models have max input length (512 tokens typical)
- Smaller chunks = more precise retrieval
- But too small = lost context
Typical sizes: 200-500 tokens per chunk with 50-100 token overlap.
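The chunkText helper above splits on sentence boundaries but produces no overlap. A sliding window is one way to add it; in this sketch, word counts stand in for true model tokens:

```javascript
// Sliding-window chunking with overlap. chunkSize and overlap are in words,
// which only approximates model tokens -- real pipelines use a tokenizer.
function chunkWithOverlap(text, chunkSize = 200, overlap = 50) {
  if (overlap >= chunkSize) throw new Error("overlap must be < chunkSize");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.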
Normalization
Normalize vectors for consistent similarity scores:
function normalize(vec) {
const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
return vec.map((v) => v / norm);
}
Most modern models normalize by default (use normalize: true).
Storage
Embeddings are just numbers. Store as:
- JSON: Simple, portable, human-readable
- Binary: Smaller, faster to load
- Vector DB: Optimized for similarity search (Pinecone, Weaviate, Chroma)
For small datasets (< 10,000 vectors), JSON is fine:
// Store
writeFileSync("embeddings.json", JSON.stringify(embeddings));
// Load
const embeddings = JSON.parse(readFileSync("embeddings.json", "utf8"));
For large datasets, use a vector database with approximate nearest neighbor (ANN) algorithms.
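Between JSON and a vector database, a flat Float32Array gives compact binary storage at 4 bytes per value, roughly half the size of the JSON text. A sketch, assuming all vectors share one dimensionality (the file path is illustrative):

```javascript
import { writeFileSync, readFileSync } from "node:fs";

// Pack all vectors into one flat Float32Array and write it as raw bytes.
// Assumes every vector has the same dimensionality; returns dims so the
// caller can persist it alongside the file.
function saveBinary(path, vectors) {
  const dims = vectors[0].length;
  const flat = new Float32Array(vectors.length * dims);
  vectors.forEach((vec, i) => flat.set(vec, i * dims));
  writeFileSync(path, Buffer.from(flat.buffer));
  return dims;
}

function loadBinary(path, dims) {
  const buf = readFileSync(path);
  // Copy into a fresh ArrayBuffer so the Float32Array view is 4-byte aligned.
  const flat = new Float32Array(
    buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength),
  );
  const vectors = [];
  for (let i = 0; i < flat.length; i += dims) {
    vectors.push(Array.from(flat.subarray(i, i + dims)));
  }
  return vectors;
}
```

Note that float32 loses a little precision versus JavaScript's float64 numbers, which is harmless for similarity scoring.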
Common Patterns
Semantic Search
async function search(query, documents, embeddings) {
const queryVec = await embed(query);
return Object.entries(embeddings)
.map(([id, vec]) => ({
id,
score: cosineSimilarity(queryVec, vec),
document: documents[id],
}))
.sort((a, b) => b.score - a.score)
.slice(0, 10);
}
Clustering
// Group similar documents
function cluster(embeddings, k = 5) {
// K-means or hierarchical clustering
// Returns: { clusterId: [docIds] }
}
Deduplication
// Find near-duplicates
function findDuplicates(embeddings, threshold = 0.95) {
const duplicates = [];
const ids = Object.keys(embeddings);
for (let i = 0; i < ids.length; i++) {
for (let j = i + 1; j < ids.length; j++) {
const sim = cosineSimilarity(embeddings[ids[i]], embeddings[ids[j]]);
if (sim > threshold) {
duplicates.push([ids[i], ids[j], sim]);
}
}
}
return duplicates;
}
Classification
// Zero-shot classification via similarity to labels
async function classify(text, labels) {
const textVec = await embed(text);
const labelVecs = await Promise.all(labels.map(embed));
const scores = labelVecs.map((vec, i) => ({
label: labels[i],
score: cosineSimilarity(textVec, vec),
}));
return scores.sort((a, b) => b.score - a.score);
}
// Usage
classify("The stock market crashed today", [
"finance",
"sports",
"technology",
"politics",
]);
// → [{ label: "finance", score: 0.82 }, ...]
Limitations
What Embeddings Don't Capture
Negation: "I love this" and "I don't love this" may be similar (both about loving).
Specificity: "The meeting is at 3pm Tuesday" loses the exact time in vector form.
Reasoning: Embeddings capture similarity, not logical relationships.
Recency: Training data has a cutoff; new terms may not embed well.
Failure Modes
Domain mismatch: General models may not understand specialized jargon.
Short queries: Single words have less context for disambiguation.
Adversarial inputs: Carefully crafted text can produce misleading embeddings.
Key Takeaways
- Embeddings are vectors that capture meaning — similar concepts, similar vectors
- Math on meaning — enables similarity search, clustering, classification
- Context matters — modern models understand "bank" differently in different contexts
- Trade-offs exist — more dimensions = more nuance but more compute
- Chunk appropriately — 200-500 tokens for documents
- Normalize vectors — for consistent similarity scores
- Know the limits — embeddings don't capture negation, specifics, or reasoning
Series Navigation
- Search is Dead. Long Live Action. — From retrieval to outcomes
- Understanding Embeddings (this post) — How vectors capture meaning
- Intent Approaches — Seven ways to understand queries
- Building Hybrid Intent — Keyword + semantic in practice
- Search Performance — What 30ms vs 3s actually costs
Related Posts
- MCP Dynamic Data Patterns — When to make retrieval dynamic
- RAG Flow Control — Using embeddings in production workflows
This blog generates embeddings at build time (scripts/generate-embeddings.js) but doesn't yet use them for search—that's on the roadmap. Currently: keyword search. Soon: semantic hybrid.