Embeddings are the foundation of semantic understanding in AI. They turn words, sentences, and documents into numbers that capture meaning. This post explains how they work, why they matter, and how to use them.

Part of the Intent & Action series.


What Are Embeddings?

An embedding is a vector representation of meaning. It converts text (or images, audio, etc.) into a list of numbers that captures semantic content.

"artificial intelligence" → [0.23, -0.14, 0.82, 0.01, -0.45, ...]

These numbers aren't random. They're learned from massive text corpora, and they encode relationships between concepts.

Key insight: Similar meanings produce similar vectors.

"artificial intelligence" → [0.23, -0.14, 0.82, 0.01, ...]
"machine learning"        → [0.25, -0.12, 0.79, 0.03, ...]  ← close
"cat photos"              → [-0.41, 0.67, -0.22, 0.45, ...] ← distant

The first two vectors are near each other in vector space because the concepts are related. The third is far away because it's unrelated.


Why Vectors?

Why convert text to numbers? Because math works on numbers.

With text: How similar are "car" and "automobile"?

  • String comparison: 0% match (different characters)
  • No mathematical answer

With vectors: How similar are [0.3, 0.7, -0.2] and [0.35, 0.68, -0.18]?

  • Cosine similarity: 0.99 (very similar)
  • Mathematical answer

Embeddings let us do math on meaning:

  • Find similar content (nearest neighbors)
  • Cluster related concepts (k-means)
  • Visualize relationships (dimensionality reduction)
  • Combine meanings (vector arithmetic)

How Embeddings Work

The Training Process

Embedding models learn from context. Given massive text data, they learn that:

  • Words appearing in similar contexts have similar meanings
  • "The car drove fast" and "The automobile drove fast" → car ≈ automobile
  • "King is to queen as man is to woman" → learned relationships

Modern models use transformers that process entire sentences, capturing:

  • Word meaning in context ("bank" as river vs. financial)
  • Sentence-level semantics
  • Nuanced relationships

The Output

A trained model converts text to a fixed-size vector:

embed("The quick brown fox")    → [0.12, -0.34, 0.56, ..., 0.78]   // 384 numbers
embed("A fast auburn fox")      → [0.14, -0.32, 0.54, ..., 0.76]   // similar
embed("Database optimization")  → [-0.45, 0.23, -0.12, ..., 0.34]  // different

The vector size (dimensionality) is fixed per model:

  • all-MiniLM-L6-v2: 384 dimensions
  • all-mpnet-base-v2: 768 dimensions
  • text-embedding-ada-002: 1536 dimensions

More dimensions can capture more nuance, but require more compute/storage.
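To make the compute/storage trade-off concrete, a quick back-of-envelope (float32 vectors use 4 bytes per dimension):

```javascript
// Storage for float32 embeddings: count × dimensions × 4 bytes.
const embeddingBytes = (count, dims) => count * dims * 4;

embeddingBytes(100_000, 384);  // 153,600,000 bytes ≈ 154 MB
embeddingBytes(100_000, 1536); // 614,400,000 bytes ≈ 614 MB
```

A 4x jump in dimensions is a 4x jump in storage, and similarity comparisons scale linearly with dimensions too.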


Visualizing Embeddings

Embeddings exist in high-dimensional space (384+ dimensions), but we can project them to 2D for visualization.

quadrantChart
    title Semantic Clustering in 2D Space
    x-axis Technology --> Food
    y-axis General --> Specific
    quadrant-1 Food Topics
    quadrant-2 Tech Concepts
    quadrant-3 General Knowledge
    quadrant-4 Practical Skills
    machine learning: [0.2, 0.7]
    AI: [0.15, 0.6]
    deep learning: [0.25, 0.75]
    neural networks: [0.2, 0.8]
    cooking recipes: [0.8, 0.6]
    food preparation: [0.75, 0.5]
    kitchen tips: [0.85, 0.55]

Related concepts cluster together. This is why semantic search works—you find content by proximity in meaning space.


Measuring Similarity

The standard measure is cosine similarity—the cosine of the angle between vectors.

function cosineSimilarity(a, b) {
  let dot = 0,
    normA = 0,
    normB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

Interpreting scores:

  • 1.0 = identical direction (same meaning)
  • 0.0 = perpendicular (unrelated)
  • -1.0 = opposite direction (rare with text)

In practice:

  • > 0.8 = highly similar
  • 0.5 - 0.8 = related
  • < 0.5 = loosely related or unrelated
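Plugging in the toy vectors from the "Why Vectors?" section shows where they land on this scale (cosineSimilarity is repeated so the snippet runs standalone, and the second comparison uses a made-up vector for the low end):

```javascript
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Near-identical toy vectors from the "Why Vectors?" section:
cosineSimilarity([0.3, 0.7, -0.2], [0.35, 0.68, -0.18]); // ≈ 0.997 → highly similar

// An invented vector pointing the other way:
cosineSimilarity([0.3, 0.7, -0.2], [-0.4, -0.6, 0.3]);   // ≈ -0.98 → opposite
```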

Vector Arithmetic

One fascinating property: embeddings support semantic arithmetic.

king - man + woman ≈ queen

The model learned that "king" and "queen" have the same relationship as "man" and "woman".

Paris - France + Germany ≈ Berlin

Capital city relationships are encoded in the vector space.

This enables analogical reasoning and relationship discovery.
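A sketch with hand-picked 3-dimensional vectors shows the mechanics. Real embeddings have hundreds of dimensions and the analogy only holds approximately; these numbers are illustrative, not from a real model:

```javascript
const add = (a, b) => a.map((v, i) => v + b[i]);
const sub = (a, b) => a.map((v, i) => v - b[i]);
const dist = (a, b) => Math.hypot(...a.map((v, i) => v - b[i]));

// Hand-picked toy vectors (not real embeddings), chosen so the
// analogy works out exactly.
const vocab = {
  king:  [0.9, 0.8, 0.1],
  man:   [0.5, 0.2, 0.1],
  woman: [0.5, 0.2, 0.9],
  queen: [0.9, 0.8, 0.9],
  apple: [0.1, 0.1, 0.5],
};

const target = add(sub(vocab.king, vocab.man), vocab.woman); // [0.9, 0.8, 0.9]

// Nearest word that wasn't part of the arithmetic:
const [nearest] = Object.entries(vocab)
  .filter(([word]) => !["king", "man", "woman"].includes(word))
  .sort(([, a], [, b]) => dist(target, a) - dist(target, b))[0];

// nearest === "queen"
```

With real embeddings you would do the same thing, but search the whole vocabulary for the nearest neighbor of the result vector.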


Embedding Models

Evolution

Word2Vec (2013): Word-level embeddings. "bank" has one vector regardless of context.

GloVe (2014): Global statistics for better word representations.

ELMo (2018): Contextual embeddings. "bank" differs in "river bank" vs "bank account".

BERT (2018): Transformer-based, bidirectional context. Major quality improvement.

Sentence Transformers (2019): Optimized for sentence/paragraph embeddings, not just words.

Modern Models (2022+): Instruction-tuned, multilingual, domain-specific variants.

Current Options

Model                   Dimensions  Use Case                  Access
all-MiniLM-L6-v2        384         Fast, good quality        Open (transformers.js)
all-mpnet-base-v2       768         Better quality            Open
text-embedding-ada-002  1536        Production, high quality  OpenAI API
text-embedding-3-small  1536        Production, cheaper       OpenAI API
text-embedding-3-large  3072        Best quality              OpenAI API
voyage-2                1024        Retrieval-optimized       Voyage API

Client-side, all-MiniLM-L6-v2 balances size and quality; server-side, use larger models or APIs.


Generating Embeddings

Browser (transformers.js)

import { pipeline } from "@xenova/transformers";

// Load model (downloads ~30MB first time)
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

// Generate embedding
const output = await extractor("How do embeddings work?", {
  pooling: "mean",
  normalize: true,
});

const embedding = Array.from(output.data);
// [0.023, -0.145, 0.821, ..., 0.034]  // 384 numbers

Node.js

import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

async function embed(text) {
  const output = await extractor(text, {
    pooling: "mean",
    normalize: true,
  });
  return Array.from(output.data);
}

const vec = await embed("Understanding embeddings");

OpenAI API

import OpenAI from "openai";

const openai = new OpenAI();

async function embed(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding;
}

When to Generate Embeddings

Embeddings can be generated at different points in your system's lifecycle. Each approach has trade-offs.

Build Time

Generate embeddings during site/app build. Store as static files.

flowchart TD
    A[Content<br/>Markdown, DB]:::primary -->|build script| B[embeddings.json]:::secondary
    B -->|deploy| C[Static asset<br/>served to clients]:::accent

Example: This blog generates embeddings at build time via scripts/generate-embeddings.js.

Pros:

  • Zero runtime compute for content embeddings
  • Works with static hosting (no server needed)
  • Embeddings available instantly on page load

Cons:

  • Content changes require rebuild + redeploy
  • Build time increases with content volume
  • Can't embed user-generated content

Best for: Static sites, documentation, blogs, known content sets.
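A build-time pipeline can be a short script that maps each piece of content to a vector and writes JSON. This sketch uses a deterministic stand-in for the embedder so it runs without a model; a real script (like the generate-embeddings.js mentioned above) would call something such as all-MiniLM-L6-v2 via transformers.js:

```javascript
import { writeFileSync } from "fs";

// Stand-in embedder so the pipeline shape is runnable without a model.
// A real build script would load a model and embed for real here.
function embed(text) {
  const vec = new Array(8).fill(0);
  for (let i = 0; i < text.length; i++) vec[i % 8] += text.charCodeAt(i) / 1000;
  return vec;
}

// posts maps an id (e.g. a file path) to its text; in a real build
// these would be read from the content directory.
function buildEmbeddings(posts, outFile) {
  const index = {};
  for (const [id, text] of Object.entries(posts)) {
    index[id] = embed(text);
  }
  writeFileSync(outFile, JSON.stringify(index));
  return index;
}
```

The output file then ships as a static asset and is loaded by the client at runtime.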

Ingest Time

Generate embeddings when content is added/updated. Store in database or vector store.

flowchart TD
    A[New content submitted]:::primary -->|webhook/queue| B[Embedding service]:::secondary
    B --> C[Vector database<br/>Pinecone, Weaviate, etc.]:::accent

Pros:

  • Content immediately searchable after ingest
  • Handles dynamic/user-generated content
  • Scales with queue workers

Cons:

  • Requires server infrastructure
  • Ingest latency (seconds to process)
  • Vector DB costs at scale

Best for: CMS platforms, knowledge bases, support systems.
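The shape of an ingest-time handler, with in-memory stand-ins for the embedding service and vector store (a real system would call a model and upsert into Pinecone or similar):

```javascript
// In-memory stand-in for a vector store.
const store = [];

// Stand-in embedder; a real service would call a model or API.
async function embed(text) {
  const vec = new Array(8).fill(0);
  for (let i = 0; i < text.length; i++) vec[i % 8] += text.charCodeAt(i) / 1000;
  return vec;
}

// Called from a webhook or queue worker when content is added/updated.
async function onContentIngest(id, text) {
  const vector = await embed(text);
  const record = { id, text, vector, updatedAt: Date.now() };
  const existing = store.findIndex((r) => r.id === id);
  if (existing >= 0) store[existing] = record; // update in place
  else store.push(record);
  return record;
}
```

Upserting by id keeps re-edited content from producing duplicate vectors.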

Query Time

Generate embeddings on-demand when a query arrives.

flowchart LR
    A[User query]:::primary -->|runtime| B[Embed query]:::secondary
    B --> C[Compare to stored embeddings]:::secondary
    C --> D[Return results]:::accent

Pros:

  • No pre-computation needed
  • Always uses latest model
  • Works for any ad-hoc text

Cons:

  • Adds latency to every query
  • Requires model in memory or API call
  • Can't pre-compute content embeddings this way

Best for: Query embedding (you almost always embed queries at runtime).

Hybrid Approaches

Most production systems combine approaches:

Content Type                           When to Embed
Static content (docs, posts)           Build time
Dynamic content (user posts, tickets)  Ingest time
User queries                           Query time

Example architecture:

flowchart TB
    subgraph BUILD["BUILD TIME"]
        B1[Static content]:::primary --> B2[Embed]:::secondary --> B3[JSON/Vector DB]:::accent
    end

    subgraph INGEST["INGEST TIME"]
        I1[New content webhook]:::primary --> I2[Embed]:::secondary --> I3[Update Vector DB]:::accent
    end

    subgraph QUERY["QUERY TIME"]
        Q1[User query]:::primary --> Q2[Embed]:::secondary --> Q3[Search Vector DB]:::secondary --> Q4[Results]:::accent
    end

    BUILD --> INGEST
    INGEST --> QUERY

Practical Considerations

Chunking

Long documents should be split into chunks before embedding:

function chunkText(text, maxLength = 500) {
  // Naive sentence splitting: drops the original punctuation and
  // measures characters, not tokens.
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter(Boolean);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    if ((current + sentence).length > maxLength) {
      if (current) chunks.push(current.trim());
      current = sentence + ". ";
    } else {
      current += sentence + ". ";
    }
  }
  if (current) chunks.push(current.trim());

  return chunks;
}

Why chunk?

  • Models have max input length (512 tokens typical)
  • Smaller chunks = more precise retrieval
  • But too small = lost context

Typical sizes: 200-500 tokens per chunk with 50-100 token overlap.
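The chunker above produces disjoint chunks; an overlapping variant slides a window instead. This version counts words rather than model tokens, a rough but dependency-free approximation:

```javascript
// Split text into overlapping word-count windows.
// size/overlap are in words here; real pipelines usually count model tokens.
function chunkWithOverlap(text, size = 300, overlap = 75) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk.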

Normalization

Normalize vectors for consistent similarity scores:

function normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return vec.map((v) => v / norm);
}

Most modern models normalize by default (use normalize: true).

Storage

Embeddings are just numbers. Store as:

  • JSON: Simple, portable, human-readable
  • Binary: Smaller, faster to load
  • Vector DB: Optimized for similarity search (Pinecone, Weaviate, Chroma)

For small datasets (< 10,000 vectors), JSON is fine:

import { readFileSync, writeFileSync } from "fs";

// Store
writeFileSync("embeddings.json", JSON.stringify(embeddings));

// Load
const embeddings = JSON.parse(readFileSync("embeddings.json", "utf8"));
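For the binary option, one dependency-free approach packs vectors into a Float32Array: 4 bytes per number, versus the 8-10 characters a number typically costs as JSON text:

```javascript
import { writeFileSync, readFileSync } from "fs";

// Pack equal-length float vectors into one float32 buffer (4 bytes each).
function saveBinary(path, vectors) {
  const dims = vectors[0].length;
  const flat = new Float32Array(vectors.length * dims);
  vectors.forEach((vec, i) => flat.set(vec, i * dims));
  writeFileSync(path, Buffer.from(flat.buffer));
}

function loadBinary(path, dims) {
  const buf = readFileSync(path);
  // Copy into an aligned ArrayBuffer before viewing as float32.
  const flat = new Float32Array(
    buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength),
  );
  const vectors = [];
  for (let i = 0; i < flat.length; i += dims) {
    vectors.push(Array.from(flat.subarray(i, i + dims)));
  }
  return vectors;
}
```

Note that float32 rounds values slightly relative to JavaScript's float64, which is harmless for similarity search.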

For large datasets, use a vector database with approximate nearest neighbor (ANN) algorithms.


Common Patterns

Semantic Search

async function search(query, documents, embeddings) {
  const queryVec = await embed(query);

  return Object.entries(embeddings)
    .map(([id, vec]) => ({
      id,
      score: cosineSimilarity(queryVec, vec),
      document: documents[id],
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 10);
}

Clustering

// Group similar documents
function cluster(embeddings, k = 5) {
  // K-means or hierarchical clustering
  // Returns: { clusterId: [docIds] }
}
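The stub above can be filled in with a minimal k-means: deterministic initialization and Euclidean distance (which ranks the same as cosine when vectors are normalized). A sketch, not a production clusterer:

```javascript
// Minimal k-means: deterministic init, Euclidean distance, fixed iteration cap.
function kmeans(vectors, k = 5, iterations = 20) {
  const dist = (a, b) => Math.hypot(...a.map((v, i) => v - b[i]));

  // Deterministic init: k evenly spaced input points as starting centroids.
  let centroids = Array.from({ length: k }, (_, i) => [
    ...vectors[Math.floor((i * vectors.length) / k)],
  ]);
  let assignments = [];

  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: nearest centroid for each vector.
    assignments = vectors.map((vec) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist(vec, centroids[c]) < dist(vec, centroids[best])) best = c;
      }
      return best;
    });

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = vectors.filter((_, i) => assignments[i] === c);
      if (members.length === 0) return old; // leave empty clusters in place
      return old.map(
        (_, d) => members.reduce((sum, m) => sum + m[d], 0) / members.length,
      );
    });
  }
  return assignments; // assignments[i] = cluster index of vectors[i]
}
```

To match the stub's { clusterId: [docIds] } shape, group document ids by their assignment index afterwards.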

Deduplication

// Find near-duplicates
function findDuplicates(embeddings, threshold = 0.95) {
  const duplicates = [];
  const ids = Object.keys(embeddings);

  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const sim = cosineSimilarity(embeddings[ids[i]], embeddings[ids[j]]);
      if (sim > threshold) {
        duplicates.push([ids[i], ids[j], sim]);
      }
    }
  }

  return duplicates;
}

Classification

// Zero-shot classification via similarity to labels
async function classify(text, labels) {
  const textVec = await embed(text);
  const labelVecs = await Promise.all(labels.map(embed));

  const scores = labelVecs.map((vec, i) => ({
    label: labels[i],
    score: cosineSimilarity(textVec, vec),
  }));

  return scores.sort((a, b) => b.score - a.score);
}

// Usage
classify("The stock market crashed today", [
  "finance",
  "sports",
  "technology",
  "politics",
]);
// → [{ label: "finance", score: 0.82 }, ...]

Limitations

What Embeddings Don't Capture

Negation: "I love this" and "I don't love this" may be similar (both about loving).

Specificity: "The meeting is at 3pm Tuesday" loses the exact time in vector form.

Reasoning: Embeddings capture similarity, not logical relationships.

Recency: Training data has a cutoff; new terms may not embed well.

Failure Modes

Domain mismatch: General models may not understand specialized jargon.

Short queries: Single words have less context for disambiguation.

Adversarial inputs: Carefully crafted text can produce misleading embeddings.


Key Takeaways

  1. Embeddings are vectors that capture meaning — similar concepts, similar vectors
  2. Math on meaning — enables similarity search, clustering, classification
  3. Context matters — modern models understand "bank" differently in different contexts
  4. Trade-offs exist — more dimensions = more nuance but more compute
  5. Chunk appropriately — 200-500 tokens for documents
  6. Normalize vectors — for consistent similarity scores
  7. Know the limits — embeddings don't capture negation, specifics, or reasoning


This blog generates embeddings at build time (scripts/generate-embeddings.js) but doesn't yet use them for search—that's on the roadmap. Currently: keyword search. Soon: semantic hybrid.