Embeddings are the foundation of semantic understanding in AI. They turn words, sentences, and documents into numbers that capture meaning. This post explains how they work, why they matter, and how to use them.

Part of the Intent & Action series.


What Are Embeddings?

An embedding is a vector representation of meaning. It converts text (or images, audio, etc.) into a list of numbers that captures semantic content.

"artificial intelligence" → [0.23, -0.14, 0.82, 0.01, -0.45, ...]

These numbers aren't random. They're learned from massive text corpora, and they encode relationships between concepts.

Key insight: Similar meanings produce similar vectors.

"artificial intelligence" → [0.23, -0.14, 0.82, 0.01, ...]
"machine learning"        → [0.25, -0.12, 0.79, 0.03, ...]  ← close
"cat photos"              → [-0.41, 0.67, -0.22, 0.45, ...] ← distant

The first two vectors are near each other in vector space because the concepts are related. The third is far away because it's unrelated.


Why Vectors?

Why convert text to numbers? Because math works on numbers.

With text: How similar are "car" and "automobile"?

  • String comparison: 0% match (different characters)
  • No mathematical answer

With vectors: How similar are [0.3, 0.7, -0.2] and [0.35, 0.68, -0.18]?

  • Cosine similarity: 0.99 (very similar)
  • Mathematical answer

Embeddings let us do math on meaning:

  • Find similar content (nearest neighbors)
  • Cluster related concepts (k-means)
  • Visualize relationships (dimensionality reduction)
  • Combine meanings (vector arithmetic)

How Embeddings Work

The Training Process

Embedding models learn from context. Given massive text data, they learn that:

  • Words appearing in similar contexts have similar meanings
  • "The car drove fast" and "The automobile drove fast" → car ≈ automobile
  • "King is to queen as man is to woman" → learned relationships

Modern models use transformers that process entire sentences, capturing:

  • Word meaning in context ("bank" as river vs. financial)
  • Sentence-level semantics
  • Nuanced relationships

The Output

A trained model converts text to a fixed-size vector:

embed("The quick brown fox")    → [0.12, -0.34, 0.56, ..., 0.78]   // 384 numbers
embed("A fast auburn fox")      → [0.14, -0.32, 0.54, ..., 0.76]   // similar
embed("Database optimization")  → [-0.45, 0.23, -0.12, ..., 0.34]  // different

The vector size (dimensionality) is fixed per model:

  • all-MiniLM-L6-v2: 384 dimensions
  • all-mpnet-base-v2: 768 dimensions
  • text-embedding-ada-002: 1536 dimensions

More dimensions can capture more nuance, but require more compute/storage.
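To make the compute/storage trade-off concrete, a quick back-of-envelope (float32 vectors use 4 bytes per dimension):

```javascript
// Storage for float32 embeddings: count × dimensions × 4 bytes.
const embeddingBytes = (count, dims) => count * dims * 4;

embeddingBytes(100_000, 384);  // 153,600,000 bytes ≈ 154 MB
embeddingBytes(100_000, 1536); // 614,400,000 bytes ≈ 614 MB
```

A 4x jump in dimensions is a 4x jump in storage, and similarity comparisons scale linearly with dimensions too.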


Visualizing Embeddings

Embeddings exist in high-dimensional space (384+ dimensions), but we can project them to 2D for visualization.

quadrantChart
    title Semantic Clustering in 2D Space
    x-axis Technology --> Food
    y-axis General --> Specific
    quadrant-1 Food Topics
    quadrant-2 Tech Concepts
    quadrant-3 General Knowledge
    quadrant-4 Practical Skills
    machine learning: [0.2, 0.7]
    AI: [0.15, 0.6]
    deep learning: [0.25, 0.75]
    neural networks: [0.2, 0.8]
    cooking recipes: [0.8, 0.6]
    food preparation: [0.75, 0.5]
    kitchen tips: [0.85, 0.55]

Related concepts cluster together. This is why semantic search works—you find content by proximity in meaning space.


Measuring Similarity

The standard measure is cosine similarity—the cosine of the angle between vectors.

function cosineSimilarity(a, b) {
  let dot = 0,
    normA = 0,
    normB = 0;

  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

Interpreting scores:

  • 1.0 = identical direction (same meaning)
  • 0.0 = perpendicular (unrelated)
  • -1.0 = opposite direction (rare with text)

In practice:

  • > 0.8 = highly similar
  • 0.5 - 0.8 = related
  • < 0.5 = loosely related or unrelated
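Plugging in the toy vectors from the "Why Vectors?" section shows where they land on this scale (cosineSimilarity is repeated so the snippet runs standalone, and the second comparison uses a made-up vector for the low end):

```javascript
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Near-identical toy vectors from the "Why Vectors?" section:
cosineSimilarity([0.3, 0.7, -0.2], [0.35, 0.68, -0.18]); // ≈ 0.997 → highly similar

// An invented vector pointing the other way:
cosineSimilarity([0.3, 0.7, -0.2], [-0.4, -0.6, 0.3]);   // ≈ -0.98 → opposite
```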

Vector Arithmetic

One fascinating property: embeddings support semantic arithmetic.

king - man + woman ≈ queen

The model learned that "king" and "queen" have the same relationship as "man" and "woman".

Paris - France + Germany ≈ Berlin

Capital city relationships are encoded in the vector space.

This enables analogical reasoning and relationship discovery.
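A sketch with hand-picked 3-dimensional vectors shows the mechanics. Real embeddings have hundreds of dimensions and the analogy only holds approximately; these numbers are illustrative, not from a real model:

```javascript
const add = (a, b) => a.map((v, i) => v + b[i]);
const sub = (a, b) => a.map((v, i) => v - b[i]);
const dist = (a, b) => Math.hypot(...a.map((v, i) => v - b[i]));

// Hand-picked toy vectors (not real embeddings), chosen so the
// analogy works out exactly.
const vocab = {
  king:  [0.9, 0.8, 0.1],
  man:   [0.5, 0.2, 0.1],
  woman: [0.5, 0.2, 0.9],
  queen: [0.9, 0.8, 0.9],
  apple: [0.1, 0.1, 0.5],
};

const target = add(sub(vocab.king, vocab.man), vocab.woman); // [0.9, 0.8, 0.9]

// Nearest word that wasn't part of the arithmetic:
const [nearest] = Object.entries(vocab)
  .filter(([word]) => !["king", "man", "woman"].includes(word))
  .sort(([, a], [, b]) => dist(target, a) - dist(target, b))[0];

// nearest === "queen"
```

With real embeddings you would do the same thing, but search the whole vocabulary for the nearest neighbor of the result vector.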


Embedding Models

Evolution

Word2Vec (2013): Word-level embeddings. "bank" has one vector regardless of context.

GloVe (2014): Global statistics for better word representations.

ELMo (2018): Contextual embeddings. "bank" differs in "river bank" vs "bank account".

BERT (2018): Transformer-based, bidirectional context. Major quality improvement.

Sentence Transformers (2019): Optimized for sentence/paragraph embeddings, not just words.

Modern Models (2022+): Instruction-tuned, multilingual, domain-specific variants.

Current Options

Model                   Dimensions  Use Case                  Access
all-MiniLM-L6-v2        384         Fast, good quality        Open (transformers.js)
all-mpnet-base-v2       768         Better quality            Open
text-embedding-ada-002  1536        Production, high quality  OpenAI API
text-embedding-3-small  1536        Production, cheaper       OpenAI API
text-embedding-3-large  3072        Best quality              OpenAI API
voyage-2                1024        Retrieval-optimized       Voyage API

Client-side, all-MiniLM-L6-v2 balances size and quality; server-side, use larger models or APIs.


Generating Embeddings

Browser (transformers.js)

import { pipeline } from "@xenova/transformers";

// Load model (downloads ~30MB first time)
const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

// Generate embedding
const output = await extractor("How do embeddings work?", {
  pooling: "mean",
  normalize: true,
});

const embedding = Array.from(output.data);
// [0.023, -0.145, 0.821, ..., 0.034]  // 384 numbers

Node.js

import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

async function embed(text) {
  const output = await extractor(text, {
    pooling: "mean",
    normalize: true,
  });
  return Array.from(output.data);
}

const vec = await embed("Understanding embeddings");

OpenAI API

import OpenAI from "openai";

const openai = new OpenAI();

async function embed(text) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding;
}

When to Generate Embeddings

Embeddings can be generated at different points in your system's lifecycle. Each approach has trade-offs.

Build Time

Generate embeddings during site/app build. Store as static files.

flowchart TD
    A[Content<br/>Markdown, DB]:::primary -->|build script| B[embeddings.json]:::secondary
    B -->|deploy| C[Static asset<br/>served to clients]:::accent

Example: This blog generates embeddings at build time via scripts/generate-embeddings.js.

Pros:

  • Zero runtime compute for content embeddings
  • Works with static hosting (no server needed)
  • Embeddings available instantly on page load

Cons:

  • Content changes require rebuild + redeploy
  • Build time increases with content volume
  • Can't embed user-generated content

Best for: Static sites, documentation, blogs, known content sets.
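A build-time pipeline can be a short script that maps each piece of content to a vector and writes JSON. This sketch uses a deterministic stand-in for the embedder so it runs without a model; a real script (like the generate-embeddings.js mentioned above) would call something such as all-MiniLM-L6-v2 via transformers.js:

```javascript
import { writeFileSync } from "fs";

// Stand-in embedder so the pipeline shape is runnable without a model.
// A real build script would load a model and embed for real here.
function embed(text) {
  const vec = new Array(8).fill(0);
  for (let i = 0; i < text.length; i++) vec[i % 8] += text.charCodeAt(i) / 1000;
  return vec;
}

// posts maps an id (e.g. a file path) to its text; in a real build
// these would be read from the content directory.
function buildEmbeddings(posts, outFile) {
  const index = {};
  for (const [id, text] of Object.entries(posts)) {
    index[id] = embed(text);
  }
  writeFileSync(outFile, JSON.stringify(index));
  return index;
}
```

The output file then ships as a static asset and is loaded by the client at runtime.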

Ingest Time

Generate embeddings when content is added/updated. Store in database or vector store.

flowchart TD
    A[New content submitted]:::primary -->|webhook/queue| B[Embedding service]:::secondary
    B --> C[Vector database<br/>Pinecone, Weaviate, etc.]:::accent

Pros:

  • Content immediately searchable after ingest
  • Handles dynamic/user-generated content
  • Scales with queue workers

Cons:

  • Requires server infrastructure
  • Ingest latency (seconds to process)
  • Vector DB costs at scale

Best for: CMS platforms, knowledge bases, support systems.
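The shape of an ingest-time handler, with in-memory stand-ins for the embedding service and vector store (a real system would call a model and upsert into Pinecone or similar):

```javascript
// In-memory stand-in for a vector store.
const store = [];

// Stand-in embedder; a real service would call a model or API.
async function embed(text) {
  const vec = new Array(8).fill(0);
  for (let i = 0; i < text.length; i++) vec[i % 8] += text.charCodeAt(i) / 1000;
  return vec;
}

// Called from a webhook or queue worker when content is added/updated.
async function onContentIngest(id, text) {
  const vector = await embed(text);
  const record = { id, text, vector, updatedAt: Date.now() };
  const existing = store.findIndex((r) => r.id === id);
  if (existing >= 0) store[existing] = record; // update in place
  else store.push(record);
  return record;
}
```

Upserting by id keeps re-edited content from producing duplicate vectors.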

Query Time

Generate embeddings on-demand when a query arrives.

flowchart LR
    A[User query]:::primary -->|runtime| B[Embed query]:::secondary
    B --> C[Compare to stored embeddings]:::secondary
    C --> D[Return results]:::accent

Pros:

  • No pre-computation needed
  • Always uses latest model
  • Works for any ad-hoc text

Cons:

  • Adds latency to every query
  • Requires model in memory or API call
  • Can't pre-compute content embeddings this way

Best for: Query embedding (you almost always embed queries at runtime).

Hybrid Approaches

Most production systems combine approaches:

Content Type                           When to Embed
Static content (docs, posts)           Build time
Dynamic content (user posts, tickets)  Ingest time
User queries                           Query time

Example architecture:

flowchart TB
    subgraph BUILD["BUILD TIME"]
        B1[Static content]:::primary --> B2[Embed]:::secondary --> B3[JSON/Vector DB]:::accent
    end

    subgraph INGEST["INGEST TIME"]
        I1[New content webhook]:::primary --> I2[Embed]:::secondary --> I3[Update Vector DB]:::accent
    end

    subgraph QUERY["QUERY TIME"]
        Q1[User query]:::primary --> Q2[Embed]:::secondary --> Q3[Search Vector DB]:::secondary --> Q4[Results]:::accent
    end

    BUILD --> INGEST
    INGEST --> QUERY

Practical Considerations

Chunking

Long documents should be split into chunks before embedding:

function chunkText(text, maxLength = 500) {
  // Naive sentence splitting: drops the original punctuation and
  // measures characters, not tokens.
  const sentences = text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter(Boolean);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    if ((current + sentence).length > maxLength) {
      if (current) chunks.push(current.trim());
      current = sentence + ". ";
    } else {
      current += sentence + ". ";
    }
  }
  if (current) chunks.push(current.trim());

  return chunks;
}

Why chunk?

  • Models have max input length (512 tokens typical)
  • Smaller chunks = more precise retrieval
  • But too small = lost context

Typical sizes: 200-500 tokens per chunk with 50-100 token overlap.
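The chunker above produces disjoint chunks; an overlapping variant slides a window instead. This version counts words rather than model tokens, a rough but dependency-free approximation:

```javascript
// Split text into overlapping word-count windows.
// size/overlap are in words here; real pipelines usually count model tokens.
function chunkWithOverlap(text, size = 300, overlap = 75) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk.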

Normalization

Normalize vectors for consistent similarity scores:

function normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return vec.map((v) => v / norm);
}

Most modern models normalize by default (use normalize: true).

Storage

Embeddings are just numbers. Store as:

  • JSON: Simple, portable, human-readable
  • Binary: Smaller, faster to load
  • Vector DB: Optimized for similarity search (Pinecone, Weaviate, Chroma)

For small datasets (< 10,000 vectors), JSON is fine:

import { readFileSync, writeFileSync } from "fs";

// Store
writeFileSync("embeddings.json", JSON.stringify(embeddings));

// Load
const embeddings = JSON.parse(readFileSync("embeddings.json", "utf8"));
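For the binary option, one dependency-free approach packs vectors into a Float32Array: 4 bytes per number, versus the 8-10 characters a number typically costs as JSON text:

```javascript
import { writeFileSync, readFileSync } from "fs";

// Pack equal-length float vectors into one float32 buffer (4 bytes each).
function saveBinary(path, vectors) {
  const dims = vectors[0].length;
  const flat = new Float32Array(vectors.length * dims);
  vectors.forEach((vec, i) => flat.set(vec, i * dims));
  writeFileSync(path, Buffer.from(flat.buffer));
}

function loadBinary(path, dims) {
  const buf = readFileSync(path);
  // Copy into an aligned ArrayBuffer before viewing as float32.
  const flat = new Float32Array(
    buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength),
  );
  const vectors = [];
  for (let i = 0; i < flat.length; i += dims) {
    vectors.push(Array.from(flat.subarray(i, i + dims)));
  }
  return vectors;
}
```

Note that float32 rounds values slightly relative to JavaScript's float64, which is harmless for similarity search.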

For large datasets, use a vector database with approximate nearest neighbor (ANN) algorithms.


Common Patterns

Semantic Search

async function search(query, documents, embeddings) {
  const queryVec = await embed(query);

  return Object.entries(embeddings)
    .map(([id, vec]) => ({
      id,
      score: cosineSimilarity(queryVec, vec),
      document: documents[id],
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 10);
}

Clustering

// Group similar documents
function cluster(embeddings, k = 5) {
  // K-means or hierarchical clustering
  // Returns: { clusterId: [docIds] }
}
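The stub above can be filled in with a minimal k-means: deterministic initialization and Euclidean distance (which ranks the same as cosine when vectors are normalized). A sketch, not a production clusterer:

```javascript
// Minimal k-means: deterministic init, Euclidean distance, fixed iteration cap.
function kmeans(vectors, k = 5, iterations = 20) {
  const dist = (a, b) => Math.hypot(...a.map((v, i) => v - b[i]));

  // Deterministic init: k evenly spaced input points as starting centroids.
  let centroids = Array.from({ length: k }, (_, i) => [
    ...vectors[Math.floor((i * vectors.length) / k)],
  ]);
  let assignments = [];

  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: nearest centroid for each vector.
    assignments = vectors.map((vec) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist(vec, centroids[c]) < dist(vec, centroids[best])) best = c;
      }
      return best;
    });

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = vectors.filter((_, i) => assignments[i] === c);
      if (members.length === 0) return old; // leave empty clusters in place
      return old.map(
        (_, d) => members.reduce((sum, m) => sum + m[d], 0) / members.length,
      );
    });
  }
  return assignments; // assignments[i] = cluster index of vectors[i]
}
```

To match the stub's { clusterId: [docIds] } shape, group document ids by their assignment index afterwards.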

Deduplication

// Find near-duplicates
function findDuplicates(embeddings, threshold = 0.95) {
  const duplicates = [];
  const ids = Object.keys(embeddings);

  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const sim = cosineSimilarity(embeddings[ids[i]], embeddings[ids[j]]);
      if (sim > threshold) {
        duplicates.push([ids[i], ids[j], sim]);
      }
    }
  }

  return duplicates;
}

Classification

// Zero-shot classification via similarity to labels
async function classify(text, labels) {
  const textVec = await embed(text);
  const labelVecs = await Promise.all(labels.map(embed));

  const scores = labelVecs.map((vec, i) => ({
    label: labels[i],
    score: cosineSimilarity(textVec, vec),
  }));

  return scores.sort((a, b) => b.score - a.score);
}

// Usage
classify("The stock market crashed today", [
  "finance",
  "sports",
  "technology",
  "politics",
]);
// → [{ label: "finance", score: 0.82 }, ...]

Limitations

What Embeddings Don't Capture

Negation: "I love this" and "I don't love this" may be similar (both about loving).

Specificity: "The meeting is at 3pm Tuesday" loses the exact time in vector form.

Reasoning: Embeddings capture similarity, not logical relationships.

Recency: Training data has a cutoff; new terms may not embed well.

Failure Modes

Domain mismatch: General models may not understand specialized jargon.

Short queries: Single words have less context for disambiguation.

Adversarial inputs: Carefully crafted text can produce misleading embeddings.


Key Takeaways

  1. Embeddings are vectors that capture meaning — similar concepts, similar vectors
  2. Math on meaning — enables similarity search, clustering, classification
  3. Context matters — modern models understand "bank" differently in different contexts
  4. Trade-offs exist — more dimensions = more nuance but more compute
  5. Chunk appropriately — 200-500 tokens for documents
  6. Normalize vectors — for consistent similarity scores
  7. Know the limits — embeddings don't capture negation, specifics, or reasoning


This blog generates embeddings at build time (scripts/generate-embeddings.js) but doesn't yet use them for search—that's on the roadmap. Currently: keyword search. Soon: semantic hybrid.