A technical comparison of intent understanding approaches—from simple keyword matching to LLM-powered reasoning. Each approach has trade-offs in speed, accuracy, complexity, and capability.

The Intent & Action series


Try It: Live Comparison

Before diving into the technical details, try the different approaches yourself. For detailed performance analysis, see Search Performance: From Milliseconds to Seconds.

| Approach | How it works | Runtime | Model |
| --- | --- | --- | --- |
| Keyword | Syntactic string matching — no ML | JS | — |
| Fuzzy | Syntactic string matching — no ML | JS | — |
| Vector Search | Semantic similarity via embeddings | WASM | all-MiniLM-L6-v2 (~30MB) |
| LLM (Local) | All content → LLM generates answer | WASM | Llama 3.2 1B (~2GB) |
| LLM (Remote) | All content → LLM generates answer | API | GPT-4o-mini (OpenAI) |
| RAG (Local) | Embeddings retrieve → LLM generates answer | WASM | MiniLM + Llama 3.2 (~2GB) |
| RAG (Remote) | Embeddings retrieve → LLM generates answer | WASM + API | MiniLM + GPT-4o-mini |

Key observations:

  • Syntactic (Keyword/Fuzzy): Instant, no models, but can't understand meaning
  • Vector Search: ~30MB model, understands semantic similarity
  • LLM: Generates real answers — Local runs in browser, Remote uses OpenAI
  • RAG: Best of both — vector retrieval, then LLM generates contextual answers

All local models run entirely in your browser via WebAssembly — no API keys needed!


Two Dimensions: Approach and Runtime

Before diving in, let's separate two independent concerns:

Approach: what algorithm understands intent

  • Syntactic: Keyword, Fuzzy, Full-text (string matching)
  • Semantic: Embeddings, LLMs (meaning understanding)

Runtime: where it executes

  • JS: Pure JavaScript, instant, no download
  • WASM: Compiled models running locally in browser
  • Remote: API calls to servers

These dimensions are orthogonal:

| Approach | WASM (local) | Remote (API) |
| --- | --- | --- |
| Embeddings | transformers.js | OpenAI, Cohere |
| LLM | WebLLM, llama.cpp | Claude, GPT, Gemini |

The demo above uses WASM for both Embeddings and LLM to show client-side performance characteristics. In production, you might use remote APIs for better models or WASM for privacy/offline.


The Approach Spectrum

Intent understanding spans a spectrum from simple to sophisticated:

flowchart LR
    subgraph Syntactic
        A[Keyword]:::secondary --> B[Fuzzy]:::secondary --> C[Full-text]:::secondary
    end
    subgraph Semantic
        D[Embeddings]:::accent --> E[LLM Rerank]:::accent --> F[RAG]:::accent --> G[Full LLM]:::accent
    end
    C --> D

Faster, simpler, less capable ← → Slower, more complex, more capable

Each step adds capability but also complexity and latency.


Pre-LLM Approaches

These approaches predate large language models (or don't require them).

1. Keyword Search

Does the query string appear in the content?

function keywordSearch(query, documents) {
  const q = query.toLowerCase();
  return documents.filter(
    (doc) =>
      doc.title.toLowerCase().includes(q) ||
      doc.content.toLowerCase().includes(q),
  );
}

| Aspect | Value |
| --- | --- |
| Latency | ~1ms |
| Accuracy | Low (vocabulary mismatch fails) |
| Complexity | Trivial |
| Resources | Minimal |

When to use: Small content sets, exact terminology, instant results required.

2. Fuzzy Matching

Handles typos via edit distance (Levenshtein, Jaro-Winkler).

import Fuse from "fuse.js";

const fuse = new Fuse(documents, {
  keys: ["title", "content"],
  threshold: 0.3, // 0 = exact match, 1 = match anything
  distance: 100,
});

const results = fuse.search(query);

| Aspect | Value |
| --- | --- |
| Latency | ~5-10ms |
| Accuracy | Low-Medium |
| Complexity | Low |
| Resources | Minimal |

When to use: User input is error-prone, names have variant spellings.
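The edit-distance metric underneath these libraries is simple to sketch. A minimal Levenshtein implementation (this is the textbook algorithm, not Fuse.js's actual scoring, which also weighs match position and field):

```javascript
// Levenshtein distance: the number of single-character edits
// (insertions, deletions, substitutions) needed to turn a into b.
function levenshtein(a, b) {
  // dp[i][j] = distance between the first i chars of a and first j chars of b
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0,
    ),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost, // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

levenshtein("kitten", "sitting"); // → 3
```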

3. Full-Text Search

Tokenization, stemming, TF-IDF or BM25 ranking.

import lunr from "lunr";

// Build index (once at startup)
const idx = lunr(function () {
  this.ref("slug");
  this.field("title", { boost: 10 });
  this.field("content");

  documents.forEach((doc) => this.add(doc));
});

// Search (per query)
const results = idx.search(query);

| Aspect | Value |
| --- | --- |
| Latency | ~10-20ms |
| Accuracy | Medium |
| Complexity | Medium |
| Resources | Index in memory |

When to use: Medium-sized content, need ranking, want phrase handling.
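The ranking function behind this kind of search is worth seeing once. Here is the textbook BM25 score for a single query term in a single document (lunr uses its own variant internally, so treat this as an illustration of the idea, not lunr's exact math):

```javascript
// Textbook BM25 score of one query term in one document.
// tf: term frequency in this doc, df: number of docs containing the term,
// N: total docs, docLen/avgLen: this doc's length vs. the collection average.
function bm25Term(tf, df, N, docLen, avgLen, k1 = 1.2, b = 0.75) {
  // Rarer terms get a higher inverse-document-frequency weight
  const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
  // Length normalization: long documents are penalized
  const norm = 1 - b + b * (docLen / avgLen);
  // Term frequency saturates: the 10th occurrence adds less than the 1st
  return idf * ((tf * (k1 + 1)) / (tf + k1 * norm));
}
```

A document's score for a query is the sum of `bm25Term` over the query's terms; `k1` controls how quickly term frequency saturates and `b` how strongly length normalization applies.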

4. Embedding-Based Search

Vector representations for semantic matching. See Understanding Embeddings for details.

import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

async function embed(text) {
  const output = await extractor(text, {
    pooling: "mean",
    normalize: true,
  });
  return Array.from(output.data);
}

async function semanticSearch(query, embeddings) {
  const queryVec = await embed(query);

  return Object.entries(embeddings)
    .map(([id, vec]) => ({
      id,
      score: cosineSimilarity(queryVec, vec),
    }))
    .sort((a, b) => b.score - a.score);
}

// Vectors are normalized above, so the dot product is the cosine similarity
function cosineSimilarity(a, b) {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

| Aspect | Value |
| --- | --- |
| Latency | ~50ms (warm), ~4s (cold) |
| Accuracy | High |
| Complexity | Medium-High |
| Resources | Model in memory (~40MB) |

When to use: Vocabulary mismatch is common, discovery over precision.


LLM Approaches

Large language models enable new capabilities beyond matching.

5. LLM Reranking

Use traditional retrieval, then rerank with an LLM.

async function rerankedSearch(query, documents, llm) {
  // Fast retrieval first (keyword or embedding)
  const candidates = keywordSearch(query, documents).slice(0, 20);

  // LLM reranking
  const response = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Rank these by relevance to "${query}":
${candidates.map((d, i) => `${i + 1}. ${d.title}: ${d.excerpt}`).join("\n")}

Return JSON array of numbers (original positions) in relevance order.`,
      },
    ],
  });

  const order = JSON.parse(response);
  return order.map((i) => candidates[i - 1]);
}

| Aspect | Value |
| --- | --- |
| Latency | 500ms-1s |
| Accuracy | High |
| Complexity | Medium |
| Resources | LLM API or local model |

When to use: Want semantic understanding without full LLM latency on every query.

6. RAG (Retrieval-Augmented Generation)

Retrieve relevant context, then generate an answer.

async function ragSearch(query, documents, vectorDB, llm) {
  // Retrieve context
  const context = await vectorDB.search(query, { limit: 5 });

  // Generate answer
  const answer = await llm.chat({
    messages: [
      {
        role: "system",
        content: "Answer based only on the provided context.",
      },
      {
        role: "user",
        content: `Context:
${context.map((c) => c.content).join("\n\n")}

Question: ${query}`,
      },
    ],
  });

  return { answer, sources: context };
}

| Aspect | Value |
| --- | --- |
| Latency | 1-3s |
| Accuracy | Very High |
| Complexity | High |
| Resources | Vector DB + LLM |

When to use: Users ask questions expecting answers, not links. See RAG Flow Control.
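Retrieval quality depends heavily on how documents were split into chunks at index time. A naive fixed-size chunker with overlap is enough to illustrate the preprocessing step (the sizes here are arbitrary; real systems often split on sentence or heading boundaries instead):

```javascript
// Naive fixed-size chunker with overlap, a common RAG preprocessing step.
// Overlap keeps context that straddles a chunk boundary retrievable.
function chunk(text, size = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```

Each chunk is then embedded and stored in the vector DB; at query time the retrieved chunks (not whole documents) become the LLM's context.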

7. Full LLM Search

Pass everything to the model.

async function llmSearch(query, documents, llm) {
  const response = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Given these documents:
${JSON.stringify(
  documents.map((d) => ({
    title: d.title,
    excerpt: d.excerpt,
    slug: d.slug,
  })),
)}

Find documents relevant to: "${query}"
Explain why each is relevant.
Return as JSON: [{ slug, reason }]`,
      },
    ],
  });

  return JSON.parse(response);
}

| Aspect | Value |
| --- | --- |
| Latency | 2-5s |
| Accuracy | Highest |
| Complexity | High |
| Resources | LLM (context limits) |

When to use: Complex queries, need explanation, content fits in context window.
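The context-window constraint can be checked up front. A hypothetical helper using the common ~4-characters-per-token heuristic for English text (exact counts require the model's own tokenizer):

```javascript
// Rough check that a document set fits the model's context window.
// Assumes ~4 characters per token, a rule of thumb for English;
// reserveTokens leaves room for the prompt and the model's response.
function fitsInContext(documents, contextTokens, reserveTokens = 2000) {
  const chars = documents.reduce(
    (sum, d) => sum + d.title.length + d.excerpt.length,
    0,
  );
  const estimatedTokens = Math.ceil(chars / 4) + reserveTokens;
  return estimatedTokens <= contextTokens;
}
```

When this returns false, fall back to RAG or reranking so only a retrieved subset reaches the model.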


Hybrid Approaches

Production systems typically combine approaches:

Syntactic + Semantic

Run both, merge results.

async function hybridSearch(query, documents, embeddings) {
  // Keyword (instant)
  const keywordResults = keywordSearch(query, documents);

  // Semantic
  const semanticResults = await semanticSearch(query, embeddings);

  // Reciprocal Rank Fusion
  const scores = {};
  const k = 60;

  keywordResults.forEach((r, i) => {
    scores[r.id] = (scores[r.id] || 0) + 1 / (k + i);
  });

  semanticResults.forEach((r, i) => {
    scores[r.id] = (scores[r.id] || 0) + 1 / (k + i);
  });

  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)
    .map(([id]) => documents.find((d) => d.id === id));
}

Fast Retrieval + LLM Rerank

Get candidates quickly, refine with LLM.

async function smartSearch(query, documents, embeddings, llm) {
  // Fast semantic retrieval
  const candidates = await semanticSearch(query, embeddings);
  const top20 = candidates.slice(0, 20);

  // LLM rerank for precision
  const reranked = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Rank by relevance to "${query}": ${JSON.stringify(top20)}`,
      },
    ],
  });

  return JSON.parse(reranked);
}

Comparison Matrix

| Approach | Type | Latency | Accuracy | Understands Meaning | Reasoning |
| --- | --- | --- | --- | --- | --- |
| Keyword | Syntactic | ~1ms | Low | ❌ | None |
| Fuzzy | Syntactic | ~5ms | Low-Med | ❌ | None |
| Full-text | Syntactic | ~10ms | Medium | ❌ (stems only) | None |
| Embeddings | Semantic | ~50ms | High | ✅ | None |
| LLM Rerank | Semantic | ~500ms | High | ✅ | Light |
| RAG | Semantic | ~2s | Very High | ✅ | Yes |
| Full LLM | Semantic | ~3s | Highest | ✅ | Full |

Runtime Options

Where the approach runs affects what's practical:

| Approach | JS | WASM | Remote |
| --- | --- | --- | --- |
| Keyword | ✅ Native | — | — |
| Fuzzy | ✅ Fuse.js | — | — |
| Full-text | ✅ Lunr | — | ✅ Elastic |
| Embeddings | — | ✅ transformers.js | ✅ OpenAI/Cohere |
| LLM | — | ✅ WebLLM | ✅ Claude/GPT |

Legend: ✅ Viable | — Not applicable

Tradeoffs by Runtime

| | JS | WASM | Remote |
| --- | --- | --- | --- |
| Cold Start | None | 4s–30s+ | None |
| Download | ~50KB | 30MB–2GB | 0 |
| Query Speed | <10ms | 50ms–5s | 200–500ms |
| Privacy | ✅ | ✅ | ❌ |
| Offline | ✅ | ✅ | ❌ |
| Model Quality | N/A | Good | Best |

Choose WASM when: Privacy matters, offline needed, acceptable cold start.

Choose Remote when: Best model quality needed, no cold start tolerance, data isn't sensitive.
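These tradeoffs can be encoded as a small picker. A sketch with hypothetical constraint names, not a library API:

```javascript
// Hypothetical runtime picker based on the tradeoffs above.
function pickRuntime({ privacyRequired, offlineRequired, coldStartBudgetMs }) {
  // Local execution is the only option when data can't leave the device
  if (privacyRequired || offlineRequired) return "wasm";
  // Remote has no cold start; prefer it when the budget is below the
  // ~4s minimum a local model needs to download and warm up
  if (coldStartBudgetMs < 4000) return "remote";
  // Otherwise local WASM avoids per-query network latency
  return "wasm";
}
```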

The demo above uses WASM for both semantic approaches so you can experience the cold start and query latency tradeoffs directly.


Choosing an Approach

flowchart TD
    A{"Query volume?"}:::primary -->|Very high| B[Keyword or Full-text]:::accent
    A -->|Normal| C{"Users know exact terminology?"}:::primary
    C -->|Yes| D[Full-text + Fuzzy]:::accent
    C -->|No| E{"Ask questions?"}:::primary
    E -->|Yes| F[RAG]:::accent
    E -->|No| G[Embeddings or Hybrid]:::accent
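The same decision tree as a function, with an illustrative threshold for "very high" query volume:

```javascript
// The flowchart above as code; the volume threshold is illustrative only.
function chooseApproach({ queriesPerSecond, usersKnowTerminology, usersAskQuestions }) {
  // Very high volume: only the cheapest approaches scale
  if (queriesPerSecond > 100) return "keyword or full-text";
  // Users typing exact terms don't need semantic matching
  if (usersKnowTerminology) return "full-text + fuzzy";
  // Question-style queries expect generated answers
  if (usersAskQuestions) return "rag";
  return "embeddings or hybrid";
}
```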

Key Takeaways

  1. Start simple — Keyword search is often enough
  2. Add complexity for clear gains — Each step has cost
  3. Hybrid beats pure — Combine fast retrieval + smart ranking
  4. Match approach to query type — Keywords vs questions need different tools
  5. Location constrains options — Client-side limits what's practical

Series Navigation


Next: Part 4 shows how this blog implements hybrid intent handling.