A technical comparison of intent understanding approaches—from simple keyword matching to LLM-powered reasoning. Each approach has trade-offs in speed, accuracy, complexity, and capability.

The Intent & Action series


Try It: Live Comparison

Before diving into the technical details, try the different approaches yourself. For detailed performance analysis, see Search Performance: From Milliseconds to Seconds.

| Approach | How it works | Runtime | Model |
| --- | --- | --- | --- |
| Keyword | Syntactic string matching — no ML | JS | — |
| Fuzzy | Syntactic string matching — no ML | JS | — |
| Vector Search | Semantic similarity via embeddings | WASM | all-MiniLM-L6-v2 (~30MB) |
| LLM (Local) | All content → LLM generates answer | WASM | Llama 3.2 1B (~2GB) |
| LLM (Remote) | All content → LLM generates answer | API | GPT-4o-mini (OpenAI) |
| RAG (Local) | Embeddings retrieve → LLM generates answer | WASM | MiniLM + Llama 3.2 (~2GB) |
| RAG (Remote) | Embeddings retrieve → LLM generates answer | WASM + API | MiniLM + GPT-4o-mini |

Key observations:

  • Syntactic (Keyword/Fuzzy): Instant, no models, but can't understand meaning
  • Vector Search: ~30MB model, understands semantic similarity
  • LLM: Generates real answers — Local runs in browser, Remote uses OpenAI
  • RAG: Best of both — vector retrieval, then LLM generates contextual answers

All local models run entirely in your browser via WebAssembly — no API keys needed!


Two Dimensions: Approach and Runtime

Before diving in, let's separate two independent concerns:

Approach: what algorithm understands intent

  • Syntactic: Keyword, Fuzzy, Full-text (string matching)
  • Semantic: Embeddings, LLMs (meaning understanding)

Runtime: where it executes

  • JS: Pure JavaScript, instant, no download
  • WASM: Compiled models running locally in browser
  • Remote: API calls to servers

These dimensions are orthogonal:

| Approach | WASM (local) | Remote (API) |
| --- | --- | --- |
| Embeddings | transformers.js | OpenAI, Cohere |
| LLM | WebLLM, llama.cpp | Claude, GPT, Gemini |

The demo above uses WASM for both Embeddings and LLM to show client-side performance characteristics. In production, you might use remote APIs for better models or WASM for privacy/offline.


The Approach Spectrum

Intent understanding spans a spectrum from simple to sophisticated:

flowchart LR
    subgraph Syntactic
        A[Keyword]:::secondary --> B[Fuzzy]:::secondary --> C[Full-text]:::secondary
    end
    subgraph Semantic
        D[Embeddings]:::accent --> E[LLM Rerank]:::accent --> F[RAG]:::accent --> G[Full LLM]:::accent
    end
    C --> D

Faster, simpler, less capable ← → Slower, more complex, more capable

Each step adds capability but also complexity and latency.


Pre-LLM Approaches

These approaches predate large language models (or don't require them).

1. Keyword Search

Does the query string appear in the content?

function keywordSearch(query, documents) {
  const q = query.toLowerCase();
  return documents.filter(
    (doc) =>
      doc.title.toLowerCase().includes(q) ||
      doc.content.toLowerCase().includes(q),
  );
}

| Aspect | Value |
| --- | --- |
| Latency | ~1ms |
| Accuracy | Low (vocabulary mismatch fails) |
| Complexity | Trivial |
| Resources | Minimal |

When to use: Small content sets, exact terminology, instant results required.

2. Fuzzy Matching

Handles typos via edit distance (Levenshtein, Jaro-Winkler).

import Fuse from "fuse.js";

const fuse = new Fuse(documents, {
  keys: ["title", "content"],
  threshold: 0.3, // 0 = exact match, 1 = match anything
  distance: 100,
});

const results = fuse.search(query);

| Aspect | Value |
| --- | --- |
| Latency | ~5-10ms |
| Accuracy | Low-Medium |
| Complexity | Low |
| Resources | Minimal |

When to use: User input is error-prone, names have variant spellings.
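The edit-distance metric underneath these libraries is simple to sketch. A minimal Levenshtein implementation (this is the textbook algorithm, not Fuse.js's actual scoring, which also weighs match position and field):

```javascript
// Levenshtein distance: the number of single-character edits
// (insertions, deletions, substitutions) needed to turn a into b.
function levenshtein(a, b) {
  // dp[i][j] = distance between the first i chars of a and first j chars of b
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0,
    ),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost, // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

levenshtein("kitten", "sitting"); // → 3
```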

3. Full-Text Search

Tokenization, stemming, TF-IDF or BM25 ranking.

import lunr from "lunr";

// Build index (once at startup)
const idx = lunr(function () {
  this.ref("slug");
  this.field("title", { boost: 10 });
  this.field("content");

  documents.forEach((doc) => this.add(doc));
});

// Search (per query)
const results = idx.search(query);

| Aspect | Value |
| --- | --- |
| Latency | ~10-20ms |
| Accuracy | Medium |
| Complexity | Medium |
| Resources | Index in memory |

When to use: Medium-sized content, need ranking, want phrase handling.
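The ranking function behind this kind of search is worth seeing once. Here is the textbook BM25 score for a single query term in a single document (lunr uses its own variant internally, so treat this as an illustration of the idea, not lunr's exact math):

```javascript
// Textbook BM25 score of one query term in one document.
// tf: term frequency in this doc, df: number of docs containing the term,
// N: total docs, docLen/avgLen: this doc's length vs. the collection average.
function bm25Term(tf, df, N, docLen, avgLen, k1 = 1.2, b = 0.75) {
  // Rarer terms get a higher inverse-document-frequency weight
  const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
  // Length normalization: long documents are penalized
  const norm = 1 - b + b * (docLen / avgLen);
  // Term frequency saturates: the 10th occurrence adds less than the 1st
  return idf * ((tf * (k1 + 1)) / (tf + k1 * norm));
}
```

A document's score for a query is the sum of `bm25Term` over the query's terms; `k1` controls how quickly term frequency saturates and `b` how strongly length normalization applies.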

4. Embedding-Based Search

Vector representations for semantic matching. See Understanding Embeddings for details.

import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

async function embed(text) {
  const output = await extractor(text, {
    pooling: "mean",
    normalize: true,
  });
  return Array.from(output.data);
}

async function semanticSearch(query, embeddings) {
  const queryVec = await embed(query);

  return Object.entries(embeddings)
    .map(([id, vec]) => ({
      id,
      score: cosineSimilarity(queryVec, vec),
    }))
    .sort((a, b) => b.score - a.score);
}

// Vectors are normalized above, so the dot product is the cosine similarity
function cosineSimilarity(a, b) {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

| Aspect | Value |
| --- | --- |
| Latency | ~50ms (warm), ~4s (cold) |
| Accuracy | High |
| Complexity | Medium-High |
| Resources | Model in memory (~40MB) |

When to use: Vocabulary mismatch is common, discovery over precision.


LLM Approaches

Large language models enable new capabilities beyond matching.

5. LLM Reranking

Use traditional retrieval, then rerank with an LLM.

async function rerankedSearch(query, documents, llm) {
  // Fast retrieval first (keyword or embedding)
  const candidates = keywordSearch(query, documents).slice(0, 20);

  // LLM reranking
  const response = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Rank these by relevance to "${query}":
${candidates.map((d, i) => `${i + 1}. ${d.title}: ${d.excerpt}`).join("\n")}

Return JSON array of numbers (original positions) in relevance order.`,
      },
    ],
  });

  const order = JSON.parse(response);
  return order.map((i) => candidates[i - 1]);
}

| Aspect | Value |
| --- | --- |
| Latency | 500ms-1s |
| Accuracy | High |
| Complexity | Medium |
| Resources | LLM API or local model |

When to use: Want semantic understanding without full LLM latency on every query.

6. RAG (Retrieval-Augmented Generation)

Retrieve relevant context, then generate an answer.

async function ragSearch(query, documents, vectorDB, llm) {
  // Retrieve context
  const context = await vectorDB.search(query, { limit: 5 });

  // Generate answer
  const answer = await llm.chat({
    messages: [
      {
        role: "system",
        content: "Answer based only on the provided context.",
      },
      {
        role: "user",
        content: `Context:
${context.map((c) => c.content).join("\n\n")}

Question: ${query}`,
      },
    ],
  });

  return { answer, sources: context };
}

| Aspect | Value |
| --- | --- |
| Latency | 1-3s |
| Accuracy | Very High |
| Complexity | High |
| Resources | Vector DB + LLM |

When to use: Users ask questions expecting answers, not links. See RAG Flow Control.
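Retrieval quality depends heavily on how documents were split into chunks at index time. A naive fixed-size chunker with overlap is enough to illustrate the preprocessing step (the sizes here are arbitrary; real systems often split on sentence or heading boundaries instead):

```javascript
// Naive fixed-size chunker with overlap, a common RAG preprocessing step.
// Overlap keeps context that straddles a chunk boundary retrievable.
function chunk(text, size = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}
```

Each chunk is then embedded and stored in the vector DB; at query time the retrieved chunks (not whole documents) become the LLM's context.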

7. Full LLM Search

Pass everything to the model.

async function llmSearch(query, documents, llm) {
  const response = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Given these documents:
${JSON.stringify(
  documents.map((d) => ({
    title: d.title,
    excerpt: d.excerpt,
    slug: d.slug,
  })),
)}

Find documents relevant to: "${query}"
Explain why each is relevant.
Return as JSON: [{ slug, reason }]`,
      },
    ],
  });

  return JSON.parse(response);
}

| Aspect | Value |
| --- | --- |
| Latency | 2-5s |
| Accuracy | Highest |
| Complexity | High |
| Resources | LLM (context limits) |

When to use: Complex queries, need explanation, content fits in context window.
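The context-window constraint can be checked up front. A hypothetical helper using the common ~4-characters-per-token heuristic for English text (exact counts require the model's own tokenizer):

```javascript
// Rough check that a document set fits the model's context window.
// Assumes ~4 characters per token, a rule of thumb for English;
// reserveTokens leaves room for the prompt and the model's response.
function fitsInContext(documents, contextTokens, reserveTokens = 2000) {
  const chars = documents.reduce(
    (sum, d) => sum + d.title.length + d.excerpt.length,
    0,
  );
  const estimatedTokens = Math.ceil(chars / 4) + reserveTokens;
  return estimatedTokens <= contextTokens;
}
```

When this returns false, fall back to RAG or reranking so only a retrieved subset reaches the model.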


Hybrid Approaches

Production systems typically combine approaches:

Syntactic + Semantic

Run both, merge results.

async function hybridSearch(query, documents, embeddings) {
  // Keyword (instant)
  const keywordResults = keywordSearch(query, documents);

  // Semantic
  const semanticResults = await semanticSearch(query, embeddings);

  // Reciprocal Rank Fusion
  const scores = {};
  const k = 60;

  keywordResults.forEach((r, i) => {
    scores[r.id] = (scores[r.id] || 0) + 1 / (k + i);
  });

  semanticResults.forEach((r, i) => {
    scores[r.id] = (scores[r.id] || 0) + 1 / (k + i);
  });

  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)
    .map(([id]) => documents.find((d) => d.id === id));
}

Fast Retrieval + LLM Rerank

Get candidates quickly, refine with LLM.

async function smartSearch(query, documents, embeddings, llm) {
  // Fast semantic retrieval
  const candidates = await semanticSearch(query, embeddings);
  const top20 = candidates.slice(0, 20);

  // LLM rerank for precision
  const reranked = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Rank by relevance to "${query}": ${JSON.stringify(top20)}`,
      },
    ],
  });

  return JSON.parse(reranked);
}

Comparison Matrix

| Approach | Type | Latency | Accuracy | Understands Meaning | Reasoning |
| --- | --- | --- | --- | --- | --- |
| Keyword | Syntactic | ~1ms | Low | ❌ | None |
| Fuzzy | Syntactic | ~5ms | Low-Med | ❌ | None |
| Full-text | Syntactic | ~10ms | Medium | ❌ (stems only) | None |
| Embeddings | Semantic | ~50ms | High | ✅ | None |
| LLM Rerank | Semantic | ~500ms | High | ✅ | Light |
| RAG | Semantic | ~2s | Very High | ✅ | Yes |
| Full LLM | Semantic | ~3s | Highest | ✅ | Full |

Runtime Options

Where the approach runs affects what's practical:

| Approach | JS | WASM | Remote |
| --- | --- | --- | --- |
| Keyword | ✅ Native | — | — |
| Fuzzy | ✅ Fuse.js | — | — |
| Full-text | ✅ Lunr | — | ✅ Elastic |
| Embeddings | — | ✅ transformers.js | ✅ OpenAI/Cohere |
| LLM | — | ✅ WebLLM | ✅ Claude/GPT |

Legend: ✅ Viable | — Not applicable

Tradeoffs by Runtime

| | JS | WASM | Remote |
| --- | --- | --- | --- |
| Cold Start | None | 4s–30s+ | None |
| Download | ~50KB | 30MB–2GB | 0 |
| Query Speed | <10ms | 50ms–5s | 200–500ms |
| Privacy | ✅ | ✅ | ❌ |
| Offline | ✅ | ✅ | ❌ |
| Model Quality | N/A | Good | Best |

Choose WASM when: Privacy matters, offline needed, acceptable cold start.

Choose Remote when: Best model quality needed, no cold start tolerance, data isn't sensitive.
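These tradeoffs can be encoded as a small picker. A sketch with hypothetical constraint names, not a library API:

```javascript
// Hypothetical runtime picker based on the tradeoffs above.
function pickRuntime({ privacyRequired, offlineRequired, coldStartBudgetMs }) {
  // Local execution is the only option when data can't leave the device
  if (privacyRequired || offlineRequired) return "wasm";
  // Remote has no cold start; prefer it when the budget is below the
  // ~4s minimum a local model needs to download and warm up
  if (coldStartBudgetMs < 4000) return "remote";
  // Otherwise local WASM avoids per-query network latency
  return "wasm";
}
```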

The demo above uses WASM for both semantic approaches so you can experience the cold start and query latency tradeoffs directly.


Choosing an Approach

flowchart TD
    A{"Query volume?"}:::primary -->|Very high| B[Keyword or Full-text]:::accent
    A -->|Normal| C{"Users know exact terminology?"}:::primary
    C -->|Yes| D[Full-text + Fuzzy]:::accent
    C -->|No| E{"Ask questions?"}:::primary
    E -->|Yes| F[RAG]:::accent
    E -->|No| G[Embeddings or Hybrid]:::accent
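The same decision tree as a function, with an illustrative threshold for "very high" query volume:

```javascript
// The flowchart above as code; the volume threshold is illustrative only.
function chooseApproach({ queriesPerSecond, usersKnowTerminology, usersAskQuestions }) {
  // Very high volume: only the cheapest approaches scale
  if (queriesPerSecond > 100) return "keyword or full-text";
  // Users typing exact terms don't need semantic matching
  if (usersKnowTerminology) return "full-text + fuzzy";
  // Question-style queries expect generated answers
  if (usersAskQuestions) return "rag";
  return "embeddings or hybrid";
}
```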

Key Takeaways

  1. Start simple — Keyword search is often enough
  2. Add complexity for clear gains — Each step has cost
  3. Hybrid beats pure — Combine fast retrieval + smart ranking
  4. Match approach to query type — Keywords vs questions need different tools
  5. Location constrains options — Client-side limits what's practical

Series Navigation


Next: Part 4 shows how this blog implements hybrid intent handling.