Intent: From Keywords to LLMs
A technical comparison of intent understanding approaches—from simple keyword matching to LLM-powered reasoning. Each approach has trade-offs in speed, accuracy, complexity, and capability.
The Intent & Action series
- Search is Dead. Long Live Action. — From retrieval to outcomes
- Understanding Embeddings — How vectors capture meaning
- Intent Approaches (this post) — Seven ways to understand queries
- Building Hybrid Intent — Keyword + semantic in practice
- Search Performance — What 30ms vs 3s actually costs
Try It: Live Comparison
Before diving into the technical details, try the different approaches yourself. For detailed performance analysis, see Search Performance: From Milliseconds to Seconds.
Key observations:
- Syntactic (Keyword/Fuzzy): Instant, no models, but can't understand meaning
- Vector Search: ~30MB model, understands semantic similarity
- LLM: Generates real answers — Local runs in browser, Remote uses OpenAI
- RAG: Best of both — vector retrieval, then LLM generates contextual answers
All local models run entirely in your browser via WebAssembly — no API keys needed!
Two Dimensions: Approach and Runtime
Before diving in, let's separate two independent concerns:
Approach — what algorithm understands intent:
- Syntactic: Keyword, Fuzzy, Full-text (string matching)
- Semantic: Embeddings, LLMs (meaning understanding)
Runtime — where it executes:
- JS: Pure JavaScript, instant, no download
- WASM: Compiled models running locally in browser
- Remote: API calls to servers
These dimensions are orthogonal:
| Approach | WASM (local) | Remote (API) |
|---|---|---|
| Embeddings | transformers.js | OpenAI, Cohere |
| LLM | WebLLM, llama.cpp | Claude, GPT, Gemini |
The demo above uses WASM for both Embeddings and LLM to show client-side performance characteristics. In production, you might use remote APIs for better models or WASM for privacy/offline.
The Approach Spectrum
Intent understanding spans a spectrum from simple to sophisticated:
```mermaid
flowchart LR
  subgraph Syntactic
    A[Keyword]:::secondary --> B[Fuzzy]:::secondary --> C[Full-text]:::secondary
  end
  subgraph Semantic
    D[Embeddings]:::accent --> E[LLM Rerank]:::accent --> F[RAG]:::accent --> G[Full LLM]:::accent
  end
  C --> D
```
Faster, simpler, less capable ←→ Slower, more complex, more capable
Each step adds capability but also complexity and latency.
Pre-LLM Approaches
These approaches predate large language models (or don't require them).
1. Keyword Search
Does the query string appear in the content?
```javascript
function keywordSearch(query, documents) {
  const q = query.toLowerCase();
  return documents.filter(
    (doc) =>
      doc.title.toLowerCase().includes(q) ||
      doc.content.toLowerCase().includes(q),
  );
}
```
| Aspect | Value |
|---|---|
| Latency | ~1ms |
| Accuracy | Low (vocabulary mismatch fails) |
| Complexity | Trivial |
| Resources | Minimal |
When to use: Small content sets, exact terminology, instant results required.
2. Fuzzy Matching
Handles typos via edit distance (Levenshtein, Jaro-Winkler).
```javascript
import Fuse from "fuse.js";

const fuse = new Fuse(documents, {
  keys: ["title", "content"],
  threshold: 0.3, // match score cutoff: 0 = exact match only, 1 = match anything
  distance: 100,
});

const results = fuse.search(query);
```
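Edit distance itself is straightforward to compute. A minimal Levenshtein implementation for illustration (Fuse.js uses its own optimized scoring internally, not this function):

```javascript
// Levenshtein distance: minimum number of single-character
// insertions, deletions, or substitutions to turn a into b.
function levenshtein(a, b) {
  // prev[j] holds the distance between the current prefix of a and b.slice(0, j)
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions plus one insertion.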
| Aspect | Value |
|---|---|
| Latency | ~5-10ms |
| Accuracy | Low-Medium |
| Complexity | Low |
| Resources | Minimal |
When to use: User input is error-prone, names have variant spellings.
3. Full-Text Search
Tokenization, stemming, TF-IDF or BM25 ranking.
```javascript
import lunr from "lunr";

// Build index (once at startup)
const idx = lunr(function () {
  this.ref("slug");
  this.field("title", { boost: 10 });
  this.field("content");
  documents.forEach((doc) => this.add(doc));
});

// Search (per query)
const results = idx.search(query);
```
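lunr handles scoring internally, but the heart of BM25 is compact enough to show. A sketch of one term's score contribution (the function name and default parameters here are illustrative, not lunr's API):

```javascript
// BM25 score contribution of a single query term for one document.
// tf: term frequency in the document, df: documents containing the term,
// N: total documents, dl: document length, avgdl: average document length.
function bm25TermScore(tf, df, N, dl, avgdl, k1 = 1.2, b = 0.75) {
  // Inverse document frequency: rarer terms carry more weight
  const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
  // Length normalization: penalize terms in long documents
  const norm = 1 - b + b * (dl / avgdl);
  return idf * ((tf * (k1 + 1)) / (tf + k1 * norm));
}
```

Rarer terms (lower df) and repeated terms (higher tf) score higher, with diminishing returns on repetition — the properties that make BM25 a stronger ranker than raw term counts.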
| Aspect | Value |
|---|---|
| Latency | ~10-20ms |
| Accuracy | Medium |
| Complexity | Medium |
| Resources | Index in memory |
When to use: Medium-sized content, need ranking, want phrase handling.
4. Embedding-Based Search
Vector representations for semantic matching. See Understanding Embeddings for details.
```javascript
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

async function embed(text) {
  const output = await extractor(text, {
    pooling: "mean",
    normalize: true,
  });
  return Array.from(output.data);
}

// Vectors are L2-normalized above, so cosine similarity reduces to a dot product
function cosineSimilarity(a, b) {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

async function semanticSearch(query, embeddings) {
  const queryVec = await embed(query);
  return Object.entries(embeddings)
    .map(([id, vec]) => ({
      id,
      score: cosineSimilarity(queryVec, vec),
    }))
    .sort((a, b) => b.score - a.score);
}
```
| Aspect | Value |
|---|---|
| Latency | ~50ms (warm), ~4s (cold) |
| Accuracy | High |
| Complexity | Medium-High |
| Resources | Model in memory (~30MB) |
When to use: Vocabulary mismatch is common, discovery over precision.
LLM Approaches
Large language models enable new capabilities beyond matching.
5. LLM Reranking
Use traditional retrieval, then rerank with an LLM.
```javascript
async function rerankedSearch(query, documents, llm) {
  // Fast retrieval first (keyword or embedding)
  const candidates = keywordSearch(query, documents).slice(0, 20);

  // LLM reranking
  const response = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Rank these by relevance to "${query}":

${candidates.map((d, i) => `${i + 1}. ${d.title}: ${d.excerpt}`).join("\n")}

Return JSON array of numbers (original positions) in relevance order.`,
      },
    ],
  });

  const order = JSON.parse(response);
  return order.map((i) => candidates[i - 1]);
}
```
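One practical caveat: these sketches call `JSON.parse` directly on the model's reply, which throws whenever the model wraps its answer in prose. A defensive variant (`safeParseJsonArray` is a name invented for this sketch, not a library function):

```javascript
// Extract and parse the first JSON array in an LLM reply,
// tolerating surrounding prose; returns fallback on failure.
function safeParseJsonArray(text, fallback = null) {
  const match = text.match(/\[[\s\S]*\]/); // greedy: first "[" to last "]"
  if (!match) return fallback;
  try {
    return JSON.parse(match[0]);
  } catch {
    return fallback;
  }
}
```

With this, `safeParseJsonArray('Sure! Here you go: [3, 1, 2]')` yields `[3, 1, 2]` instead of an exception.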
| Aspect | Value |
|---|---|
| Latency | 500ms-1s |
| Accuracy | High |
| Complexity | Medium |
| Resources | LLM API or local model |
When to use: Want semantic understanding without full LLM latency on every query.
6. RAG (Retrieval-Augmented Generation)
Retrieve relevant context, then generate an answer.
```javascript
async function ragSearch(query, documents, vectorDB, llm) {
  // Retrieve context
  const context = await vectorDB.search(query, { limit: 5 });

  // Generate answer
  const answer = await llm.chat({
    messages: [
      {
        role: "system",
        content: "Answer based only on the provided context.",
      },
      {
        role: "user",
        content: `Context:

${context.map((c) => c.content).join("\n\n")}

Question: ${query}`,
      },
    ],
  });

  return { answer, sources: context };
}
```
| Aspect | Value |
|---|---|
| Latency | 1-3s |
| Accuracy | Very High |
| Complexity | High |
| Resources | Vector DB + LLM |
When to use: Users ask questions expecting answers, not links. See RAG Flow Control.
7. Full LLM Search
Pass everything to the model.
```javascript
async function llmSearch(query, documents, llm) {
  const response = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Given these documents:

${JSON.stringify(
  documents.map((d) => ({
    title: d.title,
    excerpt: d.excerpt,
    slug: d.slug,
  })),
)}

Find documents relevant to: "${query}"
Explain why each is relevant.
Return as JSON: [{ slug, reason }]`,
      },
    ],
  });

  return JSON.parse(response);
}
```
| Aspect | Value |
|---|---|
| Latency | 2-5s |
| Accuracy | Highest |
| Complexity | High |
| Resources | LLM (context limits) |
When to use: Complex queries, need explanation, content fits in context window.
Hybrid Approaches
Production systems typically combine approaches:
Syntactic + Semantic
Run both, merge results.
```javascript
async function hybridSearch(query, documents, embeddings) {
  // Keyword (instant)
  const keywordResults = keywordSearch(query, documents);

  // Semantic
  const semanticResults = await semanticSearch(query, embeddings);

  // Reciprocal Rank Fusion: each list contributes 1 / (k + rank)
  const scores = {};
  const k = 60;
  keywordResults.forEach((r, i) => {
    scores[r.id] = (scores[r.id] || 0) + 1 / (k + i);
  });
  semanticResults.forEach((r, i) => {
    scores[r.id] = (scores[r.id] || 0) + 1 / (k + i);
  });

  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)
    .map(([id]) => documents.find((d) => d.id === id));
}
```
Fast Retrieval + LLM Rerank
Get candidates quickly, refine with LLM.
```javascript
async function smartSearch(query, documents, embeddings, llm) {
  // Fast semantic retrieval
  const candidates = await semanticSearch(query, embeddings);
  const top20 = candidates.slice(0, 20);

  // LLM rerank for precision
  const reranked = await llm.chat({
    messages: [
      {
        role: "user",
        content: `Rank by relevance to "${query}": ${JSON.stringify(top20)}`,
      },
    ],
  });

  return JSON.parse(reranked);
}
```
Comparison Matrix
| Approach | Type | Latency | Accuracy | Understands Meaning | Reasoning |
|---|---|---|---|---|---|
| Keyword | Syntactic | ~1ms | Low | ❌ | None |
| Fuzzy | Syntactic | ~5ms | Low-Med | ❌ | None |
| Full-text | Syntactic | ~10ms | Medium | ❌ (stems only) | None |
| Embeddings | Semantic | ~50ms | High | ✅ | None |
| LLM Rerank | Semantic | ~500ms | High | ✅ | Light |
| RAG | Semantic | ~2s | Very High | ✅ | Yes |
| Full LLM | Semantic | ~3s | Highest | ✅ | Full |
Runtime Options
Where the approach runs affects what's practical:
| Approach | JS | WASM | Remote |
|---|---|---|---|
| Keyword | ✅ Native | — | ✅ |
| Fuzzy | ✅ Fuse.js | — | ✅ |
| Full-text | ✅ Lunr | ✅ | ✅ Elastic |
| Embeddings | — | ✅ transformers.js | ✅ OpenAI/Cohere |
| LLM | — | ✅ WebLLM | ✅ Claude/GPT |
Legend: ✅ Viable | — Not applicable
Tradeoffs by Runtime
| Aspect | JS | WASM | Remote |
|---|---|---|---|
| Cold Start | None | 4s–30s+ | None |
| Download | ~50KB | 30MB–2GB | 0 |
| Query Speed | <10ms | 50ms–5s | 200–500ms |
| Privacy | ✅ | ✅ | ❌ |
| Offline | ✅ | ✅ | ❌ |
| Model Quality | N/A | Good | Best |
Choose WASM when: Privacy matters, offline needed, acceptable cold start.
Choose Remote when: Best model quality needed, no cold start tolerance, data isn't sensitive.
The demo above uses WASM for both semantic approaches so you can experience the cold start and query latency tradeoffs directly.
Choosing an Approach
```mermaid
flowchart TD
  A{"Query volume?"}:::primary -->|Very high| B[Keyword or Full-text]:::accent
  A -->|Normal| C{"Users know exact terminology?"}:::primary
  C -->|Yes| D[Full-text + Fuzzy]:::accent
  C -->|No| E{"Ask questions?"}:::primary
  E -->|Yes| F[RAG]:::accent
  E -->|No| G[Embeddings or Hybrid]:::accent
```
Key Takeaways
- Start simple — Keyword search is often enough
- Add complexity for clear gains — Each step has cost
- Hybrid beats pure — Combine fast retrieval + smart ranking
- Match approach to query type — Keywords vs questions need different tools
- Location constrains options — Client-side limits what's practical
Series Navigation
- Search is Dead. Long Live Action. — From retrieval to outcomes
- Understanding Embeddings — How vectors capture meaning
- Intent Approaches (this post) — Seven ways to understand queries
- Building Hybrid Intent — Keyword + semantic in practice
- Search Performance — What 30ms vs 3s actually costs
Related Posts
- RAG Flow Control — Production RAG patterns
- Knowledge Search Troubleshooting — Debugging semantic search
- MCP Dynamic Data — When to make retrieval dynamic
Next: Part 4 shows how this blog implements hybrid intent handling.