Building This Blog's Search
A walkthrough of how this blog implements intent handling—what's built, what's working, and what's next. Spoiler: we're not as far along as the other posts might suggest.
The Intent & Action series
- Search is Dead. Long Live Action. — From retrieval to outcomes
- Understanding Embeddings — How vectors capture meaning
- Intent Approaches — Seven ways to understand queries
- Building Hybrid Intent (this post) — Keyword + semantic in practice
- Search Performance — What 30ms vs 3s actually costs
Current State: Honest Assessment
Here's where this blog's search actually is:
| Component | Status | Notes |
|---|---|---|
| Keyword search | ✅ Working | Instant results as you type |
| Build-time embeddings | ✅ Generated | 18 posts × 384 dimensions |
| Semantic search | ✅ Integrated | Loads in background, enhances results |
| LLM features | ❌ Not implemented | Future roadmap |
The search is now hybrid. Keyword results appear instantly. Semantic model loads in background (3s on homepage, or on first search). Once loaded, results are enhanced with semantic matches marked with ✨.
What's Working
1. Keyboard-Triggered Search Overlay
Press ⌘K (Mac) or Ctrl+K (Windows) anywhere on the site.
document.addEventListener("keydown", (e) => {
if ((e.metaKey || e.ctrlKey) && e.key === "k") {
e.preventDefault();
openSearch();
}
});
2. Keyword Filtering
Simple but fast—filters posts by title and excerpt.
function search(query) {
const q = query.toLowerCase();
return posts.filter(
(post) =>
post.title.toLowerCase().includes(q) ||
post.excerpt.toLowerCase().includes(q),
);
}
This finds "MCP" in posts with "MCP" in the title. It won't find "Model Context Protocol" when you search "MCP" unless both terms appear.
3. Shareable URLs
Every search updates the URL with ?q=yourquery.
function updateURL(query) {
const url = new URL(window.location);
if (query.trim()) {
url.searchParams.set("q", query.trim());
} else {
url.searchParams.delete("q");
}
window.history.replaceState({}, "", url);
}
Share ?q=mcp and recipients see MCP-filtered results.
4. Copy Link Shortcut
⌘C when search is open copies the shareable URL.
if (
(e.metaKey || e.ctrlKey) &&
e.key === "c" &&
overlay.classList.contains("open")
) {
const hasResults = results.querySelector(".search-result");
const hasSelection = window.getSelection().toString().length > 0;
if (hasResults && !hasSelection && input.value.trim()) {
e.preventDefault();
copyShareableLink();
}
}
What's Working
Build-Time Embeddings
Every build generates embeddings for all posts:
// scripts/generate-embeddings.js
import { pipeline } from "@xenova/transformers";
const extractor = await pipeline(
"feature-extraction",
"Xenova/all-MiniLM-L6-v2",
);
for (const post of posts) {
const text = [
post.title,
post.tags?.join(" "),
post.content.slice(0, 1000),
].join(" ");
const output = await extractor(text, {
pooling: "mean",
normalize: true,
});
embeddings[post.slug] = Array.from(output.data);
}
writeFileSync("src/_data/embeddings.json", JSON.stringify(embeddings));
Output: embeddings.json with 384-dimensional vectors for each post.
Hybrid Search in main.js
The search now combines keyword and semantic:
// Instant keyword results
let matches = keywordSearch(query).slice(0, 8);
renderResults(matches, q);
// If semantic is ready, enhance results
if (isSemanticReady) {
const semanticResults = await semanticSearch(query);
// Merge: semantic results that aren't already in keyword results
const keywordUrls = new Set(matches.map((m) => m.url));
const newSemantic = semanticResults.filter((s) => !keywordUrls.has(s.url));
// Interleave: keyword first, then semantic additions
if (newSemantic.length > 0) {
matches = [...matches.slice(0, 4), ...newSemantic.slice(0, 4)];
}
renderResults(matches.slice(0, 8), q, true);
}
Semantic loads in background 3 seconds after page load (on homepage) or when user first searches.
The Target Architecture
What we're building toward:
flowchart TB
subgraph BUILD["BUILD TIME"]
B1["1. Parse markdown posts"]:::secondary
B2["2. Generate embeddings"]:::secondary
B3["3. Save as embeddings.json"]:::secondary
B4["4. Deploy as static asset"]:::accent
B1 --> B2 --> B3 --> B4
end
B4 --> RT
subgraph RT["RUNTIME"]
direction TB
U["User presses Cmd+K"]:::primary
U --> K["Keyword filter (instant)"]:::secondary
U --> S["Lazy-load embedding model"]:::secondary
K --> KR["Show results immediately"]:::accent
S --> S1["First time: download WASM"]:::secondary
S1 --> S2["Embed query (~50ms)"]:::secondary
S2 --> S3["Cosine similarity"]:::secondary
S3 --> S4["Re-rank results"]:::secondary
S4 --> F["Update with semantic ranking"]:::accent
F --> FB["Show 'Semantic' badge"]:::accent
end
Key design decisions:
- Keyword first (instant feedback)
- Semantic enhancement (lazy-loaded, optional)
- No server required (static hosting)
- Progressive enhancement (works without JS)
Design Decisions
Why this architecture?
- Keyword first — Users get instant results while semantic loads
- Background loading — Model downloads after 3s on homepage, non-blocking
- Progressive enhancement — Works without semantic; better with it
- No spinner blocking — UI never waits for semantic; it enhances when ready
- Persistence — Once model is cached, subsequent searches are fast (~50ms)
What's Next
Phase 1: Done ✅
Hybrid search is working:
- Keyword instant
- Semantic enhances when ready
- Results marked with ✨ for semantic matches
Phase 2: Search Mode Toggle
Let users choose explicitly:
[Keyword] [Semantic] [Hybrid]
With performance indicators showing latency for each. See the live demo in Part 3.
Phase 3: Natural Language Queries
For queries that look like questions:
function isQuestion(query) {
return /^(what|how|why|when|where|who|can|does|is)/i.test(query);
}
if (isQuestion(query)) {
// Use RAG-style answer generation (future)
}
Try It
Press ⌘K now. What you'll experience:
- Instant: Keyword results appear immediately
- Enhanced: After ~3s, semantic model loads in background
- Hybrid: Results include both exact matches and semantic discoveries (marked ✨)
- Shareable: URL updates,
⌘Ccopies link
Try "workflow automation" — keyword finds exact matches, semantic adds orchestration-related posts.
Performance (Current)
| Metric | Value |
|---|---|
| Time to first result | ~2ms |
| Memory overhead | ~1MB (post data) |
| Bundle impact | Minimal |
| Offline support | Yes (after first load) |
Performance (Target with Semantic)
| Metric | Keyword | Semantic (warm) | Semantic (cold) |
|---|---|---|---|
| First result | ~2ms | ~50ms | ~4000ms |
| Memory | ~1MB | ~40MB | ~40MB |
| Bundle | 0 | Lazy (~30MB) | Lazy (~30MB) |
Source Code
All the code discussed:
- Search UI:
src/assets/js/main.js→initSearch() - Embedding generation:
scripts/generate-embeddings.js - Embeddings data:
src/_data/embeddings.json - Semantic module:
src/assets/js/semantic-search.js(not integrated)
Key Takeaways
- Ship what works — Keyword search is live and useful
- Build foundations — Embeddings are ready when needed
- Be honest about state — Don't claim features that aren't integrated
- Progressive enhancement — Add semantic when the benefit is clear
- Document the journey — The roadmap is as valuable as the destination
Series Navigation
- Search is Dead. Long Live Action. — From retrieval to outcomes
- Understanding Embeddings — How vectors capture meaning
- Intent Approaches — Seven ways to understand queries
- Building Hybrid Intent (this post) — Keyword + semantic in practice
- Search Performance — What 30ms vs 3s actually costs
Related Posts
- RAG Flow Control — Production intent handling
- MCP Dynamic Data — When to make retrieval dynamic
Status: Hybrid search live. Keyword instant, semantic loads in background. Try it with ⌘K.