Building This Blog's Search

A walkthrough of how this blog implements intent handling—what's built, what's working, and what's next. Spoiler: we're not as far along as the other posts might suggest.

The Intent & Action series

Search is Dead. Long Live Action. — From retrieval to outcomes
Understanding Embeddings — How vectors capture meaning
Intent Approaches — Seven ways to understand queries
Building Hybrid Intent (this post) — Keyword + semantic in practice
Search Performance — What 30ms vs 3s actually costs

Current State: Honest Assessment

Here's where this blog's search actually is:

Component	Status	Notes
Keyword search	✅ Working	Instant results as you type
Build-time embeddings	✅ Generated	18 posts × 384 dimensions
Semantic search	✅ Integrated	Loads in background, enhances results
LLM features	❌ Not implemented	Future roadmap

The search is now hybrid. Keyword results appear instantly. Semantic model loads in background (3s on homepage, or on first search). Once loaded, results are enhanced with semantic matches marked with ✨.

What's Working

1. Keyboard-Triggered Search Overlay

Press ⌘K (Mac) or Ctrl+K (Windows) anywhere on the site.

document.addEventListener("keydown", (e) => {
  if ((e.metaKey || e.ctrlKey) && e.key === "k") {
    e.preventDefault();
    openSearch();
  }
});

2. Keyword Filtering

Simple but fast—filters posts by title and excerpt.

function search(query) {
  const q = query.toLowerCase();
  return posts.filter(
    (post) =>
      post.title.toLowerCase().includes(q) ||
      post.excerpt.toLowerCase().includes(q),
  );
}

This finds "MCP" in posts with "MCP" in the title. It won't find "Model Context Protocol" when you search "MCP" unless both terms appear.

3. Shareable URLs

Every search updates the URL with ?q=yourquery.

function updateURL(query) {
  const url = new URL(window.location);
  if (query.trim()) {
    url.searchParams.set("q", query.trim());
  } else {
    url.searchParams.delete("q");
  }
  window.history.replaceState({}, "", url);
}

Share ?q=mcp and recipients see MCP-filtered results.

4. Copy Link Shortcut

⌘C when search is open copies the shareable URL.

if (
  (e.metaKey || e.ctrlKey) &&
  e.key === "c" &&
  overlay.classList.contains("open")
) {
  const hasResults = results.querySelector(".search-result");
  const hasSelection = window.getSelection().toString().length > 0;

  if (hasResults && !hasSelection && input.value.trim()) {
    e.preventDefault();
    copyShareableLink();
  }
}

What's Working

Build-Time Embeddings

Every build generates embeddings for all posts:

// scripts/generate-embeddings.js
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "Xenova/all-MiniLM-L6-v2",
);

for (const post of posts) {
  const text = [
    post.title,
    post.tags?.join(" "),
    post.content.slice(0, 1000),
  ].join(" ");

  const output = await extractor(text, {
    pooling: "mean",
    normalize: true,
  });

  embeddings[post.slug] = Array.from(output.data);
}

writeFileSync("src/_data/embeddings.json", JSON.stringify(embeddings));

Output: embeddings.json with 384-dimensional vectors for each post.

Hybrid Search in main.js

The search now combines keyword and semantic:

// Instant keyword results
let matches = keywordSearch(query).slice(0, 8);
renderResults(matches, q);

// If semantic is ready, enhance results
if (isSemanticReady) {
  const semanticResults = await semanticSearch(query);

  // Merge: semantic results that aren't already in keyword results
  const keywordUrls = new Set(matches.map((m) => m.url));
  const newSemantic = semanticResults.filter((s) => !keywordUrls.has(s.url));

  // Interleave: keyword first, then semantic additions
  if (newSemantic.length > 0) {
    matches = [...matches.slice(0, 4), ...newSemantic.slice(0, 4)];
  }

  renderResults(matches.slice(0, 8), q, true);
}

Semantic loads in background 3 seconds after page load (on homepage) or when user first searches.

The Target Architecture

What we're building toward:

flowchart TB
    subgraph BUILD["BUILD TIME"]
        B1["1. Parse markdown posts"]:::secondary
        B2["2. Generate embeddings"]:::secondary
        B3["3. Save as embeddings.json"]:::secondary
        B4["4. Deploy as static asset"]:::accent
        B1 --> B2 --> B3 --> B4
    end

    B4 --> RT

    subgraph RT["RUNTIME"]
        direction TB
        U["User presses Cmd+K"]:::primary
        U --> K["Keyword filter (instant)"]:::secondary
        U --> S["Lazy-load embedding model"]:::secondary

        K --> KR["Show results immediately"]:::accent

        S --> S1["First time: download WASM"]:::secondary
        S1 --> S2["Embed query (~50ms)"]:::secondary
        S2 --> S3["Cosine similarity"]:::secondary
        S3 --> S4["Re-rank results"]:::secondary

        S4 --> F["Update with semantic ranking"]:::accent
        F --> FB["Show 'Semantic' badge"]:::accent
    end

Key design decisions:

Keyword first (instant feedback)
Semantic enhancement (lazy-loaded, optional)
No server required (static hosting)
Progressive enhancement (works without JS)

Design Decisions

Why this architecture?

Keyword first — Users get instant results while semantic loads
Background loading — Model downloads after 3s on homepage, non-blocking
Progressive enhancement — Works without semantic; better with it
No spinner blocking — UI never waits for semantic; it enhances when ready
Persistence — Once model is cached, subsequent searches are fast (~50ms)

What's Next

Phase 1: Done ✅

Hybrid search is working:

Keyword instant
Semantic enhances when ready
Results marked with ✨ for semantic matches

Phase 2: Search Mode Toggle

Let users choose explicitly:

[Keyword] [Semantic] [Hybrid]

With performance indicators showing latency for each. See the live demo in Part 3.

Phase 3: Natural Language Queries

For queries that look like questions:

function isQuestion(query) {
  return /^(what|how|why|when|where|who|can|does|is)/i.test(query);
}

if (isQuestion(query)) {
  // Use RAG-style answer generation (future)
}

Try It

Press ⌘K now. What you'll experience:

Instant: Keyword results appear immediately
Enhanced: After ~3s, semantic model loads in background
Hybrid: Results include both exact matches and semantic discoveries (marked ✨)
Shareable: URL updates, ⌘C copies link

Try "workflow automation" — keyword finds exact matches, semantic adds orchestration-related posts.

Performance (Current)

Metric	Value
Time to first result	~2ms
Memory overhead	~1MB (post data)
Bundle impact	Minimal
Offline support	Yes (after first load)

Performance (Target with Semantic)

Metric	Keyword	Semantic (warm)	Semantic (cold)
First result	~2ms	~50ms	~4000ms
Memory	~1MB	~40MB	~40MB
Bundle	0	Lazy (~30MB)	Lazy (~30MB)

Source Code

All the code discussed:

Search UI: src/assets/js/main.js → initSearch()
Embedding generation: scripts/generate-embeddings.js
Embeddings data: src/_data/embeddings.json
Semantic module: src/assets/js/semantic-search.js (not integrated)

Key Takeaways

Ship what works — Keyword search is live and useful
Build foundations — Embeddings are ready when needed
Be honest about state — Don't claim features that aren't integrated
Progressive enhancement — Add semantic when the benefit is clear
Document the journey — The roadmap is as valuable as the destination

Series Navigation

Search is Dead. Long Live Action. — From retrieval to outcomes
Understanding Embeddings — How vectors capture meaning
Intent Approaches — Seven ways to understand queries
Building Hybrid Intent (this post) — Keyword + semantic in practice
Search Performance — What 30ms vs 3s actually costs

RAG Flow Control — Production intent handling
MCP Dynamic Data — When to make retrieval dynamic

Status: Hybrid search live. Keyword instant, semantic loads in background. Try it with ⌘K.

Building This Blog's Search

Current State: Honest Assessment

What's Working

1. Keyboard-Triggered Search Overlay

2. Keyword Filtering

3. Shareable URLs

4. Copy Link Shortcut

What's Working

Build-Time Embeddings

Hybrid Search in main.js

The Target Architecture

Design Decisions

What's Next

Phase 1: Done ✅

Phase 2: Search Mode Toggle

Phase 3: Natural Language Queries

Try It

Performance (Current)

Performance (Target with Semantic)

Source Code

Key Takeaways

Series Navigation

Related Posts

Content Calendar