Your knowledge search returns garbage. Your RAG pipeline isn't finding the right documents. This troubleshooting guide covers the most common knowledge search problems in Ema workflows—and how to fix them.

How Knowledge Search Works

Knowledge search performs semantic search across your uploaded documents. You give it a query, and it returns the most relevant chunks of text.

flowchart TD
    A["Query: 'Michael Thompson portfolio holdings'"]
    B["Knowledge Search<br/>Searches: client-knowledge-base.md,<br/>market-data.md, etc.<br/>Returns: Relevant chunks/paragraphs"]
    C["Search Results<br/>(chunks about Michael Thompson)"]

    A --> B --> C

Key insight: Search returns relevant segments, not entire documents.


Key Concept: Chunks, Not Whole Documents

Your search results are chunks—relevant segments from your documents:

Your document (1000 lines):

| Section | Content | Search Result |
| ------- | ------- | ------------- |
| Header | # Client Directory | |
| ... | 100 lines of other clients | |
| Chunk 1 | ## Michael Thompson (c_102), Tier: HNW, Risk: Moderate, Holdings: NVDA 15.2%, AAPL 8.5%... | Returned |
| Chunk 2 | More about Michael... | Returned |
| Other | ## Sarah Johnson (c_103) | ❌ Not returned |

Search for "Michael Thompson" returns chunks 1 & 2, not the whole file.
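
To make the chunk idea concrete, here is a minimal Python sketch. It is not how Ema chunks or ranks documents (this guide does not describe those internals). It assumes one chunk per `##` section and uses plain keyword matching as a stand-in for semantic search. The function names and the sample text are illustrative only.

```python
import re

def split_into_chunks(markdown_text):
    """Split a markdown document into chunks, one per '## ' section (assumed rule)."""
    # Split just before every line that starts with '## ', keeping the heading
    # together with the text that follows it.
    parts = re.split(r"(?m)^(?=## )", markdown_text)
    return [p.strip() for p in parts if p.strip()]

def search_chunks(chunks, query):
    """Toy keyword match: return chunks that contain every query term."""
    terms = query.lower().split()
    return [c for c in chunks if all(t in c.lower() for t in terms)]

# Shortened stand-in for the 1000-line document described above.
DOCUMENT = """# Client Directory

## Michael Thompson (c_102)
Tier: HNW, Risk: Moderate
Holdings: NVDA 15.2%, AAPL 8.5%

## Sarah Johnson (c_103)
(details omitted)
"""

chunks = split_into_chunks(DOCUMENT)
hits = search_chunks(chunks, "Michael Thompson")

for hit in hits:
    print(hit)
```

Only the Michael Thompson section is printed. The directory header and the Sarah Johnson section stay out of the results, which is the behavior to expect from chunk-level retrieval.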


Common Problems and Solutions

Problem 1: No Results Returned

Symptoms: Search returns an empty result set.

Causes:

  • Query too specific
  • Wrong datastore configured
  • File filters too narrow

Solutions:

# Check 1: Broaden your query
# Before
query: "c_102 Q4 2025 NVDA exposure percentage"
# After
query: "Michael Thompson portfolio NVDA holdings"

# Check 2: Verify datastore
datastore_configs: ["fileUpload"]  # Is this the right datastore?

# Check 3: Remove or broaden file filters
# Before
file_name_filters: "client-michael-*.md"
# After
file_name_filters: "client-*.md"  # Or remove entirely
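
Before loosening a filter, it can help to see which file names a pattern would actually admit. The sketch below assumes that file_name_filters uses shell-style wildcards, which the patterns above suggest but this guide does not confirm. The file list is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in the datastore.
FILES = [
    "client-michael-thompson.md",
    "client-knowledge-base.md",
    "market-data.md",
]

def matching_files(pattern, files=FILES):
    """Return the files a glob-style filter would let through."""
    return [name for name in files if fnmatchcase(name, pattern)]

narrow = matching_files("client-michael-*.md")
broad = matching_files("client-*.md")

print("narrow:", narrow)
print("broad: ", broad)
```

If the data you need lives in client-knowledge-base.md, the narrow pattern silently excludes it, and the search comes back empty even though the query is fine.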

Problem 2: Wrong Results Returned

Symptoms: Search returns chunks, but they're not relevant.

Causes:

  • Query too vague
  • Missing entity specifics
  • Semantic drift from noise

Solutions:

# Problem: Query is too vague
query: "client information"  # Returns random client data

# Solution: Include specific entities
query: "Michael Thompson c_102 portfolio holdings allocation"

Pattern: Entity-Aware Query Building

conversation
     │
     ↓
entity_extraction
{client: "Michael Thompson", ticker: "NVDA", request: "exposure"}
     │
     ↓
call_llm (query builder)
"Build query: {client} {ticker} {request} holdings weight"
     │
     ↓
"Michael Thompson NVDA exposure holdings weight portfolio"
     │
     ↓
knowledge_search → Relevant chunks!
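
The query-builder step in this pattern is an LLM call. As a rough illustration of what it is doing, here is a deterministic Python stand-in that assembles the query from extracted entities. The function name, the key order, and the default extra terms are assumptions made for this sketch, not part of Ema.

```python
def build_query(entities, extra_terms=("holdings", "weight", "portfolio")):
    """Join extracted entity values and a few domain terms into one search query."""
    ordered_keys = ("client", "ticker", "request")
    # Keep only the entities that were actually extracted.
    parts = [entities[key] for key in ordered_keys if entities.get(key)]
    # Append domain terms, skipping any that an entity already supplied.
    seen = {part.lower() for part in parts}
    parts += [term for term in extra_terms if term.lower() not in seen]
    return " ".join(parts)

query = build_query(
    {"client": "Michael Thompson", "ticker": "NVDA", "request": "exposure"}
)
partial = build_query({"client": "Michael Thompson"})

print(query)
print(partial)
```

Missing entities are simply skipped, so a conversation that mentions only the client still produces a usable query instead of one with empty placeholders.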

Problem 3: Partial Data Returned

Symptoms: Search returns some information but misses related data (e.g., it returns holdings but not compliance status).

Causes:

  • Chunk boundaries split related content
  • Not enough context retrieved

Solutions:

# Increase context windows
knowledge_search:
  num_previous_segments: 3 # More context before chunk
  num_next_segments: 3 # More context after chunk
  max_extractive_segment_count: 10 # More chunks per doc
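
The effect of the two segment settings can be pictured as widening a window around each hit. This Python sketch assumes, based on the comments above, that num_previous_segments and num_next_segments pull in neighboring segments from the same document. The segment labels are made up.

```python
def expand_hit(segments, hit_index, num_previous_segments=0, num_next_segments=0):
    """Return the matched segment plus the requested neighbors, clamped to the document."""
    start = max(0, hit_index - num_previous_segments)
    end = min(len(segments), hit_index + num_next_segments + 1)
    return segments[start:end]

# Hypothetical segments of one client section; the search matched index 1.
SEGMENTS = ["Tier and risk", "Portfolio table", "Compliance status", "Contact details"]

narrow = expand_hit(SEGMENTS, 1)
wide = expand_hit(SEGMENTS, 1, num_previous_segments=3, num_next_segments=3)

print(narrow)
print(wide)
```

With no neighbors, only the portfolio segment comes back. With three on each side, the compliance segment that sat just past the chunk boundary is included too.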

Better solution: Pre-join related data in your documents

## Michael Thompson (c_102)

Tier: HNW | Risk: Moderate

### Portfolio

| Ticker | Weight | Value |
| ------ | ------ | ----- |
| NVDA   | 15.2%  | $166K |

### Compliance

- TMD: Compliant
- Concentration: Warning on Tech (44.6%)

### Contact

- Email: m.t@corp.com
- Advisor: Steven Poitras

One search finds everything about Michael.


Problem 4: Conversation Noise Pollutes Query

Symptoms: Using the raw chat_conversation as the query returns poor results.

Example of what happens:

Search query (if using raw conversation):
"Hi there Hello! How can I help you today? I'm looking for info on
Michael Thompson Sure, let me look that up. Specifically his portfolio
performance I found some information... What about compliance status?"

This is not what you want to search for.

Solution: Use conversation_summarizer

# Clean approach
conversation_summarizer:
  input: trigger.chat_conversation

knowledge_search:
  query: conversation_summarizer.summarized_conversation
  # Gets: "Michael Thompson portfolio performance compliance status"
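
To see why the raw conversation makes a poor query, compare it with a cleaned version. The Python sketch below is a deliberately naive stand-in for conversation_summarizer: it keeps only user turns and drops a hand-picked filler list. The real node presumably does something much smarter; the turn format and the filler list are assumptions for this example.

```python
import string

# Hand-picked filler words for this one conversation (illustrative only).
FILLER = {"hi", "there", "i'm", "looking", "for", "info", "on",
          "specifically", "his", "what", "about"}

TURNS = [
    ("user", "Hi there"),
    ("assistant", "Hello! How can I help you today?"),
    ("user", "I'm looking for info on Michael Thompson"),
    ("assistant", "Sure, let me look that up."),
    ("user", "Specifically his portfolio performance"),
    ("assistant", "I found some information..."),
    ("user", "What about compliance status?"),
]

def raw_query(turns):
    """What the search sees if the whole conversation is passed through."""
    return " ".join(text for _, text in turns)

def naive_query(turns):
    """Keep user turns only, strip punctuation, and drop filler words."""
    words = []
    for role, text in turns:
        if role != "user":
            continue
        for word in text.split():
            cleaned = word.strip(string.punctuation)
            if cleaned and cleaned.lower() not in FILLER:
                words.append(cleaned)
    return " ".join(words)

print(raw_query(TURNS))
print(naive_query(TURNS))
```

The second line contains only the terms that matter for retrieval, which is the kind of input the search step needs.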

Problem 5: Missing Files in Search

Symptoms: You know the file exists, but search doesn't find it.

Causes:

  • Wrong datastore
  • File not indexed yet
  • File filters excluding it

Solutions:

# Check datastore configuration
knowledge_search:
  datastore_configs:
    - "fileUpload"      # Main uploads
    - "templates"       # Template files
    - "house_views"     # Investment views

# Remove restrictive filters
# Before
file_name_filters: "portfolio-*.md"
# After
file_name_filters: null  # Search all files

Search Configuration Reference

| Parameter | Purpose | Example |
| --------- | ------- | ------- |
| query | What to search for | "Michael Thompson portfolio" |
| datastore_configs | Which knowledge bases | ["fileUpload", "clients"] |
| page_size | Max results | 20 |
| max_extractive_segment_count | Chunks per doc | 10 |
| file_name_filters | Limit to specific files | "client-*.md" |
| num_previous_segments | Context before chunk | 2 |
| num_next_segments | Context after chunk | 2 |

Multiple Knowledge Bases

You can search different data sources for different purposes:

| Data Source | Contains | Use For |
| ----------- | -------- | ------- |
| fileUpload (main) | client-knowledge-base.md, market-data.md | Client data, holdings, portfolios |
| templates | Agent instructions, response templates | Finding role/task instructions dynamically |
| house_views | Investment house views, allocation targets | Compliance checking, recommendations |

Two Searches in Parallel

flowchart TD
    A["conversation_summarizer"]
    B["knowledge_search_1<br/>datastore: clients<br/>'Michael Thompson'"]
    C["knowledge_search_2<br/>datastore: templates<br/>'PORTFOLIO_REVIEW'"]
    D["unified_agent<br/>(gets both results)"]

    A --> B & C
    B & C --> D
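
In a workflow this fan-out is expressed by wiring, not code, but the shape is the same as running two calls concurrently and handing both results to one consumer. The Python sketch below uses a stub in place of the real search; knowledge_search here is a hypothetical function returning canned chunks, not an Ema API.

```python
from concurrent.futures import ThreadPoolExecutor

def knowledge_search(datastore, query):
    """Hypothetical stand-in for a search call; returns canned chunks per datastore."""
    canned = {
        "clients": ["## Michael Thompson (c_102) ..."],
        "templates": ["PORTFOLIO_REVIEW instructions ..."],
    }
    return canned.get(datastore, [])

def run_parallel_searches(requests):
    """Run every (datastore, query) pair concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        futures = {
            name: pool.submit(knowledge_search, datastore, query)
            for name, (datastore, query) in requests.items()
        }
        # result() blocks until each search finishes, so the consumer sees both.
        return {name: future.result() for name, future in futures.items()}

results = run_parallel_searches({
    "knowledge_search_1": ("clients", "Michael Thompson"),
    "knowledge_search_2": ("templates", "PORTFOLIO_REVIEW"),
})

print(results)
```

Keeping the results keyed by search name lets the downstream agent tell client data apart from template instructions.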

Do You Need Document Synthesis?

document_synthesis combines scattered chunks into coherent content.

When You DON'T Need It

  • Your documents are already well-structured markdown
  • Chunks are self-contained (have headers, context)
  • The agent can work with raw chunks

When You DO Need It

  • Chunks are fragmented and need assembly
  • You want to transform the format (e.g., chunks → structured JSON)
  • You need to deduplicate or reconcile conflicting information

Alternative: Let Agent Do Synthesis

agent:
  named_inputs:
    - Search_Results: knowledge_search.search_results

  instructions: |
    From the search results, extract and structure:
    - Client name and ID
    - Portfolio holdings
    - Compliance status

    The chunks may be scattered - synthesize into coherent data.

The agent performs synthesis as part of its task.


Debugging Checklist

When search isn't working, check in order:

| Check | Command/Action |
| ----- | -------------- |
| 1. Query quality | Is the query specific? Does it include entity names/IDs? |
| 2. Datastore config | Is the right datastore listed in datastore_configs? |
| 3. File filters | Are filters excluding the file you need? |
| 4. Document structure | Is related data pre-joined or scattered? |
| 5. Context settings | Are num_previous/next_segments sufficient? |
| 6. Conversation noise | Are you using conversation_summarizer? |
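
Some of these checks can be automated against a search configuration. The following Python sketch covers checks 1, 2, 3, and 5 in order. The thresholds are arbitrary, the diagnose helper is hypothetical, and it assumes the filter is a single glob-style pattern.

```python
from fnmatch import fnmatchcase

def diagnose(config, expected_file=None):
    """Return human-readable findings for a knowledge_search config dict."""
    findings = []
    # Check 1: very short queries are usually too vague (threshold is arbitrary).
    if len(config.get("query", "").split()) < 3:
        findings.append("query: very short; add entity names or IDs")
    # Check 2: at least one datastore must be listed.
    if not config.get("datastore_configs"):
        findings.append("datastore_configs: no datastore listed")
    # Check 3: does the filter exclude the file you expect to hit?
    pattern = config.get("file_name_filters")
    if pattern and expected_file and not fnmatchcase(expected_file, pattern):
        findings.append(f"file_name_filters: {pattern!r} excludes {expected_file!r}")
    # Check 5: no neighboring segments requested at all.
    if not config.get("num_previous_segments") and not config.get("num_next_segments"):
        findings.append("context: no neighboring segments requested")
    return findings

config = {
    "query": "client information",
    "datastore_configs": ["fileUpload"],
    "file_name_filters": "portfolio-*.md",
}
findings = diagnose(config, expected_file="client-knowledge-base.md")

for finding in findings:
    print(finding)
```

Checks 4 and 6 concern document structure and workflow wiring, so they still need a manual look.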

Optimizing Search Performance

1. Pre-join Related Data

Instead of searching for the client, then holdings, then compliance separately, keep everything about one client in a single section:

## Michael Thompson (c_102)

Tier: HNW | Risk: Moderate

### Portfolio

| Ticker | Weight | Value |
| ------ | ------ | ----- |
| NVDA   | 15.2%  | $166K |

### Compliance

- TMD: Compliant
- Concentration: Warning on Tech (44.6%)

2. Use Specific Queries

# ❌ Vague
query: "client information"

# ✅ Specific
query: "Michael Thompson c_102 portfolio holdings compliance"

3. Filter by File When Possible

knowledge_search:
  file_name_filters: "client-*.md" # Only search client files

4. Increase Context Windows

knowledge_search:
  num_previous_segments: 3
  num_next_segments: 3

Common RAG Patterns

Pattern 1: Simple RAG

conversation → summarizer → search → agent → response

Best for: Simple Q&A with knowledge base.

Pattern 2: Entity-Aware RAG

conversation → entity_extraction → query_builder → search → agent

Best for: When you need specific entities in the query.

Pattern 3: Multi-Source RAG

conversation → summarizer → [search_1, search_2] → agent (both results)

Best for: Different data types (client data + templates).

Pattern 4: Iterative RAG (Agent with Tools)

agent (has search tool) → searches as needed → synthesizes → response

Best for: Complex queries where initial search might not be enough.
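
The iterative pattern boils down to a loop: search, check what is still missing, refine, and repeat. In a real workflow the agent decides when and how to search again. The Python sketch below replaces that judgment with a simple required-terms check, and fake_search is a stub with canned data, so treat it as an outline of the control flow only.

```python
def iterative_search(search, initial_query, required_terms, max_rounds=3):
    """Search repeatedly until every required term appears in the collected chunks."""
    query = initial_query
    collected = []
    missing = list(required_terms)
    for _ in range(max_rounds):
        for chunk in search(query):
            if chunk not in collected:  # skip chunks already retrieved
                collected.append(chunk)
        text = " ".join(collected).lower()
        missing = [term for term in required_terms if term.lower() not in text]
        if not missing:
            break
        # Refine: ask again, adding whatever is still missing.
        query = f"{initial_query} {' '.join(missing)}"
    return collected, missing

def fake_search(query):
    """Stub search with canned chunks (illustrative data only)."""
    q = query.lower()
    chunks = []
    if "michael thompson" in q:
        chunks.append("Michael Thompson (c_102) holdings: NVDA 15.2%")
    if "compliance" in q:
        chunks.append("Compliance: TMD Compliant")
    return chunks

collected, missing = iterative_search(
    fake_search, "Michael Thompson", ["holdings", "compliance"]
)

print(collected)
print("still missing:", missing)
```

The first round finds holdings only; the second round adds "compliance" to the query and picks up the remaining chunk. The max_rounds cap keeps the loop from running indefinitely when the data simply is not there.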


Quick Diagnostic Table

| Problem | Likely Cause | Quick Fix |
| ------- | ------------ | --------- |
| No results | Query too specific | Broaden query, check datastore |
| Wrong results | Query too vague | Add entity names, IDs |
| Partial data | Chunk boundaries | Increase segment context |
| Missing files | Wrong datastore | Check datastore_configs |
| Noisy results | Raw conversation as query | Use conversation_summarizer |

Summary

| Question | Answer |
| -------- | ------ |
| What does search return? | Relevant chunks, not whole docs |
| How to get good results? | Entity-aware queries with specific terms |
| Multiple data sources? | Use multiple searches or datastores |
| Need document_synthesis? | Often no; agents can synthesize |
| Optimize performance? | Pre-join data, specific queries, filters |

The #1 search quality fix: Include specific entity names and IDs in your query. The difference between "client information" and "Michael Thompson c_102 portfolio" is night and day.


For more on building search queries from conversations, see Ema Workflows: Mastering RAG Flow Control.