Ema Workflows: Search Debugging

Your knowledge search returns garbage. Your RAG pipeline isn't finding the right documents. This troubleshooting guide covers the most common knowledge search problems in Ema workflows—and how to fix them.

How Knowledge Search Works

Knowledge search runs a semantic search across your uploaded documents: you give it a query, and it returns the most relevant chunks of text.

flowchart TD
    A["Query: 'Michael Thompson portfolio holdings'"]
    B["Knowledge Search<br/>Searches: client-knowledge-base.md,<br/>market-data.md, etc.<br/>Returns: Relevant chunks/paragraphs"]
    C["Search Results<br/>(chunks about Michael Thompson)"]

    A --> B --> C

Key insight: Search returns relevant segments, not entire documents.

Key Concept: Chunks, Not Whole Documents

Your search results are chunks—relevant segments from your documents:

Your document (1000 lines):

Section	Content	Search Result
Header	`# Client Directory`	—
...	100 lines of other clients	—
Chunk 1	`## Michael Thompson (c_102)` — Tier: HNW, Risk: Moderate, Holdings: NVDA 15.2%, AAPL 8.5%...	✅ Returned
Chunk 2	More about Michael...	✅ Returned
Other	`## Sarah Johnson (c_103)`	❌ Not returned

Search for "Michael Thompson" returns chunks 1 & 2, not the whole file.
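To make the chunk idea concrete, here is a minimal Python sketch under stated assumptions. It splits a markdown file at `## ` headings and ranks chunks by keyword overlap with the query. Both choices are stand-ins: Ema's actual chunking rules and its semantic ranking are not described in this guide. The point is only that what comes back is a chunk, not the file.

```python
import re


# Hypothetical chunker: splits a markdown document at "## " headings.
# Ema's real segmentation rules may differ; this only illustrates the idea.
def split_into_chunks(markdown: str) -> list[str]:
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


def tokens(text: str) -> set[str]:
    # Lowercased word tokens; underscores are kept so IDs like c_102 survive.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def search(chunks: list[str], query: str, min_overlap: int = 2) -> list[str]:
    # Keyword overlap is a crude stand-in for semantic similarity.
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(chunk)), chunk) for chunk in chunks]
    hits = [(score, chunk) for score, chunk in scored if score >= min_overlap]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in hits]


document = """# Client Directory
## Michael Thompson (c_102)
Tier: HNW, Risk: Moderate, Holdings: NVDA 15.2%, AAPL 8.5%
## Sarah Johnson (c_103)
"""

chunks = split_into_chunks(document)
results = search(chunks, "Michael Thompson portfolio")  # only the Michael Thompson chunk
```

The header chunk and Sarah Johnson's chunk exist in the index but never reach the caller, which is exactly the behavior the table above describes.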

Common Problems and Solutions

Problem 1: No Results Returned

Symptoms: Search returns no results at all.

Causes:

Query too specific
Wrong datastore configured
File filters too narrow

Solutions:

# Check 1: Broaden your query
# Before
query: "c_102 Q4 2025 NVDA exposure percentage"
# After
query: "Michael Thompson portfolio NVDA holdings"

# Check 2: Verify datastore
datastore_configs: ["fileUpload"]  # Is this the right datastore?

# Check 3: Remove or broaden file filters
# Before
file_name_filters: "client-michael-*.md"
# After
file_name_filters: "client-*.md"  # Or remove entirely
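A too-narrow filter can exclude the very file that holds the data. The sketch below assumes `file_name_filters` behaves like a shell-style glob (Python's `fnmatch`); confirm Ema's actual matching rules before relying on that. The file names are the ones this guide lists for the main `fileUpload` datastore, where the client data lives.

```python
from fnmatch import fnmatch
from typing import Optional


# Assumption: file_name_filters uses shell-style glob matching.
def apply_file_filter(file_names: list[str], pattern: Optional[str]) -> list[str]:
    if pattern is None:
        # No filter (file_name_filters: null): every file stays searchable.
        return list(file_names)
    return [name for name in file_names if fnmatch(name, pattern)]


files = ["client-knowledge-base.md", "market-data.md"]

too_narrow = apply_file_filter(files, "client-michael-*.md")  # [] so search has nothing to scan
broader = apply_file_filter(files, "client-*.md")             # ["client-knowledge-base.md"]
unfiltered = apply_file_filter(files, None)                   # both files
```

Michael's data sits in `client-knowledge-base.md`, so the per-client pattern silently returns nothing even though the data exists.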

Problem 2: Wrong Results Returned

Symptoms: Search returns documents, but they're not relevant.

Causes:

Query too vague
Missing entity specifics
Semantic drift from noise

Solutions:

# Problem: Query is too vague
query: "client information"  # Returns random client data

# Solution: Include specific entities
query: "Michael Thompson c_102 portfolio holdings allocation"

Pattern: Entity-Aware Query Building

conversation
     │
     ↓
entity_extraction
{client: "Michael Thompson", ticker: "NVDA", request: "exposure"}
     │
     ↓
call_llm (query builder)
"Build query: {client} {ticker} {request} holdings weight"
     │
     ↓
"Michael Thompson NVDA exposure holdings weight portfolio"
     │
     ↓
knowledge_search → Relevant chunks!
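The query-builder step can be sketched deterministically. In the workflow above it is an LLM call; here a string template (the one shown in the `call_llm` box) fills in the extracted entities so the behavior is predictable. The dictionary keys mirror the `entity_extraction` output; everything else is illustrative.

```python
# Deterministic stand-in for the call_llm "query builder" step.
# Keys mirror the entity_extraction output shown above.
QUERY_TEMPLATE = "{client} {ticker} {request} holdings weight"


def build_query(entities: dict[str, str]) -> str:
    filled = QUERY_TEMPLATE.format(
        client=entities.get("client", ""),
        ticker=entities.get("ticker", ""),
        request=entities.get("request", ""),
    )
    # Missing entities leave blank slots; collapse the extra whitespace.
    return " ".join(filled.split())


query = build_query({"client": "Michael Thompson", "ticker": "NVDA", "request": "exposure"})
# "Michael Thompson NVDA exposure holdings weight"
```

A template cannot enrich the query the way an LLM can (the diagram's output adds "portfolio", for example), but it degrades gracefully when an entity is missing and is trivial to test.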

Problem 3: Partial Data Returned

Symptoms: Search returns some of the information but misses related data (e.g., it finds holdings but not compliance status).

Causes:

Chunk boundaries split related content
Not enough context retrieved

Solutions:

# Increase context windows
knowledge_search:
  num_previous_segments: 3 # More context before chunk
  num_next_segments: 3 # More context after chunk
  max_extractive_segment_count: 10 # More chunks per doc
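One way to picture the two context settings: each match is widened by a fixed number of neighboring segments on either side, clamped at the start and end of the document. This is an assumed model of `num_previous_segments` and `num_next_segments`, not Ema's documented behavior, and the segment boundaries below are illustrative.

```python
# Assumed model of num_previous_segments / num_next_segments: return the
# matching segment plus its neighbors, clamped to the document bounds.
def expand_hit(segments: list[str], hit_index: int, num_previous: int, num_next: int) -> list[str]:
    start = max(0, hit_index - num_previous)
    end = min(len(segments), hit_index + num_next + 1)
    return segments[start:end]


segments = [
    "## Michael Thompson (c_102)",
    "Tier: HNW | Risk: Moderate",
    "### Portfolio",
    "| NVDA | 15.2% | $166K |",
    "### Compliance",
    "- TMD: Compliant",
    "- Concentration: Warning on Tech (44.6%)",
]

narrow = expand_hit(segments, 3, 0, 0)  # just the NVDA row
wide = expand_hit(segments, 3, 3, 3)    # reaches back to the header and forward into Compliance
```

With no context the agent sees a bare table row with no client name; with three segments each way the same hit carries the client header and the compliance lines.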

Better solution: Pre-join related data in your documents

## Michael Thompson (c_102)

Tier: HNW | Risk: Moderate

### Portfolio

| Ticker | Weight | Value |
| ------ | ------ | ----- |
| NVDA   | 15.2%  | $166K |

### Compliance

- TMD: Compliant
- Concentration: Warning on Tech (44.6%)

### Contact

- Email: m.t@corp.com
- Advisor: Steven Poitras

One search finds everything about Michael.
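If your source data lives in separate tables or exports, you can generate the pre-joined sections instead of writing them by hand. A minimal Python sketch, assuming hypothetical record shapes for the profile, holdings, and compliance data; it emits the same layout as the example above, minus the contact block.

```python
# Hypothetical record shapes; adapt them to wherever your data actually lives.
def render_client_section(profile: dict, holdings: list[dict], compliance: dict[str, str]) -> str:
    lines = [
        f"## {profile['name']} ({profile['id']})",
        "",
        f"Tier: {profile['tier']} | Risk: {profile['risk']}",
        "",
        "### Portfolio",
        "",
        "| Ticker | Weight | Value |",
        "| ------ | ------ | ----- |",
    ]
    for holding in holdings:
        lines.append(f"| {holding['ticker']} | {holding['weight']} | {holding['value']} |")
    lines += ["", "### Compliance", ""]
    for check, status in compliance.items():
        lines.append(f"- {check}: {status}")
    return "\n".join(lines)


section = render_client_section(
    {"name": "Michael Thompson", "id": "c_102", "tier": "HNW", "risk": "Moderate"},
    [{"ticker": "NVDA", "weight": "15.2%", "value": "$166K"}],
    {"TMD": "Compliant", "Concentration": "Warning on Tech (44.6%)"},
)
```

Writing one such section per client into a single uploaded file keeps every fact about a client under one heading, so chunk boundaries are less likely to separate them.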

Problem 4: Conversation Noise Pollutes Query

Symptoms: Using raw chat_conversation as query returns poor results.

Example of what happens:

Search query (if using raw conversation):
"Hi there Hello! How can I help you today? I'm looking for info on
Michael Thompson Sure, let me look that up. Specifically his portfolio
performance I found some information... What about compliance status?"

This is not what you want to search for.

Solution: Use conversation_summarizer

# Clean approach
conversation_summarizer:
  input: trigger.chat_conversation

knowledge_search:
  query: conversation_summarizer.summarized_conversation
  # Gets: "Michael Thompson portfolio performance compliance status"
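To see why cleanup helps, the sketch below applies a deliberately crude rule to the noisy conversation from this section: keep only user turns and drop pure greetings. This is not how `conversation_summarizer` works internally, and the greeting list is an assumption. The output is still wordier than the summarized query above, which is why a real summarizer is the better tool.

```python
# Crude rule-based filter, NOT conversation_summarizer: keep user turns only
# and drop turns that are nothing but a greeting.
GREETINGS = {"hi", "hi there", "hello", "hey", "thanks", "thank you"}


def condense_user_turns(turns: list[tuple[str, str]]) -> str:
    kept = []
    for role, text in turns:
        normalized = text.strip().rstrip("!.?").lower()
        if role != "user" or normalized in GREETINGS:
            continue
        kept.append(text.strip())
    return " ".join(kept)


conversation = [
    ("user", "Hi there"),
    ("assistant", "Hello! How can I help you today?"),
    ("user", "I'm looking for info on Michael Thompson"),
    ("assistant", "Sure, let me look that up."),
    ("user", "Specifically his portfolio performance"),
    ("assistant", "I found some information..."),
    ("user", "What about compliance status?"),
]

condensed = condense_user_turns(conversation)
```

Even this naive filter removes the assistant's phrasing from the query, but it keeps filler such as "I'm looking for info on" that a summarizer would drop.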

Problem 5: Missing Files in Search

Symptoms: You know the file exists, but search doesn't find it.

Causes:

Wrong datastore
File not indexed yet
File filters excluding it

Solutions:

# Check datastore configuration
knowledge_search:
  datastore_configs:
    - "fileUpload"      # Main uploads
    - "templates"       # Template files
    - "house_views"     # Investment views

# Remove restrictive filters
# Before
file_name_filters: "portfolio-*.md"
# After
file_name_filters: null  # Search all files

Search Configuration Reference

Parameter	Purpose	Example
`query`	What to search for	"Michael Thompson portfolio"
`datastore_configs`	Which knowledge bases	`["fileUpload", "clients"]`
`page_size`	Max results	20
`max_extractive_segment_count`	Chunks per doc	10
`file_name_filters`	Limit to specific files	"client-*.md"
`num_previous_segments`	Context before chunk	2
`num_next_segments`	Context after chunk	2
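The table does not spell out how `page_size` and `max_extractive_segment_count` interact. A plausible model, and only an assumption here, is that `page_size` caps the number of documents returned while `max_extractive_segment_count` caps the chunks kept within each document. The sketch implements that model with placeholder file names and scores.

```python
from collections import defaultdict


def cap_results(hits, page_size: int, max_segments_per_doc: int):
    """hits: (file_name, score, segment) tuples in any order.

    Assumed model: return at most page_size documents, ranked by their best
    segment score, each carrying at most max_segments_per_doc segments.
    """
    by_doc = defaultdict(list)
    for file_name, score, segment in hits:
        by_doc[file_name].append((score, segment))
    ranked = sorted(by_doc, key=lambda name: max(s for s, _ in by_doc[name]), reverse=True)
    results = []
    for file_name in ranked[:page_size]:
        best = sorted(by_doc[file_name], reverse=True)[:max_segments_per_doc]
        results.append((file_name, [segment for _, segment in best]))
    return results


hits = [
    ("a.md", 0.9, "a1"), ("a.md", 0.8, "a2"), ("a.md", 0.7, "a3"),
    ("b.md", 0.6, "b1"), ("b.md", 0.5, "b2"),
]
```

Under this model, `cap_results(hits, 1, 10)` drops `b.md` entirely, so raising `max_extractive_segment_count` alone would not bring its chunks back.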

Multiple Knowledge Bases

You can search different data sources for different purposes:

Data Source	Contains	Use For
`fileUpload` (main)	client-knowledge-base.md, market-data.md	Client data, holdings, portfolios
`templates`	Agent instructions, response templates	Finding role/task instructions dynamically
`house_views`	Investment house views, allocation targets	Compliance checking, recommendations

Two Searches in Parallel

flowchart TD
    A["conversation_summarizer"]
    B["knowledge_search_1<br/>datastore: clients<br/>'Michael Thompson'"]
    C["knowledge_search_2<br/>datastore: templates<br/>'PORTFOLIO_REVIEW'"]
    D["unified_agent<br/>(gets both results)"]

    A --> B & C
    B & C --> D
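The fan-out/fan-in shape can be sketched with Python's `concurrent.futures`. In a real workflow each branch is a `knowledge_search` node; here `run_search` is a hypothetical stub that returns labeled strings, and the named-input keys are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_search(datastore: str, query: str) -> list[str]:
    # Hypothetical stub standing in for a knowledge_search node.
    return [f"[{datastore}] result for '{query}'"]


def parallel_searches(summary: str, template_key: str) -> dict[str, list[str]]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Fan out: both searches start without waiting for each other.
        client_future = pool.submit(run_search, "clients", summary)
        template_future = pool.submit(run_search, "templates", template_key)
        # Fan in: the agent gets both result sets as separate named inputs.
        return {
            "Client_Results": client_future.result(),
            "Template_Results": template_future.result(),
        }


agent_inputs = parallel_searches("Michael Thompson", "PORTFOLIO_REVIEW")
```

Keeping the two result sets under separate names, rather than concatenating them, lets the agent's instructions treat templates as instructions and client chunks as data.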

Do You Need Document Synthesis?

document_synthesis combines scattered chunks into coherent content.

When You DON'T Need It

Your documents are already well-structured markdown
Chunks are self-contained (have headers, context)
Agent can work with raw chunks

When You DO Need It

Chunks are fragmented, need assembly
Want to transform format (e.g., chunks → structured JSON)
Need to deduplicate/reconcile conflicting info
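Deduplication on its own does not always need a synthesis step. A small deterministic pass can drop repeated chunks (for example, the same passage pulled in by two overlapping context windows) and group the rest by source file. This sketch assumes results arrive as `(file_name, text)` pairs, and the market-data chunk is a placeholder. It does not reconcile conflicting facts, which still calls for `document_synthesis` or an agent.

```python
def dedupe_and_group(results: list[tuple[str, str]]) -> dict[str, list[str]]:
    seen: set[str] = set()
    grouped: dict[str, list[str]] = {}
    for file_name, text in results:
        # Normalize whitespace and case so trivially different copies match.
        key = " ".join(text.split()).lower()
        if key in seen:
            continue
        seen.add(key)
        grouped.setdefault(file_name, []).append(text)
    return grouped


results = [
    ("client-knowledge-base.md", "Tier: HNW | Risk: Moderate"),
    ("client-knowledge-base.md", "Tier: HNW |  Risk: Moderate"),  # duplicate, extra space
    ("client-knowledge-base.md", "- TMD: Compliant"),
    ("market-data.md", "(market data chunk)"),  # placeholder text
]

grouped = dedupe_and_group(results)
```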

Alternative: Let Agent Do Synthesis

agent:
  named_inputs:
    - Search_Results: knowledge_search.search_results

  instructions: |
    From the search results, extract and structure:
    - Client name and ID
    - Portfolio holdings
    - Compliance status

    The chunks may be scattered - synthesize into coherent data.

The agent performs synthesis as part of its task.

Debugging Checklist

When search isn't working, check in order:

Check	Command/Action
1. Query quality	Is the query specific? Does it include entity names/IDs?
2. Datastore config	Is the right datastore listed in `datastore_configs`?
3. File filters	Are filters excluding the file you need?
4. Document structure	Is related data pre-joined or scattered?
5. Context settings	Are `num_previous_segments` and `num_next_segments` large enough?
6. Conversation noise	Are you using `conversation_summarizer`?
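Most of this checklist can be automated as a pre-flight lint over a `knowledge_search` config. The sketch below is hypothetical: the keys mirror this guide's parameters, but `query_source` is an invented field standing in for wherever the query comes from, and the entity regexes and the minimum context of 2 are assumptions. Check 4 (document structure) needs a human eye, so it is skipped.

```python
import re
from fnmatch import fnmatch

ENTITY_ID = re.compile(r"\bc_\d+\b")                      # IDs like c_102
PERSON_NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")  # e.g. "Michael Thompson"


def diagnose(config: dict, expected_datastore: str, target_file: str) -> list[str]:
    problems = []
    query = config.get("query", "")
    if not (ENTITY_ID.search(query) or PERSON_NAME.search(query)):
        problems.append("1. Query has no entity name or ID")
    if expected_datastore not in config.get("datastore_configs", []):
        problems.append(f"2. Datastore '{expected_datastore}' is not configured")
    pattern = config.get("file_name_filters")
    # Assumes glob-style filters, as in the fnmatch sketch earlier.
    if pattern is not None and not fnmatch(target_file, pattern):
        problems.append(f"3. Filter '{pattern}' excludes {target_file}")
    before = config.get("num_previous_segments", 0)
    after = config.get("num_next_segments", 0)
    if min(before, after) < 2:  # threshold of 2 is an arbitrary assumption
        problems.append("5. Context window may be too small")
    if config.get("query_source") == "trigger.chat_conversation":
        problems.append("6. Query is the raw conversation; use conversation_summarizer")
    return problems


bad_config = {
    "query": "client information",
    "datastore_configs": ["templates"],
    "file_name_filters": "client-michael-*.md",
    "query_source": "trigger.chat_conversation",
}
good_config = {
    "query": "Michael Thompson c_102 portfolio holdings compliance",
    "datastore_configs": ["fileUpload"],
    "file_name_filters": "client-*.md",
    "num_previous_segments": 3,
    "num_next_segments": 3,
    "query_source": "conversation_summarizer.summarized_conversation",
}
```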

Optimizing Search Performance

1. Pre-join Related Data

Instead of searching for the client, then the holdings, then the compliance status separately, keep them together under one heading:

## Michael Thompson (c_102)

Tier: HNW | Risk: Moderate

### Portfolio

| Ticker | Weight | Value |
| ------ | ------ | ----- |
| NVDA   | 15.2%  | $166K |

### Compliance

- TMD: Compliant
- Concentration: Warning on Tech (44.6%)

2. Use Specific Queries

# ❌ Vague
query: "client information"

# ✅ Specific
query: "Michael Thompson c_102 portfolio holdings compliance"

3. Filter by File When Possible

knowledge_search:
  file_name_filters: "client-*.md" # Only search client files

4. Increase Context Windows

knowledge_search:
  num_previous_segments: 3
  num_next_segments: 3

Common RAG Patterns

Pattern 1: Simple RAG

conversation → summarizer → search → agent → response

Best for: Simple Q&A with knowledge base.

Pattern 2: Entity-Aware RAG

conversation → entity_extraction → query_builder → search → agent

Best for: When you need specific entities in the query.

Pattern 3: Multi-Source RAG

conversation → summarizer → [search_1, search_2] → agent (both results)

Best for: Different data types (client data + templates).

Pattern 4: Iterative RAG (Agent with Tools)

agent (has search tool) → searches as needed → synthesizes → response

Best for: Complex queries where initial search might not be enough.
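Pattern 4's loop can be sketched as: search, check which needed facets are still missing, and re-query only for those until everything is covered or a round limit is hit. In Ema the agent makes these decisions itself; here `search` is any callable, the coverage check is a naive substring test standing in for the agent's judgment, and the tiny in-memory index is hypothetical.

```python
from typing import Callable


def iterative_rag(search: Callable[[str], list[str]], entity: str,
                  facets: list[str], max_rounds: int = 3) -> list[str]:
    collected: list[str] = []
    query = f"{entity} {' '.join(facets)}"
    for _ in range(max_rounds):
        collected += search(query)
        text = " ".join(collected).lower()
        missing = [facet for facet in facets if facet.lower() not in text]
        if not missing:
            break
        # Refine: re-query only for what is still missing.
        query = f"{entity} {' '.join(missing)}"
    return collected


# Hypothetical index: the first, broad query only surfaces portfolio data.
INDEX = {
    "Michael Thompson portfolio compliance": ["Michael Thompson portfolio: NVDA 15.2%"],
    "Michael Thompson compliance": ["Michael Thompson compliance: TMD Compliant"],
}

found = iterative_rag(lambda q: INDEX.get(q, []), "Michael Thompson", ["portfolio", "compliance"])
```

The round limit matters: without it, a facet that simply is not in the knowledge base would keep the agent searching forever.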

Quick Diagnostic Table

Problem	Likely Cause	Quick Fix
No results	Query too specific	Broaden query, check datastore
Wrong results	Query too vague	Add entity names, IDs
Partial data	Chunk boundaries	Increase segment context
Missing files	Wrong datastore	Check `datastore_configs`
Noisy results	Raw conversation as query	Use `conversation_summarizer`

Summary

Question	Answer
What does search return?	Relevant chunks, not whole docs
How to get good results?	Entity-aware queries with specific terms
Multiple data sources?	Use multiple searches or datastores
Need document_synthesis?	Often no—agents can synthesize
Optimize performance?	Pre-join data, specific queries, filters

The #1 search quality fix: Include specific entity names and IDs in your query. The difference between "client information" and "Michael Thompson c_102 portfolio" is night and day.

For more on building search queries from conversations, see Ema Workflows: Mastering RAG Flow Control.
