Ema Workflows: Search Debugging
Your knowledge search returns garbage. Your RAG pipeline isn't finding the right documents. This troubleshooting guide covers the most common knowledge search problems in Ema workflows—and how to fix them.
How Knowledge Search Works
Knowledge search performs semantic search across your uploaded documents. You give it a query, it returns relevant chunks of text.
flowchart TD
A["Query: 'Michael Thompson portfolio holdings'"]
B["Knowledge Search<br/>Searches: client-knowledge-base.md,<br/>market-data.md, etc.<br/>Returns: Relevant chunks/paragraphs"]
C["Search Results<br/>(chunks about Michael Thompson)"]
A --> B --> C
Key insight: Search returns relevant segments, not entire documents.
Key Concept: Chunks, Not Whole Documents
Your search results are chunks—relevant segments from your documents:
Your document (1000 lines):
| Section | Content | Search Result |
|---|---|---|
| Header | # Client Directory | — |
| ... | 100 lines of other clients | — |
| Chunk 1 | ## Michael Thompson (c_102) — Tier: HNW, Risk: Moderate, Holdings: NVDA 15.2%, AAPL 8.5%... | ✅ Returned |
| Chunk 2 | More about Michael... | ✅ Returned |
| Other | ## Sarah Johnson (c_103) | ❌ Not returned |
Search for "Michael Thompson" returns chunks 1 & 2, not the whole file.
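To make the chunk idea concrete, here is a minimal sketch in Python. It is not how Ema chunks or ranks documents: `split_into_chunks` and `search_chunks` are hypothetical helpers, the chunker simply splits on `## ` headings, and plain keyword matching stands in for semantic search.

```python
import re

def split_into_chunks(markdown: str) -> list[str]:
    """Toy chunker: start a new chunk at every '## ' heading."""
    parts = re.split(r"(?m)^(?=## )", markdown)
    return [p.strip() for p in parts if p.strip()]

def search_chunks(chunks: list[str], query: str) -> list[str]:
    """Keyword stand-in for semantic search: keep chunks containing every query term."""
    terms = query.lower().split()
    return [c for c in chunks if all(t in c.lower() for t in terms)]

# Shortened version of the client directory described above.
document = """# Client Directory

## Michael Thompson (c_102)
Tier: HNW, Risk: Moderate
Holdings: NVDA 15.2%, AAPL 8.5%

## Sarah Johnson (c_103)
(other client details)
"""

chunks = split_into_chunks(document)
hits = search_chunks(chunks, "Michael Thompson")
for hit in hits:
    print(hit)
```

Only the Michael Thompson section comes back; the directory header and the Sarah Johnson section are never returned.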
Common Problems and Solutions
Problem 1: No Results Returned
Symptoms: Search returns empty or no matches.
Causes:
- Query too specific
- Wrong datastore configured
- File filters too narrow
Solutions:
# Check 1: Broaden your query
# Before
query: "c_102 Q4 2025 NVDA exposure percentage"
# After
query: "Michael Thompson portfolio NVDA holdings"
# Check 2: Verify datastore
datastore_configs: ["fileUpload"] # Is this the right datastore?
# Check 3: Remove or broaden file filters
# Before
file_name_filters: "client-michael-*.md"
# After
file_name_filters: "client-*.md" # Or remove entirely
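If file filters behave like shell-style glob patterns (an assumption; check how your Ema deployment interprets them), you can preview what a filter would match before running a search. The sketch below uses Python's standard `fnmatch` module; `apply_file_filter` and the file name `client-michael-thompson.md` are hypothetical.

```python
from fnmatch import fnmatch

def apply_file_filter(file_names, pattern=None):
    """Return the files a glob-style filter would keep; None means no filter."""
    if pattern is None:
        return list(file_names)
    return [name for name in file_names if fnmatch(name, pattern)]

files = [
    "client-knowledge-base.md",
    "client-michael-thompson.md",  # hypothetical file, for illustration only
    "market-data.md",
]

narrow = apply_file_filter(files, "client-michael-*.md")
broad = apply_file_filter(files, "client-*.md")
everything = apply_file_filter(files)

print(narrow)      # the narrow filter keeps a single file
print(broad)       # the broader filter also keeps the knowledge base
print(everything)  # no filter: all files are searched
```

With the narrow pattern, `client-knowledge-base.md` is excluded, so any data that lives only in that file can never be returned.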
Problem 2: Wrong Results Returned
Symptoms: Search returns chunks, but they're not relevant.
Causes:
- Query too vague
- Missing entity specifics
- Semantic drift from noise
Solutions:
# Problem: Query is too vague
query: "client information" # Returns random client data
# Solution: Include specific entities
query: "Michael Thompson c_102 portfolio holdings allocation"
Pattern: Entity-Aware Query Building
conversation
│
↓
entity_extraction
{client: "Michael", ticker: "NVDA", request: "exposure"}
│
↓
call_llm (query builder)
"Build query: {client} {ticker} {request} holdings weight"
│
↓
"Michael Thompson NVDA exposure holdings weight portfolio"
│
↓
knowledge_search → Relevant chunks!
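In the flow above the query builder is an LLM call. As a rough illustration of what the template does, here is a deterministic Python stand-in; `build_query` is a hypothetical helper, not an Ema node, and unlike an LLM it will not expand "Michael" into a full name.

```python
def build_query(entities: dict[str, str],
                extra_terms: tuple[str, ...] = ("holdings", "weight")) -> str:
    """Join extracted entities and fixed domain terms into one search query."""
    # Follow the template order: {client} {ticker} {request}, skipping missing entities.
    ordered = [entities.get(key, "") for key in ("client", "ticker", "request")]
    terms = [t for t in ordered if t] + list(extra_terms)

    # Drop case-insensitive duplicates while preserving order.
    seen: set[str] = set()
    unique = []
    for term in terms:
        if term.lower() not in seen:
            seen.add(term.lower())
            unique.append(term)
    return " ".join(unique)

query = build_query({"client": "Michael", "ticker": "NVDA", "request": "exposure"})
print(query)
```

The point is that every extracted entity ends up in the query, so the search is anchored on the client and ticker rather than on generic words.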
Problem 3: Partial Data Returned
Symptoms: Search returns some information but misses related data (e.g., it returns holdings but not compliance status).
Causes:
- Chunk boundaries split related content
- Not enough context retrieved
Solutions:
# Increase context windows
knowledge_search:
  num_previous_segments: 3 # More context before chunk
  num_next_segments: 3 # More context after chunk
  max_extractive_segment_count: 10 # More chunks per doc
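The sketch below shows the effect these settings are meant to have, assuming `num_previous_segments` and `num_next_segments` add that many neighbouring segments around each matching chunk. `expand_hit` and the segment names are hypothetical; the actual retrieval happens inside the search node.

```python
def expand_hit(segments, hit_index, num_previous=0, num_next=0):
    """Return the matching segment plus its neighbours, clamped to the document bounds."""
    start = max(0, hit_index - num_previous)
    end = min(len(segments), hit_index + num_next + 1)
    return segments[start:end]

# One client record split into five consecutive segments (illustrative labels).
segments = ["header", "profile", "portfolio", "compliance", "contact"]
hit = segments.index("portfolio")

no_context = expand_hit(segments, hit)
some_context = expand_hit(segments, hit, num_previous=1, num_next=1)
wide_context = expand_hit(segments, hit, num_previous=3, num_next=3)

print(no_context)
print(some_context)
print(wide_context)
```

With no context the compliance segment is lost even though it sits right next to the match; one neighbour on each side is enough to recover it here.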
Better solution: Pre-join related data in your documents
## Michael Thompson (c_102)
Tier: HNW | Risk: Moderate
### Portfolio
| Ticker | Weight | Value |
| ------ | ------ | ----- |
| NVDA | 15.2% | $166K |
### Compliance
- TMD: Compliant
- Concentration: Warning on Tech (44.6%)
### Contact
- Email: m.t@corp.com
- Advisor: Steven Poitras
One search finds everything about Michael.
Problem 4: Conversation Noise Pollutes Query
Symptoms: Using raw chat_conversation as query returns poor results.
Example of what happens:
Search query (if using raw conversation):
"Hi there Hello! How can I help you today? I'm looking for info on
Michael Thompson Sure, let me look that up. Specifically his portfolio
performance I found some information... What about compliance status?"
This is not what you want to search for.
Solution: Use conversation_summarizer
# Clean approach
conversation_summarizer:
  input: trigger.chat_conversation
knowledge_search:
  query: conversation_summarizer.summarized_conversation
# Gets: "Michael Thompson portfolio performance compliance status"
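To see why the cleaned query works better, here is a crude Python approximation of the cleanup. It is not how conversation_summarizer works internally: it simply keeps the user turns and drops a hand-picked filler list. The role labels on each turn and the `FILLER` set are assumptions made for this example.

```python
import re

# Hand-picked filler words for this one transcript; a real summarizer needs no such list.
FILLER = {"hi", "there", "i'm", "looking", "for", "info", "on",
          "specifically", "his", "what", "about"}

def clean_query(turns):
    """Keep only user turns, then drop filler words and punctuation."""
    user_text = " ".join(text for role, text in turns if role == "user")
    words = re.findall(r"[A-Za-z0-9_']+", user_text)
    return " ".join(w for w in words if w.lower() not in FILLER)

turns = [
    ("user", "Hi there"),
    ("assistant", "Hello! How can I help you today?"),
    ("user", "I'm looking for info on Michael Thompson"),
    ("assistant", "Sure, let me look that up."),
    ("user", "Specifically his portfolio performance"),
    ("assistant", "I found some information..."),
    ("user", "What about compliance status?"),
]

cleaned = clean_query(turns)
print(cleaned)
```

The assistant's greetings and status messages contribute nothing to retrieval, so removing them leaves only the entity and topic terms.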
Problem 5: Missing Files in Search
Symptoms: You know the file exists, but search doesn't find it.
Causes:
- Wrong datastore
- File not indexed yet
- File filters excluding it
Solutions:
# Check datastore configuration
knowledge_search:
  datastore_configs:
    - "fileUpload" # Main uploads
    - "templates" # Template files
    - "house_views" # Investment views
# Remove restrictive filters
# Before
file_name_filters: "portfolio-*.md"
# After
file_name_filters: null # Search all files
Search Configuration Reference
| Parameter | Purpose | Example |
|---|---|---|
| query | What to search for | "Michael Thompson portfolio" |
| datastore_configs | Which knowledge bases | ["fileUpload", "clients"] |
| page_size | Max results | 20 |
| max_extractive_segment_count | Chunks per doc | 10 |
| file_name_filters | Limit to specific files | "client-*.md" |
| num_previous_segments | Context before chunk | 2 |
| num_next_segments | Context after chunk | 2 |
Multiple Knowledge Bases
You can search different data sources for different purposes:
| Data Source | Contains | Use For |
|---|---|---|
| fileUpload (main) | client-knowledge-base.md, market-data.md | Client data, holdings, portfolios |
| templates | Agent instructions, response templates | Finding role/task instructions dynamically |
| house_views | Investment house views, allocation targets | Compliance checking, recommendations |
Two Searches in Parallel
flowchart TD
A["conversation_summarizer"]
B["knowledge_search_1<br/>datastore: clients<br/>'Michael Thompson'"]
C["knowledge_search_2<br/>datastore: templates<br/>'PORTFOLIO_REVIEW'"]
D["unified_agent<br/>(gets both results)"]
A --> B & C
B & C --> D
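The fan-out and fan-in in the diagram can be sketched with Python's standard `concurrent.futures` module. This only illustrates the pattern: `fake_search` is a hypothetical stand-in for a knowledge_search call, and in Ema the parallelism comes from the workflow graph rather than from code you write.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_search(datastore: str, query: str) -> list[str]:
    """Hypothetical stand-in for one knowledge_search node; returns a labelled dummy chunk."""
    return [f"[{datastore}] chunk matching '{query}'"]

def parallel_search(requests: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """Run every (datastore, query) pair concurrently and collect results by node name."""
    with ThreadPoolExecutor(max_workers=len(requests)) as pool:
        futures = {
            name: pool.submit(fake_search, datastore, query)
            for name, (datastore, query) in requests.items()
        }
        return {name: future.result() for name, future in futures.items()}

results = parallel_search({
    "knowledge_search_1": ("clients", "Michael Thompson"),
    "knowledge_search_2": ("templates", "PORTFOLIO_REVIEW"),
})

# The downstream agent receives both result sets as separate named inputs.
for name, chunks in results.items():
    print(name, chunks)
```

Keeping the two result sets under separate names lets the agent tell client data apart from template instructions.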
Do You Need Document Synthesis?
document_synthesis combines scattered chunks into coherent content.
When You DON'T Need It
- Your documents are already well-structured markdown
- Chunks are self-contained (have headers, context)
- Agent can work with raw chunks
When You DO Need It
- Chunks are fragmented, need assembly
- Want to transform format (e.g., chunks → structured JSON)
- Need to deduplicate/reconcile conflicting info
Alternative: Let Agent Do Synthesis
agent:
  named_inputs:
    - Search_Results: knowledge_search.search_results
  instructions: |
    From the search results, extract and structure:
    - Client name and ID
    - Portfolio holdings
    - Compliance status
    The chunks may be scattered - synthesize into coherent data.
The agent performs synthesis as part of its task.
Debugging Checklist
When search isn't working, check in order:
| Check | What to verify |
|---|---|
| 1. Query quality | Is the query specific? Does it include entity names/IDs? |
| 2. Datastore config | Is the right datastore listed in datastore_configs? |
| 3. File filters | Are filters excluding the file you need? |
| 4. Document structure | Is related data pre-joined or scattered? |
| 5. Context settings | Are num_previous/next_segments sufficient? |
| 6. Conversation noise | Are you using conversation_summarizer? |
Optimizing Search Performance
1. Pre-join Related Data
Instead of searching for client, then holdings, then compliance separately:
## Michael Thompson (c_102)
Tier: HNW | Risk: Moderate
### Portfolio
| Ticker | Weight | Value |
| ------ | ------ | ----- |
| NVDA | 15.2% | $166K |
### Compliance
- TMD: Compliant
- Concentration: Warning on Tech (44.6%)
2. Use Specific Queries
# ❌ Vague
query: "client information"
# ✅ Specific
query: "Michael Thompson c_102 portfolio holdings compliance"
3. Filter by File When Possible
knowledge_search:
  file_name_filters: "client-*.md" # Only search client files
4. Increase Context Windows
knowledge_search:
  num_previous_segments: 3
  num_next_segments: 3
Common RAG Patterns
Pattern 1: Simple RAG
conversation → summarizer → search → agent → response
Best for: Simple Q&A with knowledge base.
Pattern 2: Entity-Aware RAG
conversation → entity_extraction → query_builder → search → agent
Best for: When you need specific entities in the query.
Pattern 3: Multi-Source RAG
conversation → summarizer → [search_1, search_2] → agent (both results)
Best for: Different data types (client data + templates).
Pattern 4: Iterative RAG (Agent with Tools)
agent (has search tool) → searches as needed → synthesizes → response
Best for: Complex queries where initial search might not be enough.
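A rough sketch of the iterative pattern in Python: search once per topic that is still uncovered, and stop when everything is covered or a search budget runs out. `iterative_search`, `toy_search`, and the two-chunk corpus are hypothetical; a real agent decides when to search again through its own reasoning, not a fixed loop.

```python
def iterative_search(search, entity, topics, max_searches=3):
    """Search per uncovered topic; return the collected chunks and the number of searches."""
    collected, searches = [], 0
    for topic in topics:
        covered = any(topic.lower() in chunk.lower() for chunk in collected)
        if covered or searches >= max_searches:
            continue
        searches += 1
        for chunk in search(f"{entity} {topic}"):
            if chunk not in collected:
                collected.append(chunk)
    return collected, searches

# Toy corpus and keyword search, standing in for the agent's search tool.
CORPUS = [
    "Michael Thompson portfolio: NVDA 15.2%",
    "Michael Thompson compliance: TMD Compliant, concentration warning on Tech (44.6%)",
]

def toy_search(query):
    terms = query.lower().split()
    return [c for c in CORPUS if all(t in c.lower() for t in terms)]

chunks, searches_used = iterative_search(
    toy_search, "Michael Thompson", ["portfolio", "compliance"]
)
print(searches_used)
for chunk in chunks:
    print(chunk)
```

The first search finds only the portfolio chunk, so a second, narrower search is issued for compliance. That follow-up step is what the single-shot patterns above cannot do.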
Quick Diagnostic Table
| Problem | Likely Cause | Quick Fix |
|---|---|---|
| No results | Query too specific | Broaden query, check datastore |
| Wrong results | Query too vague | Add entity names, IDs |
| Partial data | Chunk boundaries | Increase segment context |
| Missing files | Wrong datastore | Check datastore_configs |
| Noisy results | Raw conversation as query | Use conversation_summarizer |
Summary
| Question | Answer |
|---|---|
| What does search return? | Relevant chunks, not whole docs |
| How to get good results? | Entity-aware queries with specific terms |
| Multiple data sources? | Use multiple searches or datastores |
| Need document_synthesis? | Often no—agents can synthesize |
| Optimize performance? | Pre-join data, specific queries, filters |
The #1 search quality fix: Include specific entity names and IDs in your query. The difference between "client information" and "Michael Thompson c_102 portfolio" is night and day.
For more on building search queries from conversations, see Ema Workflows: Mastering RAG Flow Control.