What we learned restructuring an MCP server from 7 "super tools" to 20 focused operations.


The Clever Idea That Wasn't

We built an MCP server for managing AI Employees on the Ema platform. It worked. Users could create personas, manage workflows, sync across environments. But something felt off.

Our server had consolidated ~45 original tools into just 7 "super tools" using a mode parameter:

persona(mode="list")
persona(mode="get", id="abc-123")
persona(mode="create", name="Sales Bot", type="voice", input="...")
persona(mode="analyze", id="abc-123", fix=true, include=["workflow", "config"])

Fewer tools seemed better: one tool per resource, Unix-style "do one thing well." Except each tool had 30+ parameters, and the LLM had to figure out which parameters applied to which mode.

Clever engineering. Wrong design.


What Went Wrong

Problem 1: Parameter Sprawl

Every time an LLM called our persona tool, it saw this:

{
  "properties": {
    "mode": { "enum": ["list", "get", "create", "update", "clone", "analyze", ...] },
    "id": { "description": "Persona ID" },
    "name": { "description": "Name for new persona" },
    "type": { "enum": ["voice", "chat", "dashboard"] },
    "from": { "description": "Template or persona to clone from" },
    "input": { "description": "Natural language description" },
    "include_data": { "description": "Include data when cloning" },
    "sanitize": { "description": "Remove PII" },
    "fix": { "description": "Auto-fix issues" },
    "include": { "description": "What to include in analysis" }
  }
}

The LLM saw 30+ parameters when it only needed 2 for "get me this persona." Confusion ensued.

Problem 2: Inconsistent Patterns

We had three different patterns across our 7 tools:

Tool        Pattern
persona     mode parameter required
action      Flag-based (all=true, suggest="...")
reference   Hybrid (type enum + individual flags)

The LLM had to learn 3 interaction styles. Each tool was a special case.

Problem 3: Wrong User Model

Here's the insight that changed everything:

We designed tools as if humans would read the schemas. But LLMs read the schemas.

We optimized for developer ergonomics—fewer tools, consistent naming, Unix-like composability. But LLMs don't care about those things. They care about:

  1. Does the tool name match the user's intent?
  2. Are the parameters relevant to what I'm trying to do?
  3. Is the description clear about when to use this?

The Key Insight: Users Speak, LLMs Choose

When we observed actual usage:

User: "Clone the Sales SDR with its data and clean it up for demo"
         │
         ▼
    LLM parses intent
         │
         ▼
    LLM searches available tools
         │
         ▼
    LLM selects: persona(mode="clone", from="...", include_data=true, sanitize=true)

The user never typed a tool name. Never selected from a menu. They just described what they wanted.

Tool design is really about one thing: making it easy for LLMs to match natural language intent to the right tool with the right parameters.


Why Not Use Prompts?

MCP has two mechanisms: Tools and Prompts. We initially thought prompts could help—maybe persona.create as a prompt guiding the creation flow?

But the MCP spec is clear:

Mechanism   Who Controls       How Triggered
Tools       Model-controlled   LLM invokes automatically
Prompts     User-controlled    User explicitly selects (slash commands)

Prompts are slash commands (/create_voice_ai). They're for explicit, user-initiated workflows.

In practice, users rarely use slash commands. They just talk. Which means tools are what matter for 95% of interactions.

Design for natural language, not for slash commands. The LLM is the interface. Tools are the API.


The Redesign: 5 Principles

Principle 1: One Operation, One Tool

Instead of modes, each operation gets its own tool:

// Before: mode-based
persona(mode="list")
persona(mode="get", id="abc")
persona(mode="create", name="Bot", ...)

// After: operation-based
persona()           // list (no params = list)
persona(id="abc")   // get (id = get one)
persona_create(name="Bot", ...)
persona_clone(from="abc", ...)

Each tool shows only relevant parameters. persona_create has 5 parameters, not 30.
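The difference shows up in the schemas themselves. A minimal sketch using plain JSON-Schema-style dicts (parameter lists are illustrative, not our production schemas):

```python
# Sketch: the same surface as one mode-based "super tool" versus focused
# per-operation tools. Schemas are plain JSON Schema dicts; the parameter
# lists are illustrative, not our real server's.

SUPER_TOOL = {
    "name": "persona",
    "inputSchema": {
        "type": "object",
        "properties": {
            # every mode's parameters share one namespace
            "mode": {"enum": ["list", "get", "create", "update", "clone", "analyze"]},
            "id": {"type": "string"},
            "name": {"type": "string"},
            "type": {"enum": ["voice", "chat", "dashboard"]},
            "from": {"type": "string"},
            "input": {"type": "string"},
            "include_data": {"type": "boolean"},
            "sanitize": {"type": "boolean"},
            "fix": {"type": "boolean"},
            "include": {"type": "array"},
        },
    },
}

FOCUSED_TOOLS = [
    {
        "name": "persona",  # list (no params) or get (id)
        "inputSchema": {"type": "object", "properties": {"id": {"type": "string"}}},
    },
    {
        "name": "persona_create",
        "inputSchema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "type": {"enum": ["voice", "chat", "dashboard"]},
                "input": {"type": "string"},
            },
            "required": ["name", "type"],
        },
    },
]

def param_count(tool):
    """Number of parameters the LLM sees when it inspects this tool."""
    return len(tool["inputSchema"]["properties"])

print(param_count(SUPER_TOOL))                      # 10 here; 30+ in the real server
print(max(param_count(t) for t in FOCUSED_TOOLS))   # 3
```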

Principle 2: Names Match Natural Language

Tool names should match how users describe intent:

User Says                 Tool Name
"show me my personas"     persona
"create a voice AI"       persona_create
"clone it with data"      persona_clone
"what's wrong with it"    persona_analyze
"push to staging"         sync_run

The LLM matches "clone it with data" to persona_clone(include_data=true) because the names align with natural language.
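To see why alignment helps, here's a deliberately naive matcher that scores tools by token overlap with the user's phrasing. A real LLM does far more than this, but the gradient points the same way: shared vocabulary wins.

```python
# Naive sketch: pick the tool whose name shares the most tokens with the
# user's utterance. Not how an LLM works, but the same incentive applies.

def match_tool(utterance: str, tool_names: list[str]) -> str:
    words = set(utterance.lower().split())
    def score(name: str) -> int:
        return len(set(name.split("_")) & words)
    return max(tool_names, key=score)

tools = ["persona", "persona_create", "persona_clone", "persona_analyze", "sync_run"]
print(match_tool("clone the sales persona with data", tools))  # persona_clone
```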

Principle 3: Flags for Common Combinations

Users often want compound operations: "clone with data AND sanitize." Common combinations become flags:

persona_clone(
  from: "abc",
  name: "Demo Copy",
  include_data: true,   // Clone data files too
  sanitize: true        // Remove PII
)

This handles 80% of cases in one call. For edge cases, the LLM chains tools.
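A sketch of what a flag-driven handler looks like on the server side. The in-memory store and the pii field are hypothetical stand-ins for illustration, not our server's internals:

```python
# Sketch of a flag-driven handler: one call covers the common compound
# operation, with optional steps gated by flags. The store layout and
# "pii" marker are hypothetical, for illustration only.

def persona_clone(store, from_id, name, include_data=False, sanitize=False):
    clone = dict(store[from_id], name=name)
    if not include_data:
        clone["data"] = []  # drop data files unless explicitly requested
    if sanitize:
        clone["data"] = [f for f in clone["data"] if not f.get("pii")]
    store[name] = clone
    return clone

store = {"abc": {"name": "Sales SDR",
                 "data": [{"file": "crm.csv", "pii": True},
                          {"file": "faq.md", "pii": False}]}}
demo = persona_clone(store, "abc", "Demo Copy", include_data=True, sanitize=True)
print([f["file"] for f in demo["data"]])  # ['faq.md']
```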

Principle 4: Hierarchy in Names

Data and workflows are persona-scoped. Naming reflects this:

persona                      # list/get personas
persona_create               # create
persona_clone                # clone

persona_data                 # list data for a persona
persona_data_upload          # upload to a persona
persona_data_sanitize        # sanitize data only

persona_workflow             # get workflow
persona_workflow_modify      # modify workflow
persona_workflow_sanitize    # sanitize workflow only

Scope is explicit: persona_sanitize does everything, persona_data_sanitize does just data.
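Because scope lives in the name, it's mechanically recoverable. A sketch using longest-prefix matching against the resource hierarchy (the RESOURCES list is illustrative):

```python
# Sketch: recover a tool's scope by longest-prefix match against the
# known resources, ordered most-specific first.

RESOURCES = ["persona_data", "persona_workflow", "persona_version", "persona"]

def scope_of(tool_name: str) -> str:
    for resource in RESOURCES:
        if tool_name == resource or tool_name.startswith(resource + "_"):
            return resource
    return "(global)"

print(scope_of("persona_data_sanitize"))  # persona_data
print(scope_of("persona_sanitize"))       # persona
print(scope_of("sync_run"))               # (global)
```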

Principle 5: Cross-Platform Compatibility

We almost used dot notation (persona.data.upload). It's cleaner and the MCP spec allows it. Then:

Spec         Allows Dots?
MCP          Yes
OpenAI API   No (^[a-zA-Z0-9_-]+$)

Many MCP clients bridge to OpenAI-compatible APIs. Dot notation breaks those. Snake case (_) is the safe choice.
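A quick portability check is just the OpenAI name pattern from the table above:

```python
import re

# OpenAI-compatible APIs restrict function names to ^[a-zA-Z0-9_-]+$,
# so dotted MCP tool names break when bridged.
OPENAI_NAME = re.compile(r"^[a-zA-Z0-9_-]+$")

def bridgeable(name: str) -> bool:
    return OPENAI_NAME.fullmatch(name) is not None

print(bridgeable("persona_data_upload"))  # True
print(bridgeable("persona.data.upload"))  # False
```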


Before and After

Before: 7 Tools, 30+ Params Each

persona(mode, id, name, type, from, input, include_data, sanitize, fix, include, ...)
data(persona_id, mode, file_id, file, ...)
action(id, all, query, category, suggest, ...)

After: 20 Focused Tools

# Core persona operations
persona                     persona_create              persona_clone
persona_update              persona_analyze             persona_sanitize

# Data (persona-scoped)
persona_data                persona_data_upload         persona_data_generate
persona_data_delete         persona_data_sanitize

# Workflow (persona-scoped)
persona_workflow            persona_workflow_modify     persona_workflow_sanitize

# Versions
persona_version             persona_version_create      persona_version_restore

# Cross-cutting
action                      action_suggest
sync_run                    sync_status
env

Each tool has 3-8 parameters. The LLM sees only what's relevant.


The Mental Model

flowchart TD
    U["USER<br/>'Clone the sales bot with data and sanitize for demo'"]
    L["LLM<br/>Parses intent → Searches tools → Matches parameters"]
    T["TOOL SELECTION<br/>persona_clone(from='sales-bot', name='Demo',<br/>include_data=true, sanitize=true)"]
    M["MCP SERVER<br/>Executes operation → Returns result"]

    U --> L --> T --> M

The user never sees tool names. The LLM is the interface. Design your tools for the LLM, and the LLM will serve the user.


MCP Tool Design Checklist

Use this when designing or reviewing MCP tools:

Check          Question
Naming         Does the tool name match how users describe the action?
Parameters     Does every parameter apply to this specific operation?
Description    Would an LLM know when to pick this tool vs. similar ones?
Flags          Do flag names sound like natural language ("include_data" vs "incl_d")?
Scope          Is it clear what resource this tool operates on?
Compatibility  Will the name work in OpenAI/Anthropic APIs (no dots)?
Combinations   Are common multi-step workflows capturable in one call?

Red flags:

  • Tool has >10 parameters
  • Parameters only apply to certain modes
  • User needs to know internal naming conventions
  • Description says "Use mode=X for..." instead of separate tools
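These red flags are mechanical enough to lint for at review time. A sketch that takes JSON-Schema-style tool definitions (thresholds match the checklist; the example tool is hypothetical):

```python
# Sketch of a review-time linter for the red flags above.

def lint_tool(tool: dict) -> list[str]:
    flags = []
    props = tool.get("inputSchema", {}).get("properties", {})
    if len(props) > 10:
        flags.append(f"{tool['name']}: {len(props)} parameters (>10)")
    if "mode" in props:
        flags.append(f"{tool['name']}: mode parameter suggests merged operations")
    if "mode=" in tool.get("description", ""):
        flags.append(f"{tool['name']}: description routes via mode=")
    return flags

super_tool = {
    "name": "persona",
    "description": "Use mode=clone to copy a persona...",
    "inputSchema": {"type": "object", "properties": {
        p: {} for p in ["mode", "id", "name", "type", "from", "input",
                        "include_data", "sanitize", "fix", "include", "query"]}},
}
print(lint_tool(super_tool))  # three flags for this hypothetical super tool
```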

Key Takeaways

  1. Design for LLM selection, not human browsing. Your tool names and descriptions are an API for LLMs matching natural language to operations.

  2. Fewer parameters beats fewer tools. A tool with 30 parameters is harder for an LLM than 5 tools with 6 parameters each.

  3. Flag names are natural language. When you name a flag include_data, you're making it matchable to "with data" or "include the files."

  4. Prompts are nice-to-have, tools are essential. Users speak natural language, not slash commands. Invest in tools.

  5. Stick to the common denominator. Alphanumeric plus underscore. Your tools might be bridged to APIs that don't allow dots.


What patterns have you found for MCP tool design? We'd like to hear what's working.