How misaligned objectives between compilers, LLMs, and humans created 14 failures—and what we learned about designing for intent.

The Question Nobody Asked

When we started building the persona generation system, we jumped straight to implementation. Parse the input. Build the spec. Compile to JSON. Deploy.

We never stopped to ask: what is each component actually trying to accomplish?

  • What does the compiler want?
  • What does the LLM want?
  • What does the API want?
  • What does the user want?

These goals aren't the same. And when they conflict, systems break in ways that are hard to debug — because the code is "correct" by each component's definition of correct.


Four Actors, Four Goals

The Compiler's Goal: Structural Validity

The compiler has one job: ensure the output conforms to its type definitions.

interface Widget {
  name: string;
  type: number;
  [key: string]: unknown;
}

The compiler asks: "Does this object have a name that's a string and a type that's a number?"

If yes, the compiler is satisfied. It doesn't care if:

  • The name is empty ("")
  • The type is a valid widget type ID
  • The nested config is what the API expects
  • The output will actually work at runtime

The compiler's goal is syntactic correctness, not semantic correctness.
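The gap between syntactic and semantic correctness can be made concrete. This sketch reuses the Widget interface above; the object type-checks cleanly while failing every semantic expectation. The isSemanticallyUsable helper is hypothetical, added here for illustration:

```typescript
// The same Widget interface as above.
interface Widget {
  name: string;
  type: number;
  [key: string]: unknown;
}

// Type-checks fine: name is a string, type is a number.
const broken: Widget = {
  name: "",   // empty name: the compiler does not care
  type: -1,   // not a real widget type ID: the compiler does not care
  config: {}, // not what the API expects: the compiler does not care
};

// Only a runtime check can see what the type system ignores.
// Hypothetical helper, not part of the real system.
function isSemanticallyUsable(w: Widget): boolean {
  return w.name.length > 0 && w.type > 0;
}
```

Here isSemanticallyUsable(broken) is false even though the compiler accepted the object without complaint.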

The API's Goal: Contract Enforcement

The API has a different goal: reject anything that doesn't match its internal expectations.

The API's contract (partially documented, mostly implicit):

  • Widget name must be non-empty and match a known widget type
  • Widget config must be nested under a key matching the name
  • Actions must have version, displaySettings, typeArguments, tools
  • Workflow name must match the persona's existing workflow (if updating)

The API doesn't care about TypeScript types. It cares about runtime invariants.

The API's goal is runtime safety, enforced through validation.
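What that contract looks like as code: a minimal sketch of the kind of runtime validation the API performs on widgets. The KNOWN_WIDGETS set and the nesting rule are taken from the contract bullets above; the exact values are assumptions for illustration, not the platform's real registry:

```typescript
// Illustrative allowlist, not the platform's real widget registry.
const KNOWN_WIDGETS = new Set(["conversationSettings", "voiceSettings"]);

interface WidgetPayload {
  name: string;
  type: number;
  [key: string]: unknown;
}

// Sketch of the API's runtime checks: returns a list of violations.
function validateWidget(w: WidgetPayload): string[] {
  const errors: string[] = [];
  if (w.name.length === 0) errors.push("widget name is empty");
  if (!KNOWN_WIDGETS.has(w.name)) errors.push(`unknown widget: ${w.name}`);
  // Config must be nested under a key matching the widget name.
  if (typeof w[w.name] !== "object" || w[w.name] === null) {
    errors.push(`config must be nested under key "${w.name}"`);
  }
  return errors;
}
```

Note that a widget with an empty nested config still passes these checks, which is exactly the gap Failure 1 below falls into.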

The LLM's Goal: Pattern Completion

When an LLM generates code or JSON, its goal is: produce output that looks like what it's seen before.

Given a prompt like "generate a voice AI persona config," the LLM:

  1. Recalls patterns from training data
  2. Completes the most likely structure
  3. Fills in plausible values

The LLM doesn't validate against a spec. It doesn't know the API contract. It produces what seems right based on statistical likelihood.

The LLM's goal is plausibility, not correctness.

The User's Goal: Outcome Achievement

The user doesn't care about any of this. Their goal: "I want a working Voice AI that handles sales calls."

They don't care if:

  • The widget format is widget_config vs [widgetName]
  • The action namespace is ["search", ...] vs ["actions", ...]
  • The workflow name matches an internal identifier

They care: does it work when I call it?

The user's goal is functional outcome, not structural correctness.


How Goal Misalignment Caused Failures

Failure 1: Empty Widget Configs

What happened:

// Compiler produced this
const widget = {
  name: "conversationSettings",
  type: 39,
  conversationSettings: {}, // Empty
};

Goal analysis:

| Actor | Was their goal met? | Why? |
| --- | --- | --- |
| Compiler | Yes | Valid TypeScript object |
| API | Yes | Non-empty name, valid type |
| LLM | Yes | Plausible structure |
| User | No | Voice AI has no greeting, no identity |

The user's persona worked technically but said nothing useful. The welcome message was undefined. The identity was blank.

Root cause: The compiler's goal (valid structure) was met. The user's goal (functional outcome) was not. Nobody's goal included "populate sensible defaults."
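One way to make "populate sensible defaults" somebody's explicit job is a small defaulting step between generation and deploy. A sketch; the default strings are placeholders and withDefaults is a hypothetical helper:

```typescript
interface ConversationSettings {
  welcomeMessage: string;
  identityAndPurpose: string;
}

// Placeholder defaults; a real system would derive these from the persona's purpose.
const DEFAULTS: ConversationSettings = {
  welcomeMessage: "Hello, thanks for calling. How can I help?",
  identityAndPurpose: "A helpful assistant for this persona's purpose.",
};

function withDefaults(partial: Partial<ConversationSettings>): ConversationSettings {
  // Spread order matters: caller-provided values override the defaults.
  return { ...DEFAULTS, ...partial };
}
```

With this step in place, an empty conversationSettings object can no longer reach the API as `{}`.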

Failure 2: Wrong Namespace

What happened:

// Compiler's namespace mapping
const ACTION_NAMESPACES = {
  search: ["search", "emainternal"],  // Wrong
};

// Produced
{ "action": { "name": { "namespaces": ["search", "emainternal"], "name": "search" } } }

Goal analysis:

| Actor | Was their goal met? | Why? |
| --- | --- | --- |
| Compiler | Yes | String array is valid |
| API | No | Namespace doesn't exist in action registry |
| LLM | N/A | Didn't generate this |
| User | No | Workflow fails to deploy |

Root cause: The compiler's type system cannot know which namespace values exist in the API's runtime action registry; the valid set lives on the server, not in the types. The API enforces it at runtime. The goals diverged at the boundary.
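A deterministic build step can close this gap by checking namespace arrays against an allowlist before anything reaches the API. A sketch; the allowed values below are illustrative, not the real registry:

```typescript
// Illustrative allowlist of namespace arrays the API actually accepts.
// Serialized to strings so whole arrays can be compared by value.
const ALLOWED_NAMESPACES: ReadonlySet<string> = new Set([
  JSON.stringify(["actions", "emainternal"]),
  JSON.stringify(["ema", "workflows"]),
]);

function isKnownNamespace(ns: string[]): boolean {
  return ALLOWED_NAMESPACES.has(JSON.stringify(ns));
}
```

Run at build time, this would have rejected `["search", "emainternal"]` before deploy instead of letting the API reject it after.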

Failure 3: Workflow Name Mismatch

What happened:

// Compiler generated
const workflowDef = {
  workflowName: {
    name: { namespaces: ["ema", "workflows"], name: "sales_assistant" },
  },
};

// API expected
// "workflowName must match the persona's existing workflow name"

Goal analysis:

| Actor | Was their goal met? | Why? |
| --- | --- | --- |
| Compiler | Yes | Generated valid workflow structure |
| API | No | Name doesn't match existing workflow |
| LLM | N/A | Didn't generate this |
| User | No | Deploy rejected with cryptic error |

Root cause: The compiler's goal was "generate a workflow." The API's goal was "update an existing workflow." These are different operations with different requirements. The compiler had no concept of "existing state."
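The fix in miniature is to make "existing state" an input: when a persona already has a workflow, reuse its name instead of generating a new one. The types and the workflowNameFor helper here are hypothetical:

```typescript
interface QualifiedName {
  namespaces: string[];
  name: string;
}

interface ExistingPersona {
  workflowName: QualifiedName;
}

// Create and update are different operations: when updating, the
// persona's existing workflow name wins over the generated one.
function workflowNameFor(
  existing: ExistingPersona | null,
  generated: QualifiedName,
): QualifiedName {
  return existing ? existing.workflowName : generated;
}
```

The compiler still generates a name for the create path; the update path simply never uses it.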

Failure 4: Confident Wrong (LLM Hallucination)

What happened:

When we asked the LLM to generate widget config:

{
  "widget_name": "voiceSettings",
  "widget_type_id": 38,
  "widget_config": {
    "voice_model": "eleven_labs_v2",
    "language": "en-US"
  }
}

Goal analysis:

| Actor | Was their goal met? | Why? |
| --- | --- | --- |
| Compiler | Yes | Valid object shape |
| API | No | Wrong field names, hallucinated voice model |
| LLM | Yes | Output looks plausible |
| User | No | Config rejected |

Root cause: The LLM's goal is pattern completion. It completed a pattern that looked like widget config. The field names were wrong. The voice model doesn't exist. But statistically, the output was plausible.


Intent vs. Implementation: The Core Problem

Every failure shared a common pattern: the implementation was correct by its own definition, but incorrect by the user's definition.

flowchart TB
    A["User Intent:<br/>'Create a Voice AI for sales'"]:::primary
    A --> B["Implementation:<br/>parse → build → compile → deploy"]:::secondary
    B --> C["Actual Outcome:<br/>'Empty persona, API rejects deploy'"]:::warning

The pipeline never asked: "What does the user actually need to happen?"

It asked: "What's the next valid transformation?"


Designing for Intent: The Fix

Step 1: Define Success at the User Level

Before writing any code, we defined what success looks like from the user's perspective:

Success criteria for "create Voice AI for sales":
1. Persona exists in the platform
2. Persona has type=voice
3. Welcome message greets the caller appropriately
4. Identity describes the sales purpose
5. Workflow can receive calls (has voice trigger)
6. Workflow can respond (has response generation)

Notice: none of these mention widget_config format or namespace arrays. These are outcome requirements, not structural requirements.
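These criteria can be written down as executable checks against the deployed persona rather than structural checks on the payload. A sketch; the DeployedPersona shape and node names are assumptions for illustration:

```typescript
// Assumed shape of a deployed persona, for illustration only.
interface DeployedPersona {
  type: string;
  conversationSettings: { welcomeMessage: string; identityAndPurpose: string };
  workflowNodes: string[];
}

// Outcome-level checks: each maps to one success criterion above.
function checkOutcome(p: DeployedPersona): string[] {
  const failures: string[] = [];
  if (p.type !== "voice") failures.push("persona is not voice type");
  if (!p.conversationSettings.welcomeMessage) failures.push("no welcome message");
  if (!p.conversationSettings.identityAndPurpose) failures.push("no identity");
  if (!p.workflowNodes.includes("voice_trigger")) failures.push("workflow is not callable");
  return failures;
}
```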

Step 2: Work Backward from Outcomes

For each outcome, we identified what must be true:

| Outcome | Structural Requirement | Who Enforces |
| --- | --- | --- |
| Persona exists | createAiEmployee() succeeds | API |
| Has welcome message | conversationSettings.welcomeMessage populated | Our code |
| Valid workflow | All required action fields present | API |
| Callable | voice_trigger node exists | Our code |

Then we asked: who currently ensures each requirement?

For "valid workflow," the answer was: nobody. The compiler didn't know the required fields. The LLM didn't know the exact format. The API only rejected after the fact.

Step 3: Assign Goals to Components

We explicitly assigned goals to each component:

flowchart TB
    subgraph IP["INTENT PARSER"]
        IP1["Goal: Extract requirements"]:::primary
        IP2["Input: 'Voice AI for sales'"]:::secondary
        IP3["Output: type, purpose"]:::secondary
    end

    subgraph TS["TEMPLATE SELECTOR"]
        TS1["Goal: Find valid starting point"]:::secondary
        TS2["Input: type"]:::secondary
        TS3["Output: template_id"]:::secondary
    end

    subgraph CG["CONFIG GENERATOR"]
        CG1["Goal: Produce settings"]:::primary
        CG2["Input: purpose"]:::secondary
        CG3["Output: welcomeMessage"]:::secondary
    end

    subgraph MG["MERGER"]
        MG1["Goal: Combine configs"]:::secondary
        MG2["Input: template + generated"]:::secondary
        MG3["Output: merged config"]:::secondary
    end

    subgraph VL["VALIDATOR"]
        VL1["Goal: Catch errors"]:::warning
        VL2["Input: final config"]:::secondary
        VL3["Output: valid or errors"]:::secondary
    end

    subgraph DP["DEPLOYER"]
        DP1["Goal: Persist to platform"]:::accent
        DP2["Input: validated config"]:::secondary
        DP3["Output: persona_id"]:::secondary
    end

    IP --> TS --> CG --> MG --> VL --> DP

Each component has one goal. No component tries to do everything.

Step 4: Define Contracts Between Components

Instead of implicit assumptions, we made contracts explicit:

// Contract: Intent Parser → Template Selector
interface ParsedIntent {
  personaType: "voice" | "chat" | "dashboard";
  purpose: string;
  requirements: string[];
}

// Contract: Template Selector → Config Generator
interface TemplateSelection {
  templateId: string;
  templateConfig: Record<string, unknown>; // Known-valid structure
}

// Contract: Config Generator → Merger
interface GeneratedConfig {
  conversationSettings?: {
    welcomeMessage: string;
    identityAndPurpose: string;
  };
  // Only fields we explicitly generate
}

// Contract: Merger → Validator
interface MergedConfig {
  widgets: Array<{
    name: string;
    type: number;
    [key: string]: unknown;
  }>;
  // Must have all required fields
}

Now each component knows exactly what it receives and what it must produce.


The LLM's Role: Intent Interpretation, Not Structure Generation

The key insight: LLMs are good at understanding intent, not at producing exact structures.

Before: LLM Generates Structure

Prompt: "Generate a widget config for voice settings"

LLM Output: {
  "widget_name": "voiceSettings",
  "widget_type_id": 38,
  "widget_config": { ... }  // Wrong format, hallucinated values
}

Problem: The LLM is guessing at structure. It doesn't know the API contract.

After: LLM Interprets Intent, Template Provides Structure

Prompt: "What voice settings does a sales AI need?"

LLM Output: {
  "welcomeMessage": "Hello, thank you for calling. I'm here to help with your sales inquiry.",
  "identityAndPurpose": "You are a professional sales assistant focused on understanding customer needs.",
  "speechCharacteristics": "Friendly, professional, patient"
}

Then: Merge these values into template's valid structure.

The LLM produces semantic content. The template provides structural correctness. Each component does what it's good at.
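The merge step in miniature: the template supplies the known-valid structure and the LLM's output supplies only semantic values. The shapes here are illustrative:

```typescript
// Known-valid structure from the template library.
const template = {
  name: "conversationSettings",
  type: 39,
  conversationSettings: { welcomeMessage: "", identityAndPurpose: "" },
};

// The LLM may supply any subset of the semantic fields, nothing else.
type LlmValues = Partial<typeof template.conversationSettings>;

function mergeIntoTemplate(values: LlmValues) {
  return {
    ...template,
    // Only semantic fields come from the LLM; structure stays fixed.
    conversationSettings: { ...template.conversationSettings, ...values },
  };
}
```

Because the LLM's output is typed as a subset of the template's semantic fields, a hallucinated field name like widget_type_id has nowhere to land.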


Brownfield: Intent-Driven Transformation

For existing personas, the LLM's role is different: understand what change the user wants, then apply it.

Before: Keyword Matching

// Old approach: parse keywords
if (input.includes("add") && input.includes("search")) {
  workflow.nodes.push(createSearchNode());
}

Problem: Brittle. "Include a lookup step" wouldn't match. Neither would "add knowledge base retrieval."

After: Intent Interpretation + Schema-Guided Transformation

// New approach: LLM understands intent, works with typed schema
const prompt = `
Current workflow has these nodes: ${JSON.stringify(currentSpec.nodes)}

User request: "${userInput}"

What changes are needed? Output a WorkflowSpec with the modifications.
The schema is:
${WORKFLOW_SCHEMA_FOR_LLM}
`;

const transformedSpec = await llm.generate(prompt);
const workflowDef = compileWorkflow(transformedSpec);

The LLM understands:

  • "add search" = add a search node
  • "include a lookup step" = add a search node
  • "add knowledge base retrieval" = add a search node

All three map to the same outcome because the LLM interprets intent, not keywords.
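The compiled workflow is only as good as the transformed spec, so it helps to guard the LLM boundary before compileWorkflow runs. A sketch with illustrative node types and a hypothetical validateSpec helper:

```typescript
// Illustrative set of node types the compiler knows how to compile.
const KNOWN_NODE_TYPES = new Set(["voice_trigger", "search", "response_generation"]);

interface WorkflowSpec {
  nodes: { type: string }[];
}

// Reject any spec containing node types the compiler cannot handle,
// so hallucinated nodes fail here with context, not at the API.
function validateSpec(spec: WorkflowSpec): string[] {
  return spec.nodes
    .filter((n) => !KNOWN_NODE_TYPES.has(n.type))
    .map((n) => `unknown node type: ${n.type}`);
}
```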


Goal Alignment Checklist

When designing a pipeline with multiple actors (compiler, LLM, API, user), ask:

1. What is each actor's goal?

| Actor | Goal | How It's Achieved |
| --- | --- | --- |
| User | Working outcome | Entire pipeline succeeds |
| LLM | Plausible output | Pattern completion |
| Compiler | Valid types | Type checking |
| API | Safe operations | Runtime validation |

2. Where do goals conflict?

  • Compiler says "valid" but API says "invalid" → structural mismatch
  • LLM says "plausible" but API says "doesn't exist" → hallucination
  • User says "working" but system says "deployed" → outcome vs. status

3. Who ensures each requirement?

For every success criterion:

  • Who checks this?
  • When is it checked?
  • What happens on failure?

If the answer is "nobody" or "too late," add a component.

4. What does each component NOT know?

  • Compiler doesn't know: runtime contract, existing state, semantic meaning
  • LLM doesn't know: exact API format, valid enum values, version requirements
  • API doesn't know: user intent, what would fix the error, why you sent this

Design components to fill each other's gaps.


The Resulting Architecture

flowchart TB
    UI["USER INTENT<br/>'Create a Voice AI for sales'"]:::primary

    IP["INTENT PARSER (LLM)<br/>Understand what the user wants"]:::agent

    TS["TEMPLATE SELECTION<br/>Get valid starting structure"]:::secondary

    VG["VALUE GENERATION (LLM)<br/>Create persona-specific content"]:::agent

    SM["STRUCTURE MERGE<br/>Combine template + values"]:::secondary

    VL["VALIDATION<br/>Catch errors before API"]:::warning

    DP["DEPLOYMENT<br/>Persist to platform"]:::secondary

    UO["USER OUTCOME<br/>Working Voice AI"]:::accent

    UI --> IP --> TS --> VG --> SM --> VL --> DP --> UO

  • LLM handles: intent interpretation, content generation (semantic tasks)
  • Templates handle: structural correctness (known-valid starting point)
  • Deterministic code handles: merging, validation, deployment (mechanical tasks)
  • User gets: working outcome, not just "deployed successfully"


Key Takeaways

1. Define success at the user level first

Don't start with "what does the compiler need?" Start with "what does the user need to see happen?"

2. Each component should have one goal

If a component is trying to "understand intent AND generate valid structure AND validate format," it will fail at something.

3. LLMs interpret intent; templates provide structure

Let each do what it's good at. LLMs understand "sales AI" → friendly greeting. Templates provide { name: "conversationSettings", type: 39 }.

4. Validate at boundaries, not just at the end

The API is the final validator. But if you only find errors there, you've lost all context about what went wrong and why.

5. Goals conflict — design for the conflicts

The compiler's goal (valid types) and the API's goal (valid runtime structure) are different. Acknowledge this. Build bridges between them.


Try This

Next time you're building a pipeline with multiple components:

  1. Write down each component's goal in one sentence
  2. Identify where goals conflict — these are your failure points
  3. Assign explicit contracts between components
  4. Ask "who ensures this?" for every success criterion
  5. Let each component do one thing well

The 14 failures we experienced weren't bugs in any single component. They were goal misalignments that nobody designed for.

Once we designed for intent — user intent as the top goal, component goals as subgoals that must align — everything worked.


This post is part of a series on AI-assisted development workflows. See Compiler-Generated vs. LLM-Orchestrated Code for the technical deep dive, and From Compiler Workflows to LLM Workflows for the conceptual foundation.