Inline LLM Calls: Skip the Redeploy
Hardcoded logic is brittle. Every `if (type === "report" || type === "document")` is a decision that will need updating when business requirements change. This post covers when to replace hardcoded conditionals with inline LLM calls, how to do it efficiently, and when the complexity isn't worth it.
Thesis: Some decisions are better made by models than code. When logic depends on fuzzy matching, domain expertise, or evolving requirements, an inline LLM call lets you upgrade behavior "out of band"—no redeployment needed. But this pattern has costs: latency, token spend, and unpredictability. Use it surgically.
The Hardcoded Logic Trap
Here's code that looks reasonable:
```typescript
function requiresDocument(itemType: string): boolean {
  const documentTypes = [
    "report",
    "document",
    "pdf",
    "spreadsheet",
    "presentation",
  ];
  return documentTypes.includes(itemType.toLowerCase());
}
```
Simple. Fast. Testable. And wrong the moment:
- Someone creates a "legal_brief" item type
- Marketing adds "infographic" to the system
- International team uses "rapport" (French) or "dokument" (German)
- The product expands to include "audit_record" which definitely needs documents
Each change requires:
- Developer time to update the list
- Code review
- Deployment
- Testing in production
For a simple string match, this overhead is fine. But what about:
```typescript
function determineEscalationPath(ticket: Ticket): EscalationPath {
  if (ticket.priority === "critical" && ticket.customer.tier === "enterprise") {
    return "immediate_engineering";
  }
  if (ticket.category === "billing" && ticket.amount > 10000) {
    return "finance_review";
  }
  if (ticket.sentiment === "angry" && ticket.messageCount > 5) {
    return "manager_intervention";
  }
  // ... 20 more rules that grow every quarter
}
```
This logic:
- Embeds business policy in code
- Requires engineering involvement for policy changes
- Grows into unmaintainable spaghetti
- Can't adapt to novel situations
The Inline LLM Pattern
Instead of hardcoding every decision, delegate fuzzy logic to an LLM:
```typescript
async function requiresDocument(item: Item): Promise<boolean> {
  const response = await llm.complete({
    prompt: `Determine if the following item type requires document handling.

Item: ${JSON.stringify(item)}

Document handling is required for:
- Reports, documents, PDFs, and similar file-based content
- Spreadsheets and presentations
- Any item that represents exportable/downloadable content
- Legal or compliance artifacts

Respond with only: true or false`,
    maxTokens: 5,
    temperature: 0,
  });

  return response.trim().toLowerCase() === "true";
}
```
What changed:
- Logic is now data (the prompt) rather than code
- New item types work without redeployment (the LLM reasons about them)
- Domain experts can modify the prompt, not just developers
- Edge cases are handled by reasoning, not exhaustive enumeration
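One practical wrinkle: the strict comparison in the code above (`response.trim().toLowerCase() === "true"`) silently returns `false` if the model adds punctuation or a trailing explanation. A small, tolerant parser is worth the few extra lines — this is a sketch, and the accepted variants (`yes`/`no`, prefix matching) are assumptions to tune for your model:

```typescript
// Tolerant parser for boolean LLM decisions. The accepted variants
// ("yes"/"no", prefix matching) are illustrative assumptions.
function parseBooleanDecision(raw: string): boolean {
  const normalized = raw.trim().toLowerCase().replace(/[."'!]/g, "");
  if (["true", "yes"].includes(normalized)) return true;
  if (["false", "no"].includes(normalized)) return false;
  // Fall back to a prefix match for responses like "true, because..."
  if (normalized.startsWith("true")) return true;
  if (normalized.startsWith("false")) return false;
  throw new Error(`Unparseable decision: ${raw}`);
}
```

Throwing on unparseable output, rather than defaulting, surfaces prompt drift early instead of hiding it in wrong decisions.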
The Out-of-Band Upgrade Pattern
The real power comes when prompts are externalized:
```typescript
// Prompts live in configuration, not code
const DECISION_PROMPTS = await fetchDecisionPrompts();

async function requiresDocument(item: Item): Promise<boolean> {
  return evaluateDecision("requires_document", { item });
}

async function evaluateDecision(
  decisionKey: string,
  context: object,
): Promise<boolean> {
  const promptTemplate = DECISION_PROMPTS[decisionKey];
  if (!promptTemplate) {
    throw new Error(`Unknown decision: ${decisionKey}`);
  }

  const prompt = renderTemplate(promptTemplate, context);
  const response = await llm.complete({
    prompt,
    maxTokens: 10,
    temperature: 0,
  });

  return parseBoolean(response);
}
```
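The `renderTemplate` helper isn't defined above. A minimal sketch, assuming `{{key}}` placeholders (the placeholder syntax is an assumption — use whatever your prompt store standardizes on):

```typescript
// Minimal template renderer for decision prompts. Assumes {{key}}
// placeholders; non-string values are JSON-serialized so structured
// context can be interpolated directly. A sketch, not a template engine.
function renderTemplate(template: string, context: Record<string, unknown>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const value = context[key];
    if (value === undefined) return match; // leave unknown placeholders intact
    return typeof value === "string" ? value : JSON.stringify(value);
  });
}
```

Leaving unknown placeholders intact (rather than blanking them) makes a bad template visible in logs instead of producing a subtly truncated prompt.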
Now prompt updates happen without code changes:
```yaml
# decision_prompts.yaml (managed by product/ops team)
requires_document:
  template: |
    Determine if this item requires document handling.

    Item:

    Requires document handling:
    - Reports, documents, PDFs
    - Spreadsheets, presentations
    - Exportable/downloadable content
    - Legal or compliance artifacts
    - Audit records or certifications # <-- Added by compliance team

    Respond: true or false
  examples:
    - input: { type: "report" }
      expected: true
    - input: { type: "chat_message" }
      expected: false
```
Benefits:
- Product team can update logic without engineering
- Changes deploy instantly (config reload, no code push)
- Examples serve as test cases and documentation
- Version control on prompts captures policy evolution
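Since the YAML bundles `examples` alongside each template, they can double as a regression check whenever a prompt changes. A sketch of that harness — the `decide` function is injected so tests can stub the LLM, and the example record shape mirrors the YAML above:

```typescript
// Run a prompt's bundled examples as a regression check. The decide
// function is injected so tests can stub the LLM call; the record
// shape mirrors the examples in the YAML config (an assumption).
interface DecisionExample {
  input: object;
  expected: boolean;
}

async function checkExamples(
  examples: DecisionExample[],
  decide: (input: object) => Promise<boolean>,
): Promise<{ passed: number; failed: number }> {
  let passed = 0;
  let failed = 0;
  for (const ex of examples) {
    const actual = await decide(ex.input);
    if (actual === ex.expected) passed++;
    else failed++;
  }
  return { passed, failed };
}
```

Running this in CI (or on config reload) means a prompt edit that breaks a known case gets caught before it reaches production.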
When to Use Inline LLM Calls
✅ Good Candidates
Fuzzy Matching:
```typescript
// Hardcoded: breaks on typos, synonyms, new terms
if (category === "urgent" || category === "critical" || category === "asap")

// LLM: handles variations naturally
await isUrgent(category); // "urgente", "time-sensitive", "rush" all work
```
Classification with Evolving Categories:
```typescript
// Hardcoded: requires code change for new categories
const SUPPORT_CATEGORIES = ["billing", "technical", "account", "feedback"];

// LLM: new categories work immediately when added to prompt
await classifySupport(message); // prompt lists categories, easily updated
```
Policy-Based Decisions:
```typescript
// Hardcoded: business rules buried in code
if (amount > 10000 && !hasApproval && region !== "exempt")

// LLM: policy is explicit, auditable, modifiable
await requiresApproval({ amount, approvals, region }); // prompt explains policy
```
Natural Language Understanding:
```typescript
// Hardcoded: regex hell
if (message.match(/refund|money back|return|cancel.*order/i))

// LLM: understands intent, not just keywords
await detectRefundIntent(message); // handles "I want my $ returned" etc.
```
❌ Poor Candidates
Deterministic Logic:
```typescript
// Don't use LLM for math
if (quantity > inventory) // Just do the comparison

// Don't use LLM for lookups
return userPermissions[action]; // Hash lookup is faster and correct
```
High-Frequency, Low-Latency Paths:
```typescript
// Don't add LLM latency to every request
app.use(async (req, res, next) => {
  if (await llmShouldRateLimit(req)) {
    // Bad: adds 100ms+ to every request
    return res.status(429).send();
  }
  next();
});
```
Security Decisions:
```typescript
// Don't trust LLM for auth/authz
if (await llmShouldAllowAccess(user, resource)) // NO. Use proper access control.
```
Mathematically Verifiable Logic:
```typescript
// Don't use LLM when you can compute
await llmCalculateDiscount(order); // No. Apply the discount formula.
```
The Cost/Benefit Analysis
| Factor | Hardcoded Logic | Inline LLM |
|---|---|---|
| Latency | Microseconds | 100-1000ms |
| Cost | Zero marginal | Per-call token cost |
| Predictability | 100% deterministic | Probabilistic |
| Flexibility | Requires deployment | Config/prompt change |
| Edge Cases | Explicit enumeration | Reasoned handling |
| Auditability | Code review | Prompt + response logging |
| Testability | Unit tests | Eval sets + spot checks |
Use inline LLM when:
- The flexibility value exceeds the latency/cost
- Edge cases are expensive to enumerate
- Domain experts need to modify logic
- The decision tolerates occasional inconsistency
Keep hardcoded when:
- Latency matters (sub-millisecond paths)
- Logic is truly deterministic
- Security/correctness is non-negotiable
- Volume makes per-call cost prohibitive
Efficiency Patterns
Batching Decisions
If you're making many similar decisions, batch them:
```typescript
// Bad: N LLM calls
for (const item of items) {
  item.requiresDoc = await requiresDocument(item);
}

// Good: 1 LLM call for N items
const decisions = await batchRequiresDocument(items);
items.forEach((item, i) => (item.requiresDoc = decisions[i]));

async function batchRequiresDocument(items: Item[]): Promise<boolean[]> {
  const prompt = `For each item, determine if it requires document handling.

Items:
${items.map((item, i) => `${i + 1}. ${JSON.stringify(item)}`).join("\n")}

Respond with a JSON array of booleans, one per item.
Example: [true, false, true]`;

  const response = await llm.complete({ prompt, maxTokens: items.length * 10 });
  const decisions = JSON.parse(response);
  // Guard against a malformed or truncated batch response
  if (!Array.isArray(decisions) || decisions.length !== items.length) {
    throw new Error("Batch response length mismatch");
  }
  return decisions;
}
```
Caching Decisions
Many decisions are repeatable:
```typescript
const decisionCache = new Map<string, boolean>();

async function requiresDocument(item: Item): Promise<boolean> {
  const cacheKey = `requires_doc:${item.type}`;
  if (decisionCache.has(cacheKey)) {
    return decisionCache.get(cacheKey)!;
  }
  const decision = await llmRequiresDocument(item);
  decisionCache.set(cacheKey, decision);
  return decision;
}
```
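One caveat: a plain `Map` never expires, so a decision cached before an out-of-band prompt update stays wrong until restart. A TTL wrapper keeps the cache honest — this is a sketch, and the lazy-eviction policy and TTL value are illustrative assumptions:

```typescript
// Cache with expiry so out-of-band prompt updates eventually take
// effect. Lazy eviction and TTL choice are illustrative assumptions.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazily evict on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Picking the TTL is a policy decision in itself: it bounds how long a stale answer can survive after a prompt change.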
For richer caching, use semantic similarity:
```typescript
// If we've seen a similar item, reuse the decision
const similarDecision = await findSimilarDecision(item, { threshold: 0.95 });
if (similarDecision) {
  return similarDecision.result;
}
```
Hybrid Approach
Use hardcoded logic for known cases, LLM for unknowns:
```typescript
async function requiresDocument(item: Item): Promise<boolean> {
  // Fast path: known types
  const knownTypes: Record<string, boolean> = {
    report: true,
    document: true,
    chat_message: false,
    notification: false,
  };
  if (item.type in knownTypes) {
    return knownTypes[item.type];
  }

  // Slow path: ask LLM for unknown types
  const decision = await llmRequiresDocument(item);

  // Optional: log for future hardcoding
  logger.info("LLM decision", { type: item.type, decision });

  return decision;
}
```
This gives you:
- Fast responses for common cases
- Graceful handling of new cases
- Data to inform future hardcoding
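That last point can be closed into a loop: tally the logged LLM decisions and surface item types whose answers have been stable enough to promote into the hardcoded fast path. A sketch — the count threshold and "unanimous answers only" rule are illustrative assumptions:

```typescript
// Tally logged LLM decisions and surface types stable enough to
// promote into the hardcoded table. Thresholds are assumptions.
interface DecisionLog {
  type: string;
  decision: boolean;
}

function promotionCandidates(
  logs: DecisionLog[],
  minCount = 10,
): Array<{ type: string; decision: boolean }> {
  const tally = new Map<string, { trues: number; falses: number }>();
  for (const log of logs) {
    const t = tally.get(log.type) ?? { trues: 0, falses: 0 };
    if (log.decision) t.trues++;
    else t.falses++;
    tally.set(log.type, t);
  }

  const candidates: Array<{ type: string; decision: boolean }> = [];
  for (const [type, { trues, falses }] of tally) {
    // Promote only types the LLM has always answered the same way.
    if (trues + falses >= minCount && (trues === 0 || falses === 0)) {
      candidates.push({ type, decision: trues > 0 });
    }
  }
  return candidates;
}
```

Types with mixed answers are arguably the more interesting output: they flag ambiguous inputs where the prompt itself needs sharpening.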
Pre-Computing Decisions
For batch processing, compute decisions ahead of time:
```typescript
// Nightly job: classify all item types
async function precomputeDecisions() {
  const allTypes = await db.query("SELECT DISTINCT type FROM items");
  for (const { type } of allTypes) {
    const decision = await llmRequiresDocument({ type });
    await db.upsert("type_decisions", { type, requiresDocument: decision });
  }
}

// Runtime: lookup only
async function requiresDocument(item: Item): Promise<boolean> {
  const row = await db.query(
    "SELECT requiresDocument FROM type_decisions WHERE type = ?",
    [item.type],
  );
  return row?.requiresDocument ?? (await llmRequiresDocument(item)); // Fallback for new types
}
```
MCP Integration
Inline LLM decisions can be exposed through MCP:
```typescript
server.tool(
  "evaluate_policy",
  {
    description: "Evaluate a business policy decision using AI reasoning",
    inputSchema: {
      properties: {
        policy: {
          type: "string",
          description: "Policy name (e.g., 'requires_document')",
        },
        context: { type: "object", description: "Context for the decision" },
      },
      required: ["policy", "context"],
    },
  },
  async ({ policy, context }) => {
    const decision = await evaluateDecision(policy, context);
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({
            policy,
            context,
            decision,
            reasoning: decision.reasoning, // If you capture chain-of-thought
          }),
        },
      ],
    };
  },
);
```
Now the outer AI assistant can delegate complex decisions to specialized policy evaluation:
User: "Should this ticket go to finance?"
AI: [Calls evaluate_policy({ policy: "escalation_path", context: ticketData })]
"Based on the escalation policy, this ticket should go to finance_review because
the amount ($15,000) exceeds the threshold and involves a billing dispute."
Prompt Engineering for Decisions
Structure Matters
```typescript
// Bad: vague prompt
const prompt = `Is this a document? ${item}`;

// Good: structured prompt with criteria
const prompt = `Determine if this item requires document handling.

ITEM:
${JSON.stringify(item, null, 2)}

CRITERIA FOR DOCUMENT HANDLING:
- File-based content (reports, PDFs, spreadsheets)
- Exportable/downloadable artifacts
- Legal, compliance, or audit materials
- Content that requires versioning or approval workflows

RESPOND WITH ONLY: true or false`;
```
Include Examples
```typescript
const prompt = `Classify the urgency of this support ticket.

EXAMPLES:
- "Site is down, can't process orders" → critical
- "Button color looks wrong" → low
- "Can't login to admin panel" → high
- "Feature request for dark mode" → low

TICKET:
${ticket.description}

URGENCY (critical/high/medium/low):`;
```
Request Structured Output
```typescript
const prompt = `Analyze this item and determine:
1. Whether it requires document handling
2. The confidence level (high/medium/low)
3. Brief reasoning

ITEM: ${JSON.stringify(item)}

RESPOND IN JSON:
{
  "requiresDocument": boolean,
  "confidence": "high" | "medium" | "low",
  "reasoning": "string"
}`;
```
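Structured output needs a validating parser on the receiving end, since models sometimes wrap JSON in markdown fences or drop a field. A sketch — the fence-stripping cleanup and the exact payload shape are assumptions matching the prompt above:

```typescript
// Parse and validate the structured decision response. Strips the
// markdown code fences a model may add (a common failure mode; the
// cleanup rules here are assumptions).
interface StructuredDecision {
  requiresDocument: boolean;
  confidence: "high" | "medium" | "low";
  reasoning: string;
}

function parseStructuredDecision(raw: string): StructuredDecision {
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();
  const parsed = JSON.parse(cleaned);
  if (
    typeof parsed.requiresDocument !== "boolean" ||
    !["high", "medium", "low"].includes(parsed.confidence) ||
    typeof parsed.reasoning !== "string"
  ) {
    throw new Error(`Malformed decision payload: ${cleaned}`);
  }
  return parsed as StructuredDecision;
}
```

Failing loudly on a malformed payload turns a silent misclassification into a retryable, loggable error.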
Observability and Debugging
When logic lives in prompts, debugging requires logging:
```typescript
async function evaluateDecision(
  policy: string,
  context: object,
): Promise<Decision> {
  const prompt = renderPrompt(policy, context);
  const startTime = Date.now();

  const response = await llm.complete({ prompt });
  const decision = parseDecision(response);

  // Log everything for debugging
  logger.info("policy_decision", {
    policy,
    context,
    prompt: prompt.substring(0, 500), // Truncate for logging
    response,
    decision,
    latencyMs: Date.now() - startTime,
    model: llm.model,
    tokens: response.usage,
  });

  // Track for evaluation
  metrics.increment("policy_decisions", { policy, decision: decision.result });

  return decision;
}
```
Build evaluation datasets from production:
```typescript
// Periodically sample decisions for human review
if (Math.random() < 0.01) { // 1% sample
  await saveForReview({
    policy,
    context,
    decision,
    timestamp: new Date(),
  });
}
```
The Decision Framework
| Question | If Yes | If No |
|---|---|---|
| Does this need to change without deployment? | Consider LLM | Hardcode is fine |
| Is the logic fuzzy/natural-language-based? | LLM handles well | Hardcode |
| Can domain experts articulate the rules? | LLM with their prompt | Engineering owns it |
| Is sub-100ms latency required? | Hardcode or cache heavily | LLM is fine |
| Are edge cases expensive to enumerate? | LLM reasons about them | Enumerate in code |
| Is 99% consistency acceptable? | LLM | Hardcode (you need 100%) |
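The table above can be collapsed into a quick triage helper for design reviews. The weighting here — latency and consistency as hard disqualifiers, then a count of flexibility signals — is an illustrative assumption, not a formal rule:

```typescript
// The decision framework as a triage helper. Disqualifiers win
// outright; otherwise count flexibility signals. The weighting is
// an illustrative assumption.
interface DecisionFactors {
  changesWithoutDeploy: boolean;
  fuzzyLogic: boolean;
  expertsOwnRules: boolean;
  subHundredMsRequired: boolean;
  edgeCasesExpensive: boolean;
  toleratesInconsistency: boolean;
}

function recommendApproach(f: DecisionFactors): "llm" | "hardcode" {
  // Hard disqualifiers: latency budget, or zero tolerance for misses.
  if (f.subHundredMsRequired || !f.toleratesInconsistency) return "hardcode";
  const signals = [
    f.changesWithoutDeploy,
    f.fuzzyLogic,
    f.expertsOwnRules,
    f.edgeCasesExpensive,
  ];
  return signals.filter(Boolean).length >= 2 ? "llm" : "hardcode";
}
```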
Summary
Inline LLM calls trade predictability for flexibility. They let you:
- Upgrade logic without redeployment
- Handle novel inputs gracefully
- Empower domain experts to modify behavior
- Reduce code complexity for fuzzy decisions
But they cost:
- Latency (100-1000ms per call)
- Tokens (real money at scale)
- Predictability (probabilistic, not deterministic)
- Debugging complexity (prompt engineering isn't code debugging)
The sweet spot: Use inline LLM for decisions that are:
- Infrequent enough that latency is acceptable
- Fuzzy enough that enumeration is painful
- Important enough that flexibility matters
- Low-stakes enough that occasional inconsistency is tolerable
For everything else, hardcode it and move on.
This post is part of a series on practical AI architecture patterns. See MCP Dynamic Data Patterns for making MCP data API-driven, and Cursor Rules vs MCP for separating policy from data.