Inline LLM Calls: Skip the Redeploy
Hardcoded logic is brittle. Every `if (type === "report" || type === "document")` is a decision that will need updating when business requirements change. This post covers when to replace hardcoded conditionals with inline LLM calls, how to do it efficiently, and when the complexity isn't worth it.
Thesis: Some decisions are better made by models than code. When logic depends on fuzzy matching, domain expertise, or evolving requirements, an inline LLM call lets you upgrade behavior "out of band"—no redeployment needed. But this pattern has costs: latency, token spend, and unpredictability. Use it surgically.
The Hardcoded Logic Trap
Here's code that looks reasonable:
```typescript
function requiresDocument(itemType: string): boolean {
  const documentTypes = [
    "report",
    "document",
    "pdf",
    "spreadsheet",
    "presentation",
  ];
  return documentTypes.includes(itemType.toLowerCase());
}
```
Simple. Fast. Testable. And wrong the moment:
- Someone creates a "legal_brief" item type
- Marketing adds "infographic" to the system
- International team uses "rapport" (French) or "dokument" (German)
- The product expands to include "audit_record" which definitely needs documents
Each change requires:
- Developer time to update the list
- Code review
- Deployment
- Testing in production
For a simple string match, this overhead is fine. But what about:
```typescript
function determineEscalationPath(ticket: Ticket): EscalationPath {
  if (ticket.priority === "critical" && ticket.customer.tier === "enterprise") {
    return "immediate_engineering";
  }
  if (ticket.category === "billing" && ticket.amount > 10000) {
    return "finance_review";
  }
  if (ticket.sentiment === "angry" && ticket.messageCount > 5) {
    return "manager_intervention";
  }
  // ... 20 more rules that grow every quarter
}
```
This logic:
- Embeds business policy in code
- Requires engineering involvement for policy changes
- Grows into unmaintainable spaghetti
- Can't adapt to novel situations
The Inline LLM Pattern
Instead of hardcoding every decision, delegate fuzzy logic to an LLM:
```typescript
async function requiresDocument(item: Item): Promise<boolean> {
  const response = await llm.complete({
    prompt: `Determine if the following item type requires document handling.

Item: ${JSON.stringify(item)}

Document handling is required for:
- Reports, documents, PDFs, and similar file-based content
- Spreadsheets and presentations
- Any item that represents exportable/downloadable content
- Legal or compliance artifacts

Respond with only: true or false`,
    maxTokens: 5,
    temperature: 0,
  });

  return response.trim().toLowerCase() === "true";
}
```
What changed:
- Logic is now data (the prompt) rather than code
- New item types work without redeployment (the LLM reasons about them)
- Domain experts can modify the prompt, not just developers
- Edge cases are handled by reasoning, not exhaustive enumeration
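One practical wrinkle: the strict comparison in the code above (`response.trim().toLowerCase() === "true"`) silently returns `false` if the model adds punctuation or a trailing explanation. A small, tolerant parser is worth the few extra lines — this is a sketch, and the accepted variants (`yes`/`no`, prefix matching) are assumptions to tune for your model:

```typescript
// Tolerant parser for boolean LLM decisions. The accepted variants
// ("yes"/"no", prefix matching) are illustrative assumptions.
function parseBooleanDecision(raw: string): boolean {
  const normalized = raw.trim().toLowerCase().replace(/[."'!]/g, "");
  if (["true", "yes"].includes(normalized)) return true;
  if (["false", "no"].includes(normalized)) return false;
  // Fall back to a prefix match for responses like "true, because..."
  if (normalized.startsWith("true")) return true;
  if (normalized.startsWith("false")) return false;
  throw new Error(`Unparseable decision: ${raw}`);
}
```

Throwing on unparseable output, rather than defaulting, surfaces prompt drift early instead of hiding it in wrong decisions.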
The Out-of-Band Upgrade Pattern
The real power comes when prompts are externalized:
```typescript
// Prompts live in configuration, not code
const DECISION_PROMPTS = await fetchDecisionPrompts();

async function requiresDocument(item: Item): Promise<boolean> {
  return evaluateDecision("requires_document", { item });
}

async function evaluateDecision(
  decisionKey: string,
  context: object,
): Promise<boolean> {
  const promptTemplate = DECISION_PROMPTS[decisionKey];
  if (!promptTemplate) {
    throw new Error(`Unknown decision: ${decisionKey}`);
  }

  const prompt = renderTemplate(promptTemplate, context);
  const response = await llm.complete({
    prompt,
    maxTokens: 10,
    temperature: 0,
  });

  return parseBoolean(response);
}
```
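The `renderTemplate` helper isn't defined above. A minimal sketch, assuming `{{key}}` placeholders (the placeholder syntax is an assumption — use whatever your prompt store standardizes on):

```typescript
// Minimal template renderer for decision prompts. Assumes {{key}}
// placeholders; non-string values are JSON-serialized so structured
// context can be interpolated directly. A sketch, not a template engine.
function renderTemplate(template: string, context: Record<string, unknown>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => {
    const value = context[key];
    if (value === undefined) return match; // leave unknown placeholders intact
    return typeof value === "string" ? value : JSON.stringify(value);
  });
}
```

Leaving unknown placeholders intact (rather than blanking them) makes a bad template visible in logs instead of producing a subtly truncated prompt.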
Now prompt updates happen without code changes:
```yaml
# decision_prompts.yaml (managed by product/ops team)
requires_document:
  template: |
    Determine if this item requires document handling.

    Item:

    Requires document handling:
    - Reports, documents, PDFs
    - Spreadsheets, presentations
    - Exportable/downloadable content
    - Legal or compliance artifacts
    - Audit records or certifications # <-- Added by compliance team

    Respond: true or false
  examples:
    - input: { type: "report" }
      expected: true
    - input: { type: "chat_message" }
      expected: false
```
Benefits:
- Product team can update logic without engineering
- Changes deploy instantly (config reload, no code push)
- Examples serve as test cases and documentation
- Version control on prompts captures policy evolution
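Since the YAML bundles `examples` alongside each template, they can double as a regression check whenever a prompt changes. A sketch of that harness — the `decide` function is injected so tests can stub the LLM, and the example record shape mirrors the YAML above:

```typescript
// Run a prompt's bundled examples as a regression check. The decide
// function is injected so tests can stub the LLM call; the record
// shape mirrors the examples in the YAML config (an assumption).
interface DecisionExample {
  input: object;
  expected: boolean;
}

async function checkExamples(
  examples: DecisionExample[],
  decide: (input: object) => Promise<boolean>,
): Promise<{ passed: number; failed: number }> {
  let passed = 0;
  let failed = 0;
  for (const ex of examples) {
    const actual = await decide(ex.input);
    if (actual === ex.expected) passed++;
    else failed++;
  }
  return { passed, failed };
}
```

Running this in CI (or on config reload) means a prompt edit that breaks a known case gets caught before it reaches production.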
When to Use Inline LLM Calls
✅ Good Candidates
Fuzzy Matching:
```typescript
// Hardcoded: breaks on typos, synonyms, new terms
if (category === "urgent" || category === "critical" || category === "asap")

// LLM: handles variations naturally
await isUrgent(category); // "urgente", "time-sensitive", "rush" all work
```
Classification with Evolving Categories:
```typescript
// Hardcoded: requires code change for new categories
const SUPPORT_CATEGORIES = ["billing", "technical", "account", "feedback"];

// LLM: new categories work immediately when added to prompt
await classifySupport(message); // prompt lists categories, easily updated
```
Policy-Based Decisions:
```typescript
// Hardcoded: business rules buried in code
if (amount > 10000 && !hasApproval && region !== "exempt")

// LLM: policy is explicit, auditable, modifiable
await requiresApproval({ amount, approvals, region }); // prompt explains policy
```
Natural Language Understanding:
```typescript
// Hardcoded: regex hell
if (message.match(/refund|money back|return|cancel.*order/i))

// LLM: understands intent, not just keywords
await detectRefundIntent(message); // handles "I want my $ returned" etc.
```
❌ Poor Candidates
Deterministic Logic:
```typescript
// Don't use LLM for math
if (quantity > inventory) // Just do the comparison

// Don't use LLM for lookups
return userPermissions[action]; // Hash lookup is faster and correct
```
High-Frequency, Low-Latency Paths:
```typescript
// Don't add LLM latency to every request
app.use(async (req, res, next) => {
  if (await llmShouldRateLimit(req)) {
    // Bad: adds 100ms+ to every request
    return res.status(429).send();
  }
  next();
});
```
Security Decisions:
```typescript
// Don't trust LLM for auth/authz
if (await llmShouldAllowAccess(user, resource)) // NO. Use proper access control.
```
Mathematically Verifiable Logic:
```typescript
// Don't use LLM when you can compute
await llmCalculateDiscount(order); // No. Apply the discount formula.
```
The Cost/Benefit Analysis
| Factor | Hardcoded Logic | Inline LLM |
|---|---|---|
| Latency | Microseconds | 100-1000ms |
| Cost | Zero marginal | Per-call token cost |
| Predictability | 100% deterministic | Probabilistic |
| Flexibility | Requires deployment | Config/prompt change |
| Edge Cases | Explicit enumeration | Reasoned handling |
| Auditability | Code review | Prompt + response logging |
| Testability | Unit tests | Eval sets + spot checks |
Use inline LLM when:
- The flexibility value exceeds the latency/cost
- Edge cases are expensive to enumerate
- Domain experts need to modify logic
- The decision tolerates occasional inconsistency
Keep hardcoded when:
- Latency matters (sub-millisecond paths)
- Logic is truly deterministic
- Security/correctness is non-negotiable
- Volume makes per-call cost prohibitive
Efficiency Patterns
Batching Decisions
If you're making many similar decisions, batch them:
```typescript
// Bad: N LLM calls
for (const item of items) {
  item.requiresDoc = await requiresDocument(item);
}

// Good: 1 LLM call for N items
const decisions = await batchRequiresDocument(items);
items.forEach((item, i) => (item.requiresDoc = decisions[i]));

async function batchRequiresDocument(items: Item[]): Promise<boolean[]> {
  const prompt = `For each item, determine if it requires document handling.

Items:
${items.map((item, i) => `${i + 1}. ${JSON.stringify(item)}`).join("\n")}

Respond with a JSON array of booleans, one per item.
Example: [true, false, true]`;

  const response = await llm.complete({ prompt, maxTokens: items.length * 10 });
  const decisions = JSON.parse(response);
  // Guard against a malformed or truncated batch response
  if (!Array.isArray(decisions) || decisions.length !== items.length) {
    throw new Error("Batch response length mismatch");
  }
  return decisions;
}
```
Caching Decisions
Many decisions are repeatable:
```typescript
const decisionCache = new Map<string, boolean>();

async function requiresDocument(item: Item): Promise<boolean> {
  const cacheKey = `requires_doc:${item.type}`;
  if (decisionCache.has(cacheKey)) {
    return decisionCache.get(cacheKey)!;
  }
  const decision = await llmRequiresDocument(item);
  decisionCache.set(cacheKey, decision);
  return decision;
}
```
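One caveat: a plain `Map` never expires, so a decision cached before an out-of-band prompt update stays wrong until restart. A TTL wrapper keeps the cache honest — this is a sketch, and the lazy-eviction policy and TTL value are illustrative assumptions:

```typescript
// Cache with expiry so out-of-band prompt updates eventually take
// effect. Lazy eviction and TTL choice are illustrative assumptions.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazily evict on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Picking the TTL is a policy decision in itself: it bounds how long a stale answer can survive after a prompt change.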
For richer caching, use semantic similarity:
```typescript
// If we've seen a similar item, reuse the decision
const similarDecision = await findSimilarDecision(item, { threshold: 0.95 });
if (similarDecision) {
  return similarDecision.result;
}
```
Hybrid Approach
Use hardcoded logic for known cases, LLM for unknowns:
```typescript
async function requiresDocument(item: Item): Promise<boolean> {
  // Fast path: known types
  const knownTypes: Record<string, boolean> = {
    report: true,
    document: true,
    chat_message: false,
    notification: false,
  };
  if (item.type in knownTypes) {
    return knownTypes[item.type];
  }

  // Slow path: ask LLM for unknown types
  const decision = await llmRequiresDocument(item);

  // Optional: log for future hardcoding
  logger.info("LLM decision", { type: item.type, decision });

  return decision;
}
```
This gives you:
- Fast responses for common cases
- Graceful handling of new cases
- Data to inform future hardcoding
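That last point can be closed into a loop: tally the logged LLM decisions and surface item types whose answers have been stable enough to promote into the hardcoded fast path. A sketch — the count threshold and "unanimous answers only" rule are illustrative assumptions:

```typescript
// Tally logged LLM decisions and surface types stable enough to
// promote into the hardcoded table. Thresholds are assumptions.
interface DecisionLog {
  type: string;
  decision: boolean;
}

function promotionCandidates(
  logs: DecisionLog[],
  minCount = 10,
): Array<{ type: string; decision: boolean }> {
  const tally = new Map<string, { trues: number; falses: number }>();
  for (const log of logs) {
    const t = tally.get(log.type) ?? { trues: 0, falses: 0 };
    if (log.decision) t.trues++;
    else t.falses++;
    tally.set(log.type, t);
  }

  const candidates: Array<{ type: string; decision: boolean }> = [];
  for (const [type, { trues, falses }] of tally) {
    // Promote only types the LLM has always answered the same way.
    if (trues + falses >= minCount && (trues === 0 || falses === 0)) {
      candidates.push({ type, decision: trues > 0 });
    }
  }
  return candidates;
}
```

Types with mixed answers are arguably the more interesting output: they flag ambiguous inputs where the prompt itself needs sharpening.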
Pre-Computing Decisions
For batch processing, compute decisions ahead of time:
```typescript
// Nightly job: classify all item types
async function precomputeDecisions() {
  const allTypes = await db.query("SELECT DISTINCT type FROM items");
  for (const { type } of allTypes) {
    const decision = await llmRequiresDocument({ type });
    await db.upsert("type_decisions", { type, requiresDocument: decision });
  }
}

// Runtime: lookup only
async function requiresDocument(item: Item): Promise<boolean> {
  const row = await db.query(
    "SELECT requiresDocument FROM type_decisions WHERE type = ?",
    [item.type],
  );
  return row?.requiresDocument ?? (await llmRequiresDocument(item)); // Fallback for new types
}
```
MCP Integration
Inline LLM decisions can be exposed through MCP:
```typescript
server.tool(
  "evaluate_policy",
  {
    description: "Evaluate a business policy decision using AI reasoning",
    inputSchema: {
      properties: {
        policy: {
          type: "string",
          description: "Policy name (e.g., 'requires_document')",
        },
        context: { type: "object", description: "Context for the decision" },
      },
      required: ["policy", "context"],
    },
  },
  async ({ policy, context }) => {
    const decision = await evaluateDecision(policy, context);
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({
            policy,
            context,
            decision,
            reasoning: decision.reasoning, // If you capture chain-of-thought
          }),
        },
      ],
    };
  },
);
```
Now the outer AI assistant can delegate complex decisions to specialized policy evaluation:
User: "Should this ticket go to finance?"
AI: [Calls evaluate_policy({ policy: "escalation_path", context: ticketData })]
"Based on the escalation policy, this ticket should go to finance_review because
the amount ($15,000) exceeds the threshold and involves a billing dispute."
Prompt Engineering for Decisions
Structure Matters
```typescript
// Bad: vague prompt
const prompt = `Is this a document? ${item}`;

// Good: structured prompt with criteria
const prompt = `Determine if this item requires document handling.

ITEM:
${JSON.stringify(item, null, 2)}

CRITERIA FOR DOCUMENT HANDLING:
- File-based content (reports, PDFs, spreadsheets)
- Exportable/downloadable artifacts
- Legal, compliance, or audit materials
- Content that requires versioning or approval workflows

RESPOND WITH ONLY: true or false`;
```
Include Examples
```typescript
const prompt = `Classify the urgency of this support ticket.

EXAMPLES:
- "Site is down, can't process orders" → critical
- "Button color looks wrong" → low
- "Can't login to admin panel" → high
- "Feature request for dark mode" → low

TICKET:
${ticket.description}

URGENCY (critical/high/medium/low):`;
```
Request Structured Output
```typescript
const prompt = `Analyze this item and determine:
1. Whether it requires document handling
2. The confidence level (high/medium/low)
3. Brief reasoning

ITEM: ${JSON.stringify(item)}

RESPOND IN JSON:
{
  "requiresDocument": boolean,
  "confidence": "high" | "medium" | "low",
  "reasoning": "string"
}`;
```
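Structured output needs a validating parser on the receiving end, since models sometimes wrap JSON in markdown fences or drop a field. A sketch — the fence-stripping cleanup and the exact payload shape are assumptions matching the prompt above:

```typescript
// Parse and validate the structured decision response. Strips the
// markdown code fences a model may add (a common failure mode; the
// cleanup rules here are assumptions).
interface StructuredDecision {
  requiresDocument: boolean;
  confidence: "high" | "medium" | "low";
  reasoning: string;
}

function parseStructuredDecision(raw: string): StructuredDecision {
  const cleaned = raw.replace(/```(?:json)?/g, "").trim();
  const parsed = JSON.parse(cleaned);
  if (
    typeof parsed.requiresDocument !== "boolean" ||
    !["high", "medium", "low"].includes(parsed.confidence) ||
    typeof parsed.reasoning !== "string"
  ) {
    throw new Error(`Malformed decision payload: ${cleaned}`);
  }
  return parsed as StructuredDecision;
}
```

Failing loudly on a malformed payload turns a silent misclassification into a retryable, loggable error.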
Observability and Debugging
When logic lives in prompts, debugging requires logging:
```typescript
async function evaluateDecision(
  policy: string,
  context: object,
): Promise<Decision> {
  const prompt = renderPrompt(policy, context);
  const startTime = Date.now();

  const response = await llm.complete({ prompt });
  const decision = parseDecision(response);

  // Log everything for debugging
  logger.info("policy_decision", {
    policy,
    context,
    prompt: prompt.substring(0, 500), // Truncate for logging
    response,
    decision,
    latencyMs: Date.now() - startTime,
    model: llm.model,
    tokens: response.usage,
  });

  // Track for evaluation
  metrics.increment("policy_decisions", { policy, decision: decision.result });

  return decision;
}
```
Build evaluation datasets from production:
```typescript
// Periodically sample decisions for human review
if (Math.random() < 0.01) { // 1% sample
  await saveForReview({
    policy,
    context,
    decision,
    timestamp: new Date(),
  });
}
```
The Decision Framework
| Question | If Yes | If No |
|---|---|---|
| Does this need to change without deployment? | Consider LLM | Hardcode is fine |
| Is the logic fuzzy/natural-language-based? | LLM handles well | Hardcode |
| Can domain experts articulate the rules? | LLM with their prompt | Engineering owns it |
| Is sub-100ms latency required? | Hardcode or cache heavily | LLM is fine |
| Are edge cases expensive to enumerate? | LLM reasons about them | Enumerate in code |
| Is 99% consistency acceptable? | LLM | Hardcode (you need 100%) |
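The table above can be collapsed into a quick triage helper for design reviews. The weighting here — latency and consistency as hard disqualifiers, then a count of flexibility signals — is an illustrative assumption, not a formal rule:

```typescript
// The decision framework as a triage helper. Disqualifiers win
// outright; otherwise count flexibility signals. The weighting is
// an illustrative assumption.
interface DecisionFactors {
  changesWithoutDeploy: boolean;
  fuzzyLogic: boolean;
  expertsOwnRules: boolean;
  subHundredMsRequired: boolean;
  edgeCasesExpensive: boolean;
  toleratesInconsistency: boolean;
}

function recommendApproach(f: DecisionFactors): "llm" | "hardcode" {
  // Hard disqualifiers: latency budget, or zero tolerance for misses.
  if (f.subHundredMsRequired || !f.toleratesInconsistency) return "hardcode";
  const signals = [
    f.changesWithoutDeploy,
    f.fuzzyLogic,
    f.expertsOwnRules,
    f.edgeCasesExpensive,
  ];
  return signals.filter(Boolean).length >= 2 ? "llm" : "hardcode";
}
```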
Summary
Inline LLM calls trade predictability for flexibility. They let you:
- Upgrade logic without redeployment
- Handle novel inputs gracefully
- Empower domain experts to modify behavior
- Reduce code complexity for fuzzy decisions
But they cost:
- Latency (100-1000ms per call)
- Tokens (real money at scale)
- Predictability (probabilistic, not deterministic)
- Debugging complexity (prompt engineering isn't code debugging)
The sweet spot: Use inline LLM for decisions that are:
- Infrequent enough that latency is acceptable
- Fuzzy enough that enumeration is painful
- Important enough that flexibility matters
- Low-stakes enough that occasional inconsistency is tolerable
For everything else, hardcode it and move on.
This post is part of a series on practical AI architecture patterns. See MCP Dynamic Data Patterns for making MCP data API-driven, and Cursor Rules vs MCP for separating policy from data.