A practitioner's notebook for building with AI that acts—not AI that answers.

This is not another AI hype blog. There are plenty of those. They're loud, vague, and selling something. This is what happens when you stop reading about AI and start building with it every day—writing the rules, debugging the workflows, deploying the agents, and learning what actually works when the demo ends and production begins.

The site is called Agentic Thinking because the most important shift isn't in the technology. It's in the mental model. And that mental model didn't appear overnight. It evolved through a series of failures, hard-won lessons, and moments where what we thought we knew about software development stopped being true.


How We Got Here

If you're new to agentic AI, Agentic AI: Beyond the Chatbot is the primer—it covers what makes AI "agentic," how it compares to chatbots and copilots, and what it means for enterprise. This section picks up where that primer leaves off.

The AI development story has three chapters, and most people are still reading the second one.

Chapter 1: Chatbots (2022-2023). We got access to powerful language models and immediately did the most obvious thing—bolted them onto chat interfaces. Ask a question, get an answer. Copy, paste, repeat. The models were impressive, but the workflow was medieval: human types prompt, model responds, human evaluates, human acts. Every interaction was isolated. No memory, no context, no continuity.

Chapter 2: Copilots (2023-2024). We embedded AI into development tools. GitHub Copilot, ChatGPT in the sidebar, AI-powered terminals. Better. Now the AI could see your code, suggest completions, answer questions about your codebase. But the mental model was still the same: human drives, AI assists. The copilot was a better autocomplete—valuable, but fundamentally passive. It waited for you to ask, suggested when prompted, and couldn't take initiative or own outcomes.

Chapter 3: Agents (2025-now). The shift that changes everything. AI stopped waiting to be asked and started acting. Not just suggesting code, but writing it, testing it, debugging it, and committing it. Not just answering questions about your codebase, but reading the codebase to decide what to do next. The human went from operator to orchestrator—defining intent, setting constraints, reviewing output.

This third chapter is where "agentic thinking" begins. And it requires a fundamentally different approach to building software.


The Mental Model Shift

For decades, we thought about software tools the same way: you learn the interface, you operate the tool, you get a result. The human drives. The tool executes. Every editor, every terminal, every dashboard reinforced the same assumption—a human is in the loop, making every decision, writing every line.

That assumption broke.

The shift isn't "AI can write code now." That's a capability. The shift is in how you think about the work itself. We stopped treating AI as a tool you wield and started treating it as a participant you orchestrate. That single reframe changes everything: how you write code, how you design systems, how you structure teams, how you think about interfaces.

| Old Model (Tool) | New Model (Participant) |
| --- | --- |
| You write commands | You express intent |
| Tool executes literally | Agent interprets contextually |
| Output is predictable | Output requires judgment |
| Context is your responsibility | Context is a shared concern |
| One tool, one job | Multiple agents, orchestrated |
| Static configuration | Dynamic discovery |
| You manage state | State is a system problem |
| Errors are yours to find | Agents review each other |

When AI is a tool, you optimize for efficient commands. When AI is a participant, you optimize for clear intent, good context, and effective delegation. You think about what it knows, what it doesn't, where it needs guardrails, and where it can run unsupervised.

Agentic thinking is the discipline of designing systems, workflows, and organizations around AI that acts—not AI that answers.

This isn't a philosophy paper. It's a building discipline. Every post on this site comes from applying this mental model to real systems and seeing what survives contact with production.


Intent Over Implementation

The hardest habit to break is telling the AI how to do something.

We spent careers learning to think in implementation: data structures, algorithms, API calls, error handling. That specificity was the whole job. But when you're orchestrating agents, over-specifying implementation is counterproductive. You end up micromanaging a system that's better at generating code than you are at dictating it.

The shift is to think in intent. Not "create a function that iterates over the array and filters items where the date field is after today," but "hide future-dated posts from the homepage." Not "write a SQL query that joins these three tables," but "find all users who haven't logged in this quarter." The agent has the codebase in context. It knows the data model. It can figure out the implementation—often in ways you wouldn't have written but that are perfectly correct.
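To make the contrast concrete, here's a sketch of what an agent might produce from the intent "hide future-dated posts from the homepage." The post shape (a dict with a date field) is hypothetical; the point is that you specify the intent, and the particular loop is the agent's choice.

```python
from datetime import date

def visible_posts(posts, today=None):
    """Return posts that should appear on the homepage.

    The intent is "hide future-dated posts" -- the comparison below
    is one implementation an agent might choose, not the only one.
    """
    today = today or date.today()
    return [p for p in posts if p["date"] <= today]

posts = [
    {"title": "Shipped", "date": date(2025, 1, 10)},
    {"title": "Scheduled", "date": date(2099, 6, 1)},
]
print([p["title"] for p in visible_posts(posts)])  # ['Shipped']
```

The same intent could just as correctly be implemented in the template layer or the query layer; that's exactly the decision you're delegating.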

Goals Matter More Than Code documents where we learned this the hard way. We built a persona generation system where the compiler, the LLM, the API, and the user all had different definitions of "correct." Fourteen failures later, we realized the problem wasn't implementation—it was that nobody had aligned on intent. Each component was locally correct and globally broken.

Compiler to LLM Workflows shows the practical result of this shift. The old loop was edit → compile → fix errors → compile again. The new loop is intent → generation → validation → refinement. The cycle compresses from hours to minutes, and—this surprised us more than the speed—the quality goes up because the AI generates from the full context of your project rather than the narrow context of the file you're editing.

Intent-first doesn't mean vague. It means precise about what and flexible about how. "Add topic filtering to the homepage grid. Posts have tags in frontmatter. The filter bar needs All, Cursor, Ema, MCP, Series, and Search buttons. Clicking one filters the grid client-side." That's intent-driven. It specifies the outcome, the constraints, and the acceptance criteria—without dictating the implementation.

The agents figure out the how. Your job is to make the what unmistakable.


The Toolkit: Three Layers That Compose

The practical side of agentic thinking centers on three layers. I wrote a full breakdown in The AI Dev Toolkit in 2026, but here's the essential picture.

Layer 1: The AI-Native Development Environment. This is where reasoning happens. Cursor, Claude Code, Warp, Windsurf—the specific tool matters less than what the category provides: an environment where AI agents operate alongside you with access to your codebase, your conventions, and your intent. Some are graphical editors. Some are terminal-first. Claude Code runs entirely in your shell. Warp blends terminal and AI natively. The form factor is converging on the same capability: an AI reasoning engine that understands your project.

What makes them "agentic" rather than "assistive" is the artifact system. Rules encode your conventions. Skills package portable knowledge. Agents bring specialized expertise. Commands trigger repeatable workflows. The specifics vary by tool, but the pattern is universal: you shape the agent's behavior through structured context, not just prompts. The agent is bounded by its context window—powerful within its scope, blind outside it.

Layer 2: The Capability Protocol (MCP). This is how the development environment reaches the outside world. MCP makes system capabilities discoverable at runtime—not through stale documentation, but through typed, self-describing interfaces that an AI can reason about in real time. MCP works the same whether your agent lives in a GUI editor or a terminal session. When designed well, it turns "I don't know what's available" into "I can see exactly what's available, what it expects, and what it returns." The post MCP: From Hardcoded to Live Data covers why this dynamic discovery matters so much.
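To illustrate the discovery pattern (a toy sketch, not the actual MCP SDK or wire format), here's a self-describing registry: each capability advertises its name, inputs, and outputs, so a client can enumerate what's available at runtime instead of reading stale docs.

```python
import json

# Toy registry illustrating the pattern MCP formalizes: capabilities
# describe themselves, so an agent can ask "what's available?" live.
# Tool names and schemas here are invented for illustration.
TOOLS = {
    "list_users": {
        "description": "Return users matching a filter",
        "input_schema": {"last_login_before": "ISO date"},
        "output_schema": {"users": "list of user ids"},
    },
    "run_report": {
        "description": "Generate a named report",
        "input_schema": {"report": "string"},
        "output_schema": {"url": "string"},
    },
}

def discover():
    """What an agent sees when it queries the server for capabilities."""
    return [{"name": name, **spec} for name, spec in sorted(TOOLS.items())]

print(json.dumps(discover(), indent=2))
```

Because the interface is typed and self-describing, adding a tool to the registry makes it visible to every agent immediately, with no documentation step in between.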

Layer 3: The Enterprise Execution Platform (AI Employees). This is where agents become workers. Ema handles what the IDE and protocol can't: workflow orchestration across models, persistent state, human-in-the-loop checkpoints, compliance enforcement, and audit trails. AI employees aren't chatbots with job titles. They're governed like the employees they are—with defined roles, escalation paths, and accountability.

None of these layers works well alone. The real leverage is in composition: intent flows from human to development environment, discovery flows through MCP, execution happens on the platform. The AI Dev Toolkit in 2026 walks through a concrete example of all three layers handling a single task.

When to Apply What

One of the first questions practitioners ask: "I have a thing I want the AI to know or do. Where does it go?" The answer isn't obvious, and getting it wrong creates friction that makes the whole system feel broken.

| What you're encoding | Where it belongs | Why |
| --- | --- | --- |
| Naming conventions, code style | Rule | Static policy, rarely changes |
| "Always do X before Y" | Rule | Workflow constraint |
| Reusable analysis or generation | Skill | Portable knowledge, triggered by intent |
| Deep specialized analysis | Agent | Needs isolated expertise and persona |
| Multi-step repeatable process | Command | Procedural workflow |
| Available tools, live system state | MCP | Changes with the platform |
| Production workflows at scale | Platform | Needs state, compliance, governance |

The taxonomy varies slightly across tools—Cursor has rules and skills, Claude Code uses CLAUDE.md and project context, Warp has AI workflows—but the categories are universal. The full decision guide is Rules, Agents, Commands, MCP... WTF?. The common mistake is putting everything in one place. You end up with context bloat that burns tokens on every conversation. Separating concerns—policy from capability from data—is the single biggest leverage point in configuring an agentic development environment, regardless of which tool you're using.

How Cursor Finds Skills explains the discovery mechanics behind skill visibility—why your carefully written context sometimes gets ignored. The same principles apply to any tool: if the agent can't find your context, it can't use it.


How I Actually Work: A Day in the Build

Theory is cheap. Here's what agentic development actually looks like in practice.

Setting the Stage

Every project starts with context. I write rules that encode the project's conventions—naming patterns, architecture decisions, what belongs where. Not documentation; operating instructions. The difference matters because these rules don't sit in a wiki nobody reads. They actively shape every AI interaction in the project.

But rules alone aren't enough. I tried that. You end up with rules fighting each other, or with a bloated alwaysApply stack that burns context on every conversation. The solution was separating concerns: rules for policy, skills for capability, MCP for data.

Building with Agents

When I'm implementing a feature, I don't type code and let the AI autocomplete. I delegate.

I describe the intent—the outcome, not the implementation. Then the agent works: reads the existing code, understands the template structure, implements the feature, and presents the result. When it gets stuck or drifts, I route to a specialist. Security concern? Spawn a security auditor. Architecture question? Spawn an architect. The main agent accumulated too much context and is hallucinating patterns from an hour ago? Spawn a subagent with fresh eyes.

This is multi-agent orchestration in practice. Not a swarm of autonomous bots. A human orchestrator delegating to specialized participants, reviewing output, and steering toward outcomes. The persona lens model explains how these agents maintain consistent behavior—they're not different models, they're different contexts applied to the same model.

Coordinating Across Agents

Here's the part nobody warns you about: agents don't talk to each other.

When you're running an Architect agent to design a system, an Implementer to build it, and a Reviewer to check it—they have zero awareness of each other's work. Agent A designs an architecture. Agent B starts implementing without reading Agent A's decisions. Agent C reviews and contradicts both. You become the human switchboard, repeating context endlessly.

The solution turned out to be embarrassingly simple: a shared markdown file that serves as the coordination protocol. Not a database. Not an API. Just a .ctx/tasks/ file in the repo that any agent can read and write. It contains an ownership table (who's doing what), a status dashboard (what's done, what's in progress), decision logs (what was decided and why), and a blocking issues section.

Any agent can claim a phase by updating the table. Other agents see who owns what before starting work. Decisions made by the Architect are visible to the Implementer without you repeating them. The file is persistent (survives session boundaries), transparent (humans can inspect and intervene), and versioned (git tracks every change).
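A minimal sketch of that file-as-protocol idea, with an illustrative file name and table layout (the real format is whatever your agents and your git history can both read):

```python
from pathlib import Path

# Agents coordinate by reading and writing one markdown file in the
# repo. The path and column layout below are examples, not a standard.
CTX = Path(".ctx/tasks/feature-x.md")

def claim_phase(agent: str, phase: str) -> None:
    """Append an ownership row, creating the table if needed."""
    CTX.parent.mkdir(parents=True, exist_ok=True)
    if not CTX.exists():
        CTX.write_text("| Phase | Owner | Status |\n|---|---|---|\n")
    with CTX.open("a") as f:
        f.write(f"| {phase} | {agent} | in-progress |\n")

def current_owners() -> dict:
    """What any agent reads before starting work."""
    rows = [line for line in CTX.read_text().splitlines()
            if line.startswith("|")][2:]  # skip header and separator
    owners = {}
    for row in rows:
        cells = [c.strip() for c in row.split("|")]
        owners[cells[1]] = cells[2]
    return owners

claim_phase("Architect", "design")
claim_phase("Implementer", "build")
print(current_owners())  # {'design': 'Architect', 'build': 'Implementer'}
```

Nothing here is clever, and that's the point: the coordination layer is plain text, so humans can inspect it, git can version it, and any agent that can read a file can participate.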

This pattern scales. On complex projects I run three or four agents in parallel, each working on different phases, all coordinating through the same .ctx document. The three-tier context system provides the structure: ephemeral context in .ctx/ for session-specific coordination, internal context in .meta/ for decisions that survive across sessions, and public docs for what the codebase needs to communicate to all contributors.

Managing Context and the Amnesia Problem

Every agent starts every session with total amnesia. The context window is both the superpower and the constraint. Feed it your project's conventions via well-structured rules and it performs. Starve it of context and it guesses—confidently, plausibly, incorrectly.

Context overflow is the silent killer of agentic workflows. Two hours into a complex implementation, the context window is packed with file reads, failed attempts, and accumulated decisions. The agent starts contradicting its own earlier choices. Quality degrades, and you don't notice until the output is wrong.

I manage this with a few strategies:

Checkpoint early, checkpoint often. After every logical unit of work, commit. This creates a save point. If context degrades, you can spawn a fresh subagent that reads the committed code—the clean result—instead of the messy session history.

Write decisions down, not just code. When an agent makes an architecture decision, I have it record the decision and rationale in the .ctx coordination doc. Next session, next agent, the reasoning is there. The code shows what was decided. The .ctx doc shows why.

Split long tasks into agent-sized chunks. If a task requires more than 30-40 minutes of continuous agent work, it's too big for one context window. Decompose it. Have one agent design, commit the design doc, then spawn a fresh agent to implement from the committed spec.

Use rules to pre-load critical context. Instead of re-explaining project conventions every session, encode them in rules that load automatically. The agent arrives knowing your naming patterns, your architecture, and your constraints—without consuming context on explanation.
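As a concrete illustration, a Cursor-style rule file might look like the following. The frontmatter keys, paths, and conventions here are examples; the exact format varies by tool and version, so check your tool's docs before copying it.

```
---
description: Project conventions for the blog codebase
globs: ["src/**/*"]
alwaysApply: true
---
- Components live in src/components/, one component per file.
- Post metadata lives in frontmatter; never hardcode dates in templates.
- Run the linter before proposing a commit.
```

Because this loads automatically, every session starts with the conventions already in context, at a fraction of the tokens a re-explanation would cost.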

The Review and Tune Loop

Here's where most people stop. They build, they ship, they move on. The agentic approach adds a loop that changes everything.

After every significant piece of work, I have the AI review its own output. Not a rubber stamp—a structured review. Did the implementation match the intent? Did it follow the project's conventions? Did it introduce patterns that conflict with existing code? I use autonomous workflows that chain build → review → fix → commit without me manually triggering each step.

Then the review gets reviewed. I spawn a code reviewer agent on the diff. I spawn a security auditor. These agents don't have the implementation context—they're seeing the code for the first time, which is exactly the point. Fresh eyes catch what tired eyes miss.

The tune part is critical. When a review surfaces a recurring issue—the agent keeps using the wrong naming pattern, or it ignores a project convention—I don't just fix the instance. I fix the system. That means updating a rule, adding a skill, or adjusting a persona. The next time any agent encounters the same situation, the correction is already in context. This is how the system improves: not through some abstract machine learning, but through deliberate observation and systematic correction.

The Learning Loop

This is the part that compounds. After every session—every debugging marathon, every architecture decision, every production surprise—the system captures what it learned.

The meta-learning system observes patterns: what rules triggered, what commands ran, what manual work kept repeating. It proposes improvements. "You ran lint manually 5 times this session—should this be automatic?" "This rule conflicts with that one—want to resolve it?" "You keep explaining the same context to new agents—should this be a skill?"

The key insight is that files are the protocol. Decisions made in Tuesday's session are available to Wednesday's agents without me repeating context. Patterns observed over a week become rules. Rules that prove their value over a month become skills. Skills that apply across projects become shared libraries. The loop is observe → pattern → propose → test → deploy → observe again.
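The observe → pattern → propose step can be sketched in a few lines: count repeated manual commands from a session log and flag anything frequent enough to automate. The log format and the threshold are invented for illustration.

```python
from collections import Counter

# Hypothetical session log of manually run commands.
SESSION_LOG = [
    "npm run lint", "git diff", "npm run lint", "npm test",
    "npm run lint", "git diff", "npm run lint", "npm run lint",
]

def propose_automation(log, threshold=5):
    """Flag commands repeated often enough to suggest automating them."""
    counts = Counter(log)
    return [
        f"You ran `{cmd}` {n} times this session -- should this be automatic?"
        for cmd, n in counts.most_common()
        if n >= threshold
    ]

for proposal in propose_automation(SESSION_LOG):
    print(proposal)
```

The real system watches richer signals (rule triggers, repeated explanations, manual fix-ups), but the shape is the same: observation produces a proposal, and a human decides whether it becomes a rule.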

This is how a development environment goes from "I'm teaching my AI assistant the same things every day" to "my AI assistant knows my project better than I remember." Not magic. Discipline.


Bias, Trust, and Judgment

Here's what nobody in the agentic AI space talks about enough: these systems have biases, and those biases compound when agents act autonomously.

LLMs have training biases—they favor popular patterns over correct ones, they reflect the distribution of their training data, they have opinions about code style that they'll impose unless constrained. When a copilot suggests biased code, a human reviews it before committing. When an autonomous agent chains build → test → commit, that human review checkpoint can get thin.

Three biases show up repeatedly in agentic development:

Recency bias. The agent over-indexes on the most recent context. If you just debugged a caching issue, the agent starts seeing caching problems everywhere. Subagents with fresh context are the direct counter—they don't carry the session's accumulated assumptions.

Popularity bias. The model defaults to the most common pattern from training data, not the best pattern for your project. Your codebase uses a specific state management approach, but the model keeps suggesting the more popular alternative. This is why rules exist—they override training defaults with project-specific constraints.

Confidence bias. LLMs present uncertain information with the same tone as established fact. An agent that generates code "confidently" is not necessarily generating code correctly. The review loop exists specifically for this: the implementing agent is confident, the reviewing agent is skeptical. Structural skepticism catches what individual confidence misses.

The meta-lesson: agentic systems need the same governance structures as human teams. Code review. Separation of concerns. Escalation paths for high-risk changes. The difference is that you can encode these structures into the system itself—the review step isn't optional discipline, it's automated workflow. An AI employee on Ema doesn't skip the compliance check because it's Friday afternoon.

Trust in agentic systems isn't binary. It's graduated. You give agents autonomy proportional to the consequences of failure. Low-risk code formatting? Fully autonomous. Database migration? Human-in-the-loop. Production deployment? Multiple review stages with explicit approval gates.
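That graduated-trust idea fits in a few lines: map each action's risk to an oversight policy rather than flipping a single trust switch. The risk categories and policy names below are illustrative, not a standard.

```python
# Graduated autonomy: oversight scales with the consequences of failure.
POLICY = {
    "low": "autonomous",            # e.g. code formatting
    "medium": "human_in_the_loop",  # e.g. database migration
    "high": "multi_stage_approval", # e.g. production deployment
}

def oversight_for(action_risk: str) -> str:
    # Fail closed: an unrecognized risk level gets the strictest policy.
    return POLICY.get(action_risk, "multi_stage_approval")

print(oversight_for("low"))      # autonomous
print(oversight_for("unknown"))  # multi_stage_approval
```

The fail-closed default is the important design choice: when the system can't classify an action, it escalates to a human instead of guessing.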


The Shifts That Matter

Agentic thinking reveals several shifts happening simultaneously. They look like separate trends. They're not—they're the same underlying transition viewed from different angles.

Static → Discoverable. APIs used to be documented in wikis and consumed by developers who read the docs. Now capabilities are self-describing and discoverable at runtime. MCP is the clearest expression of this: instead of reading docs about what a system can do, an agent queries the system directly. The Interface Inflection series explores where this leads—semantic data layers, headless AI, and negotiated integrations.

Assistance → Autonomy. Copilots suggest. Agents act. The difference isn't capability—it's the locus of control. When an AI autonomously chains build → test → commit, the human's job shifts from execution to oversight. Governance becomes existential, not bureaucratic. The New Hire Who Never Sleeps covers why this matters for enterprises.

How → What. The compiler-to-LLM shift is ultimately about moving from implementation-first to intent-first. You stop specifying algorithms and start specifying goals. The agent handles implementation. Your expertise shifts from "knowing how to write it" to "knowing what to ask for and whether the result is right."

Configuration → Learning. Static configs drift. Systems that learn from usage improve. Instead of manually maintaining rules that go stale, the system observes patterns and proposes updates. Instead of hardcoding workflows, the system discovers capabilities and adapts.

Retrieval → Action. Nobody ever wanted search results. They wanted outcomes. The evolution from keyword search to semantic understanding to intent interpretation to autonomous action is the same shift: from "find me information" to "do the thing I actually need done."

Individual → Orchestrated. Single tools used to do single things. Now we compose: MCP bridges systems, agents specialize, platforms execute, and the human orchestrates the ensemble. The democratization of building means more people can participate in that orchestration.


The Vision

Where does this converge? I don't think we end up with AGI that replaces developers. I think we end up with something more interesting and more practical: development environments where the human provides intent, judgment, and accountability—and everything else is handled by a coordinated system of specialized agents operating through discoverable protocols against governed platforms.

The Interface Forecast explores the end state: applications as agent-assembled compositions of capabilities. The organizations that treat their capability layer as a product will have structural advantages. The developers who learn to think agentically—designing for delegation, not just execution—will build faster and better than those still writing every line by hand.

The path from here to there isn't a leap. It's iterative. You start with a rule. Then you extract a skill. Then you connect an MCP server. Then you deploy an AI employee. Each step teaches you something about intent, context, coordination, and trust. Each lesson feeds back into how you design the next step.

This isn't ten years away. It's happening now, one rule at a time, one MCP server at a time, one AI employee at a time. The tools are here. The mental models are catching up.


Who This Is For

You've been burned by hype before. You sat through the blockchain presentations, the metaverse strategy decks, the "AI will replace all developers by 2024" think pieces. You're skeptical, and you should be.

This site is for practitioners who build things. Engineers who want to understand what actually works in AI-assisted development, not what might work in a keynote demo. Leaders who need to deploy AI into real organizations with real compliance requirements. Architects who see the infrastructure shifting and want to understand the pattern before committing to a direction.

If you want breathless optimism, you're in the wrong place. If you want evidence from the build, pull up a chair.

Where to Start

| Interest | Start Here | Why |
| --- | --- | --- |
| New to agentic AI | Agentic AI: Beyond the Chatbot | The primer—what makes AI "agentic" |
| The toolkit overview | The AI Dev Toolkit in 2026 | How the three layers compose |
| Building with Cursor | Rules, Agents, Commands, MCP... WTF? | The mental model for AI dev tooling |
| Intent-first development | Goals Matter More Than Code | Why intent alignment beats implementation |
| AI employees | The New Hire Who Never Sleeps | Governance-first thinking |
| Interfaces and MCP | Interfaces Are Changing | The paradigm shift |
| Search and intent | Search is Dead. Long Live Action. | From retrieval to outcomes |
| The learning loop | AI That Teaches Itself | Systems that improve from usage |
| Agent coordination | .ctx: Memory for Amnesiac Agents | How agents share context |
| Technical deep dive | The MCP Mental Model | Protocol-level understanding |

Written from the Build

Every post on this site comes from real work. Real failures, real debugging sessions, real production surprises. I build with Cursor and Claude Code daily. I deploy AI employees on Ema. I design MCP servers and agent architectures and then watch them break in ways the design doc didn't predict.

That's the difference. This isn't analysis from the sidelines. It's notes from inside the machine, shared because the practitioners building this next wave deserve better signal than what the hype cycle provides.

The tools are moving fast. The mental models need to move faster. That's what this site is for.