← Back to Videos
AICloud Architecture

Claude Code Decoded #3 — The Agentic Message Loop: How Claude Thinks

Episode 3: The full agentic message loop inside Claude Code — query() generator, system prompt assembly, streaming parse, parallel tool execution.

📅 26 June 20269:04✍️ Rahul Kumar

The Agentic Message Loop: The Architectural Heart of Claude Code

This is episode three in the Claude Code Decoded series. In the previous episodes we covered how Claude Code initialises and how the tool registry is constructed. In this episode we go deeper into the mechanism that actually makes an agent an agent: the agentic message loop. Understanding this loop is the single most important architectural insight for anyone building on top of agentic AI systems — whether you are using Claude Code, building your own agent framework, or evaluating enterprise agent platforms.

What Happens Every Time You Press Enter

When you submit a prompt in Claude Code, a deterministic sequence begins. At its core is an async generator function — query() — that drives the entire interaction from prompt submission to final response. The loop does not terminate until the model stops emitting tool calls and returns a final text response. The sequence on every iteration is:

  • Build the prompt: Assemble the full context — CLAUDE.md contents, project settings, tool schemas, MCP server declarations, conversation history, and the current user message
  • Call the model with streaming: Send the assembled prompt to the Claude API and open a streaming response
  • Parse the stream: Decompose the streaming response into text blocks and tool call blocks as they arrive
  • Execute tool calls: Run tool calls in parallel where dependencies allow, subject to permission checks
  • Feed results back: Append tool results to the conversation and recurse
  • Recurse until done: Repeat until the model emits a response with no tool calls

How the System Prompt Gets Assembled

The system prompt is not static. It is constructed fresh on each outer invocation from four major components. First, CLAUDE.md content — Claude Code reads CLAUDE.md files from the project root and subdirectories, concatenating their contents as persistent context. Second, settings-derived instructions — behaviours configured in settings.json contribute structured text that constrains and shapes model behaviour. Third, tool schemas — every registered tool's full JSON schema is included so the model knows exactly what tools are available. Fourth, MCP server declarations — connected MCP servers and their available resources are declared, giving the model awareness of external capabilities.

Streaming Response Parsing

The model streams its response as a sequence of typed events. Claude Code's parser maintains state across the stream, accumulating text into text blocks and assembling tool call invocations as structured JSON arrives incrementally. A tool call's parameters may arrive across many streaming chunks and must be buffered and validated before execution — this is non-trivial parsing work that happens on every model turn.

Parallel Tool Execution and Permission Checks

When the model emits multiple tool calls in a single response turn, Claude Code analyses the dependency graph and runs independent tool calls in parallel. A file read and a shell command with no data dependency between them execute concurrently, reducing wall-clock latency significantly for complex multi-step tasks. Before any tool executes, the permission system checks the call against the allowlist in settings.json — tools not in the allowlist trigger an interactive permission prompt or are blocked entirely in non-interactive sessions.

Why the Loop Is the Key Insight

Every agentic AI system — regardless of framework, model, or platform — implements a version of this loop. The model cannot execute code or read files. It can only emit text. The loop is the mechanism that translates model outputs into real-world effects and feeds real-world results back into model context. Understanding the loop explains why latency accumulates across multi-step tasks, why parallel tool execution matters, and why system prompt construction is a performance and cost lever.

Key Takeaways

  • The query() async generator is the agentic loop — it recurses until the model stops calling tools
  • System prompt assembly is dynamic and draws from CLAUDE.md, settings, tool schemas, and MCP declarations
  • Parallel tool execution reduces latency for independent calls — a meaningful optimisation for complex agent tasks
  • Permission checks gate every tool call — understanding the allowlist model is essential for production deployments
  • The loop architecture is universal — every agent framework implements the same fundamental pattern

Watch on YouTube

▶ Watch Now

Opens in YouTube

Share on LinkedIn

One click — copies a ready-to-post update about this video

About the Author

Rahul Kumar is a Senior Cloud and AI Architect at Microsoft with 13+ years of enterprise experience across Azure, AWS, and GCP.

Book a Discussion