
Chapter 6: Agent Loop

In Chapter 5: Agent Configuration, we set up the session and loaded the config — the system prompt, the allowed tools, the MCP servers. Everything is wired and ready. Time to actually talk to the LLM. Welcome to the Agent Loop.


Motivation

Why does the agent need a loop at all? Can't we just send a prompt and get an answer?

Sometimes, yes. But most useful interactions look like this:

  1. You ask: "Find all the TODOs in my project"
  2. The LLM decides it needs to run grep -r TODO src/
  3. It sends back a tool call — not a final answer
  4. Something needs to execute that tool and feed the result back
  5. The LLM sees the grep output, writes a summary, and now it's done

That "something" is the Agent Loop. It's the ping-pong referee that keeps the ball moving between you, the model, and the tools until the model says "I'm done."

Think of the Agent Loop like a chess clock. Each side takes a turn, state is passed between them, and the clock keeps ticking until someone declares checkmate. The LLM makes a move (text or tool call), the loop processes it (runs the tool, collects the result), pushes the board back, and the LLM moves again. Without the clock, nobody knows whose turn it is.


Use Case

Let's trace a concrete example. You type:

"Find my TODOs and summarize them"

Here's what happens inside the Agent Loop:

  1. You → Loop: Your prompt arrives as a User message
  2. Loop → LLM: The loop packages the conversation history + tool specs and streams a request to the model
  3. LLM → Loop: The model streams back tokens. Partway through, it emits a ContentBlockStart for a tool use — it wants to call grep with {pattern: "TODO", path: "src/"}
  4. Loop detects tool call: The stream ends with StopReason::ToolUse. The loop transitions to PendingToolUseResults
  5. Loop → Tool System: The outer Agent dispatches the tool, waits for the result
  6. Tool System → Loop: The grep output comes back as a ToolResult message
  7. Loop → LLM: The loop sends a new request with the tool result appended to the conversation
  8. LLM → Loop: The model streams a summary: "Found 12 TODOs across 4 files…" — this time with StopReason::EndTurn
  9. Loop stops: No more tool calls. The loop transitions to UserTurnEnded and emits UserTurnEnd metadata

Two round-trips to the model, one tool execution, all orchestrated by the same loop.
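The trace above can be sketched as a loop that keeps cycling until the model stops asking for tools. This is a hypothetical, synchronous stand-in (`fake_model`, `run_turn`, and `Reply` are invented names; the real loop is async and streaming):

```rust
// A reply is either a tool request or a final answer.
enum Reply {
    ToolUse(String),
    EndTurn(String),
}

// Stand-in model: asks for grep on the first cycle, summarizes on the
// second (assumption for illustration).
fn fake_model(history: &[String]) -> Reply {
    if history.iter().any(|m| m.starts_with("tool_result")) {
        Reply::EndTurn("Found 12 TODOs across 4 files".into())
    } else {
        Reply::ToolUse("grep -r TODO src/".into())
    }
}

// Cycle until the model ends the turn; return (answer, cycle count).
fn run_turn(prompt: &str) -> (String, u32) {
    let mut history = vec![prompt.to_string()];
    let mut cycles = 0;
    loop {
        cycles += 1;
        match fake_model(&history) {
            Reply::ToolUse(cmd) => history.push(format!("tool_result: {cmd}")),
            Reply::EndTurn(answer) => return (answer, cycles),
        }
    }
}

fn main() {
    let (answer, cycles) = run_turn("Find my TODOs and summarize them");
    assert_eq!(cycles, 2); // two round-trips, as in the trace above
    assert!(answer.contains("12 TODOs"));
}
```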


Key Concepts

1. Turn Structure

A user turn is everything that happens between your prompt and the final assistant response. It may involve multiple cycles — each cycle is one request/response pair with the model. A turn with two tool calls has three cycles: prompt → tool call → tool result → tool call → tool result → final answer.

The loop tracks this in UserTurnMetadata:

// crates/agent/src/agent/agent_loop/protocol.rs (simplified)
pub struct UserTurnMetadata {
    pub total_request_count: u32,
    pub number_of_cycles: u32,
    pub input_token_count: u32,
    pub output_token_count: u32,
    pub turn_duration: Option<Duration>,
    pub end_reason: LoopEndReason,
}
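As a sanity check on the turn/cycle arithmetic: each tool-call round adds one request/response pair, and the final answer adds one more. A hypothetical helper (`cycles_for` is an invented name, not from the codebase):

```rust
// Cycles in a turn: one per tool-call round, plus the final answer.
fn cycles_for(tool_call_rounds: u32) -> u32 {
    tool_call_rounds + 1
}

fn main() {
    assert_eq!(cycles_for(0), 1); // plain Q&A: a single cycle
    assert_eq!(cycles_for(2), 3); // two tool rounds -> three cycles
}
```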

2. Streaming Tokens

The loop never waits for the full response. It consumes a stream of events — MessageStart, ContentBlockDelta (text chunks), ContentBlockStart (tool use begins), ContentBlockStop, MessageStop, and Metadata. Each event is parsed incrementally by a StreamParseState machine inside the loop.
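The event shapes can be sketched as a plain enum. The variant names mirror the text, but this is a simplified stand-in for the real wire types, not their actual definitions:

```rust
// Simplified stand-in for the stream events described above.
#[allow(dead_code)]
#[derive(Debug)]
enum StreamEvent {
    MessageStart,
    ContentBlockStart { tool_name: Option<String> },
    ContentBlockDelta(String), // a text chunk
    ContentBlockStop,
    MessageStop,
}

// Accumulate text deltas incrementally, the way a parse state would.
fn collect_text(events: &[StreamEvent]) -> String {
    let mut buf = String::new();
    for e in events {
        if let StreamEvent::ContentBlockDelta(chunk) = e {
            buf.push_str(chunk);
        }
    }
    buf
}

fn main() {
    let events = vec![
        StreamEvent::MessageStart,
        StreamEvent::ContentBlockDelta("Found 12 ".into()),
        StreamEvent::ContentBlockDelta("TODOs".into()),
        StreamEvent::MessageStop,
    ];
    assert_eq!(collect_text(&events), "Found 12 TODOs");
}
```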

3. Tool Call Detection

When the model wants a tool, it doesn't say so in plain text. It emits structured events: a ContentBlockStart with a ToolUse variant carrying the tool name and ID, followed by ContentBlockDelta events with JSON input fragments, and finally a ContentBlockStop. The loop reassembles the JSON and validates it:

// crates/agent/src/agent/agent_loop/mod.rs (line ~330, simplified)
StreamEvent::ContentBlockStop(_) => {
    if let Some((tool_use_id, name, json_buf)) = self.parsing_tool_use.take() {
        match serde_json::from_str::<Value>(&json_buf) {
            Ok(val) => self.tool_uses.push(ToolUseBlock { tool_use_id, name, input: val }),
            Err(_)  => self.invalid_tool_uses.push(InvalidToolUse { tool_use_id, name, content: json_buf }),
        }
    }
}

If the JSON is malformed, the loop marks the stream as errored and the outer Agent retries with a message asking the model to simplify.
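The reassembly step can be sketched as follows. The real loop validates with serde_json; here a naive balanced-brace check stands in for full JSON parsing (`reassemble` is an invented helper, and the heuristic would misfire on braces inside string values). Malformed input is returned verbatim, mirroring how InvalidToolUse keeps the raw buffer:

```rust
// Concatenate delta fragments, then do a crude validity check
// (stand-in for serde_json::from_str in the real code).
fn reassemble(fragments: &[&str]) -> Result<String, String> {
    let json: String = fragments.concat();
    let mut depth: i32 = 0;
    let mut valid = json.starts_with('{');
    for c in json.chars() {
        match c {
            '{' => depth += 1,
            '}' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            valid = false; // closed a brace that was never opened
        }
    }
    if valid && depth == 0 {
        Ok(json)
    } else {
        Err(json) // kept verbatim, like InvalidToolUse
    }
}

fn main() {
    let ok = reassemble(&["{\"pattern\":", "\"TODO\"}"]);
    assert_eq!(ok.unwrap(), "{\"pattern\":\"TODO\"}");
    assert!(reassemble(&["{\"pattern\":"]).is_err()); // truncated stream
}
```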

4. Execution States

The loop is a state machine with six states:

State                  Meaning
Idle                   Waiting for a prompt
SendingRequest         Packaging and sending to the model
ConsumingResponse      Streaming tokens from the model
PendingToolUseResults  Model asked for tools; waiting for results
UserTurnEnded          Model finished; no pending work
Errored                Something broke; waiting for retry or close

Every state transition emits a LoopStateChange event so the outer Agent (and ultimately the TUI) can update in real time.
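The table above can be sketched as an enum plus the decision the loop makes when a stream ends (`StreamOutcome` and `after_stream` are invented names for illustration):

```rust
// The six loop states from the table above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum LoopState {
    Idle,
    SendingRequest,
    ConsumingResponse,
    PendingToolUseResults,
    UserTurnEnded,
    Errored,
}

// What the finished stream reported.
enum StreamOutcome {
    ToolUseRequested,
    EndTurn,
    Error,
}

// The transition taken when the response stream ends.
fn after_stream(outcome: StreamOutcome) -> LoopState {
    match outcome {
        StreamOutcome::ToolUseRequested => LoopState::PendingToolUseResults,
        StreamOutcome::EndTurn => LoopState::UserTurnEnded,
        StreamOutcome::Error => LoopState::Errored,
    }
}

fn main() {
    assert_eq!(
        after_stream(StreamOutcome::ToolUseRequested),
        LoopState::PendingToolUseResults
    );
    assert_eq!(after_stream(StreamOutcome::EndTurn), LoopState::UserTurnEnded);
    assert_eq!(after_stream(StreamOutcome::Error), LoopState::Errored);
}
```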

5. Cancellation

When you press Ctrl+C, a CancellationToken fires. The loop drains any remaining stream events, finalizes the parse state, and transitions to UserTurnEnded with end_reason: Cancelled. No zombie streams, no leaked futures.

// crates/agent/src/agent/agent_loop/mod.rs (line ~220, simplified)
AgentLoopRequest::Cancel => {
    self.cancel_token.cancel();
    // drain remaining stream events...
    self.set_execution_state(LoopState::UserTurnEnded);
}
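A minimal std-only sketch of the same shutdown sequence, with an AtomicBool standing in for tokio_util's CancellationToken (`handle_cancel` is an invented helper):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Fire the token, drain whatever the stream still holds, and report
// the terminal state (end_reason would be Cancelled in the real loop).
fn handle_cancel(token: &AtomicBool, pending: Vec<&str>) -> (usize, &'static str) {
    token.store(true, Ordering::SeqCst);
    let drained = pending.len(); // nothing left mid-parse
    (drained, "UserTurnEnded")
}

fn main() {
    let token = AtomicBool::new(false);
    let (drained, state) = handle_cancel(&token, vec!["delta", "stop"]);
    assert_eq!(drained, 2);
    assert_eq!(state, "UserTurnEnded");
    assert!(token.load(Ordering::SeqCst)); // token fired
}
```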

6. Error Recovery

The loop handles several error classes from StreamErrorKind:

  • ContextWindowOverflow — the conversation is too long; the outer Agent triggers compaction (summarization) and retries
  • Throttling — the service is busy; retryable after backoff
  • StreamTimeout — the model took too long; the Agent injects a "try smaller steps" message and retries
  • Interrupted — user cancelled; clean shutdown
  • InvalidJson — the model produced broken tool-call JSON; retry with a "split up the work" hint
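The recovery policy boils down to: everything except a user interrupt gets retried in some form. A hypothetical classification helper (`is_retryable` is invented; the real Agent's per-class policies are richer than a boolean):

```rust
// Mirrors the StreamErrorKind classes listed above.
#[derive(Debug, PartialEq)]
enum StreamErrorKind {
    ContextWindowOverflow,
    Throttling,
    StreamTimeout,
    Interrupted,
    InvalidJson,
}

// Only a user interrupt ends the turn without any retry.
fn is_retryable(kind: &StreamErrorKind) -> bool {
    !matches!(kind, StreamErrorKind::Interrupted)
}

fn main() {
    assert!(!is_retryable(&StreamErrorKind::Interrupted));
    assert!(is_retryable(&StreamErrorKind::ContextWindowOverflow));
    assert!(is_retryable(&StreamErrorKind::Throttling));
    assert!(is_retryable(&StreamErrorKind::StreamTimeout));
    assert!(is_retryable(&StreamErrorKind::InvalidJson));
}
```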

How It Fits Together (Sequence Diagram)

sequenceDiagram
    participant User
    participant Loop as Agent Loop
    participant LLM as Model (LLM)
    participant Tools as Tool System

    User->>Loop: prompt("find my TODOs")
    Loop->>LLM: stream(messages, tool_specs)
    LLM-->>Loop: ContentBlockStart (tool_use: grep)
    Loop-->>Loop: detect tool call, state → PendingToolUseResults
    Loop->>Tools: execute grep {pattern: "TODO"}
    Tools-->>Loop: ToolResult (12 matches)
    Loop->>LLM: stream(messages + tool_result)
    LLM-->>Loop: ContentBlockDelta ("Found 12 TODOs…")
    Loop-->>User: UserTurnEnd (summary)

The loop sits at the center — it's the only component that talks to both the model and the tool system. The User and Tools never interact directly.


Internal Implementation

The Agent Loop lives in crates/agent/src/agent/agent_loop/ and is split across four files:

File         Purpose
mod.rs       The AgentLoop struct, main_loop, stream parsing, AgentLoopHandle
protocol.rs  Request/response/event types (AgentLoopRequest, AgentLoopEventKind, UserTurnMetadata)
types.rs     Wire types (Message, StreamEvent, ContentBlock, ToolUseBlock, StreamError)
model.rs     The Model trait — the abstraction over any LLM backend

The Actor Pattern

The AgentLoop is a Tokio actor. It spawns onto its own task and communicates through channels:

// crates/agent/src/agent/agent_loop/mod.rs (line ~155)
pub fn spawn(mut self) -> AgentLoopHandle {
    let handle = tokio::spawn(async move {
        self.main_loop().await;
    });
    AgentLoopHandle::new(id, loop_req_tx, loop_event_rx, handle)
}

The caller gets an AgentLoopHandle with three capabilities: send_request(), recv() (events), and cancel(). The handle's Drop implementation aborts the task — no orphaned loops.
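The actor shape can be sketched with std threads and channels in place of Tokio tasks. All names here are simplified stand-ins for the real types:

```rust
use std::sync::mpsc;
use std::thread;

enum Request {
    Prompt(String),
    Cancel,
}

enum Event {
    StateChange(String),
}

// The caller-facing handle: a sender for requests, a receiver for events.
struct Handle {
    req_tx: mpsc::Sender<Request>,
    event_rx: mpsc::Receiver<Event>,
}

// Spawn the actor on its own thread; it owns the receiving ends.
fn spawn() -> Handle {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (event_tx, event_rx) = mpsc::channel::<Event>();
    thread::spawn(move || {
        while let Ok(req) = req_rx.recv() {
            match req {
                Request::Prompt(_) => {
                    // A real loop would stream from the model here.
                    let _ = event_tx.send(Event::StateChange("SendingRequest".into()));
                }
                Request::Cancel => break,
            }
        }
    });
    Handle { req_tx, event_rx }
}

fn main() {
    let h = spawn();
    h.req_tx.send(Request::Prompt("find TODOs".into())).unwrap();
    let Event::StateChange(s) = h.event_rx.recv().unwrap();
    assert_eq!(s, "SendingRequest");
    h.req_tx.send(Request::Cancel).unwrap();
}
```

Dropping both channel ends also stops the thread's `recv()` loop, which is the same "no orphaned loops" guarantee the handle's Drop provides in the real code.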

The Main Select Loop

Inside main_loop() (line ~170), a tokio::select! multiplexes two branches:

  1. Request branch — receives AgentLoopRequest messages (new prompt, cancel, get state)
  2. Stream branch — pulls the next StreamResult from the current model response stream

// crates/agent/src/agent/agent_loop/mod.rs (line ~172, simplified)
loop {
    tokio::select! {
        req = self.loop_req_rx.recv() => {
            self.handle_agent_loop_request(req.payload).await;
        },
        event = self.curr_stream.next() => {
            self.curr_stream_state.next(event, &mut loop_events);
            // transition state based on what the parser found
        }
    }
}

When the stream branch detects the response has ended, it checks: did the model ask for tools? If yes → PendingToolUseResults. If no → UserTurnEnded. If error → Errored.

The Model Trait

The Model trait (in model.rs) is the loop's only dependency on the outside world:

// crates/agent/src/agent/agent_loop/model.rs (line ~27)
pub trait Model: Debug + Send + Sync + 'static {
    fn stream(
        &self, messages: Vec<Message>,
        tool_specs: Option<Vec<ToolSpec>>,
        system_prompt: Option<String>,
        cancel_token: CancellationToken,
    ) -> Pin<Box<dyn Stream<Item = StreamResult> + Send>>;
}

This trait is implemented by the real backend client (which talks to the LLM service) and by MockModel (used in tests). The loop doesn't know or care which model it's talking to — it just consumes the stream.
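The testability benefit is easy to sketch with a synchronous stand-in: a trait over an iterator instead of an async Stream, and a mock that replays canned output. This is a simplification; the real trait is async, streaming, and cancellable:

```rust
// Synchronous stand-in for the Model trait.
trait Model {
    fn stream(&self, prompt: &str) -> Box<dyn Iterator<Item = String>>;
}

// Replays pre-recorded chunks, like MockModel does in tests.
struct MockModel {
    canned: Vec<String>,
}

impl Model for MockModel {
    fn stream(&self, _prompt: &str) -> Box<dyn Iterator<Item = String>> {
        Box::new(self.canned.clone().into_iter())
    }
}

fn main() {
    let model = MockModel {
        canned: vec!["Found 12 TODOs".into()],
    };
    // The loop just consumes whatever the trait yields.
    let out: Vec<String> = model.stream("find TODOs").collect();
    assert_eq!(out, vec!["Found 12 TODOs".to_string()]);
}
```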

The Outer Agent Drives the Loop

The AgentLoop itself doesn't execute tools or manage conversation history. That's the outer Agent in crates/agent/src/agent/mod.rs. The Agent owns the loop handle and reacts to its events:

// crates/agent/src/agent/mod.rs (line ~835, simplified)
tokio::select! {
    evt = self.agent_loop.recv() => {
        self.handle_agent_loop_event(evt).await;
    },
    // ... other branches for tool results, MCP events, etc.
}

When the Agent receives ResponseStreamEnd with tool uses, it dispatches them to the Tool System. When tool results come back, it packages them into a new SendRequestArgs and calls send_request() on the loop handle — starting the next cycle.

This separation means the loop is pure protocol: stream in, events out. All the messy business logic (permissions, hooks, compaction, tool execution) lives in the Agent.


Key Takeaways

  • The Agent Loop is the innermost execution loop shared by both V1 and V2 of kiro-cli
  • It's a Tokio actor that multiplexes incoming requests with an outgoing model stream
  • A single user turn may involve multiple cycles (prompt → tool → result → tool → result → answer)
  • The loop is a state machine (Idle → SendingRequest → ConsumingResponse → PendingToolUseResults or UserTurnEnded)
  • It handles cancellation, malformed JSON, and stream errors gracefully
  • The loop doesn't execute tools itself — it signals PendingToolUseResults and the outer Agent takes over
  • The Model trait abstracts the LLM backend, making the loop testable with MockModel

What's Next

The loop dispatches tools — but what is a tool? How does grep become something the LLM can call? How are permissions checked, results formatted, and timeouts enforced?

Next up: Chapter 7: Tool System — the agent's hands.