
Chapter 6: Agent Loop

In Chapter 5: Agent Configuration, we set up the session and loaded the config — the system prompt, the allowed tools, the MCP servers. Everything is wired and ready. Time to actually talk to the LLM. Welcome to the Agent Loop.


Motivation

Why does the agent need a loop at all? Can't we just send a prompt and get an answer?

Sometimes, yes. But most useful interactions look like this:

  1. You ask: "Find all the TODOs in my project"
  2. The LLM decides it needs to run grep -r TODO src/
  3. It sends back a tool call — not a final answer
  4. Something needs to execute that tool and feed the result back
  5. The LLM sees the grep output, writes a summary, and now it's done

That "something" is the Agent Loop. It's the ping-pong referee that keeps the ball moving between you, the model, and the tools until the model says "I'm done."

Think of the Agent Loop like a chess clock. Each side takes a turn, state is passed between them, and the clock keeps ticking until someone declares checkmate. The LLM makes a move (text or tool call), the loop processes it (runs the tool, collects the result), pushes the board back, and the LLM moves again. Without the clock, nobody knows whose turn it is.


Use Case

Let's trace a concrete example. You type:

"Find my TODOs and summarize them"

Here's what happens inside the Agent Loop:

  1. You → Loop: Your prompt arrives as a User message
  2. Loop → LLM: The loop packages the conversation history + tool specs and streams a request to the model
  3. LLM → Loop: The model streams back tokens. Partway through, it emits a ContentBlockStart for a tool use — it wants to call grep with {pattern: "TODO", path: "src/"}
  4. Loop detects tool call: The stream ends with StopReason::ToolUse. The loop transitions to PendingToolUseResults
  5. Loop → Tool System: The outer Agent dispatches the tool, waits for the result
  6. Tool System → Loop: The grep output comes back as a ToolResult message
  7. Loop → LLM: The loop sends a new request with the tool result appended to the conversation
  8. LLM → Loop: The model streams a summary: "Found 12 TODOs across 4 files…" — this time with StopReason::EndTurn
  9. Loop stops: No more tool calls. The loop transitions to UserTurnEnded and emits UserTurnEnd metadata

Two round-trips to the model, one tool execution, all orchestrated by the same loop.
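The trace above can be sketched as a loop that keeps cycling until the model stops asking for tools. This is a hypothetical, synchronous stand-in (`fake_model`, `run_turn`, and `Reply` are invented names; the real loop is async and streaming):

```rust
// A reply is either a tool request or a final answer.
enum Reply {
    ToolUse(String),
    EndTurn(String),
}

// Stand-in model: asks for grep on the first cycle, summarizes on the
// second (assumption for illustration).
fn fake_model(history: &[String]) -> Reply {
    if history.iter().any(|m| m.starts_with("tool_result")) {
        Reply::EndTurn("Found 12 TODOs across 4 files".into())
    } else {
        Reply::ToolUse("grep -r TODO src/".into())
    }
}

// Cycle until the model ends the turn; return (answer, cycle count).
fn run_turn(prompt: &str) -> (String, u32) {
    let mut history = vec![prompt.to_string()];
    let mut cycles = 0;
    loop {
        cycles += 1;
        match fake_model(&history) {
            Reply::ToolUse(cmd) => history.push(format!("tool_result: {cmd}")),
            Reply::EndTurn(answer) => return (answer, cycles),
        }
    }
}

fn main() {
    let (answer, cycles) = run_turn("Find my TODOs and summarize them");
    assert_eq!(cycles, 2); // two round-trips, as in the trace above
    assert!(answer.contains("12 TODOs"));
}
```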


Key Concepts

1. Turn Structure

A user turn is everything that happens between your prompt and the final assistant response. It may involve multiple cycles — each cycle is one request/response pair with the model. A turn with two tool calls has three cycles: prompt → tool call → tool result → tool call → tool result → final answer.

The loop tracks this in UserTurnMetadata:

// crates/agent/src/agent/agent_loop/protocol.rs (simplified)
pub struct UserTurnMetadata {
    pub total_request_count: u32,
    pub number_of_cycles: u32,
    pub input_token_count: u32,
    pub output_token_count: u32,
    pub turn_duration: Option<Duration>,
    pub end_reason: LoopEndReason,
}
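As a sanity check on the turn/cycle arithmetic: each tool-call round adds one request/response pair, and the final answer adds one more. A hypothetical helper (`cycles_for` is an invented name, not from the codebase):

```rust
// Cycles in a turn: one per tool-call round, plus the final answer.
fn cycles_for(tool_call_rounds: u32) -> u32 {
    tool_call_rounds + 1
}

fn main() {
    assert_eq!(cycles_for(0), 1); // plain Q&A: a single cycle
    assert_eq!(cycles_for(2), 3); // two tool rounds -> three cycles
}
```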

2. Streaming Tokens

The loop never waits for the full response. It consumes a stream of events — MessageStart, ContentBlockDelta (text chunks), ContentBlockStart (tool use begins), ContentBlockStop, MessageStop, and Metadata. Each event is parsed incrementally by a StreamParseState machine inside the loop.
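The event shapes can be sketched as a plain enum. The variant names mirror the text, but this is a simplified stand-in for the real wire types, not their actual definitions:

```rust
// Simplified stand-in for the stream events described above.
#[allow(dead_code)]
#[derive(Debug)]
enum StreamEvent {
    MessageStart,
    ContentBlockStart { tool_name: Option<String> },
    ContentBlockDelta(String), // a text chunk
    ContentBlockStop,
    MessageStop,
}

// Accumulate text deltas incrementally, the way a parse state would.
fn collect_text(events: &[StreamEvent]) -> String {
    let mut buf = String::new();
    for e in events {
        if let StreamEvent::ContentBlockDelta(chunk) = e {
            buf.push_str(chunk);
        }
    }
    buf
}

fn main() {
    let events = vec![
        StreamEvent::MessageStart,
        StreamEvent::ContentBlockDelta("Found 12 ".into()),
        StreamEvent::ContentBlockDelta("TODOs".into()),
        StreamEvent::MessageStop,
    ];
    assert_eq!(collect_text(&events), "Found 12 TODOs");
}
```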

3. Tool Call Detection

When the model wants a tool, it doesn't say so in plain text. It emits structured events: a ContentBlockStart with a ToolUse variant carrying the tool name and ID, followed by ContentBlockDelta events with JSON input fragments, and finally a ContentBlockStop. The loop reassembles the JSON and validates it:

// crates/agent/src/agent/agent_loop/mod.rs (line ~330, simplified)
StreamEvent::ContentBlockStop(_) => {
    if let Some((tool_use_id, name, json_buf)) = self.parsing_tool_use.take() {
        match serde_json::from_str::<Value>(&json_buf) {
            Ok(val) => self.tool_uses.push(ToolUseBlock { tool_use_id, name, input: val }),
            Err(_)  => self.invalid_tool_uses.push(InvalidToolUse { tool_use_id, name, content: json_buf }),
        }
    }
}

If the JSON is malformed, the loop marks the stream as errored and the outer Agent retries with a message asking the model to simplify.
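The reassembly step can be sketched as follows. The real loop validates with serde_json; here a naive balanced-brace check stands in for full JSON parsing (`reassemble` is an invented helper, and the heuristic would misfire on braces inside string values). Malformed input is returned verbatim, mirroring how InvalidToolUse keeps the raw buffer:

```rust
// Concatenate delta fragments, then do a crude validity check
// (stand-in for serde_json::from_str in the real code).
fn reassemble(fragments: &[&str]) -> Result<String, String> {
    let json: String = fragments.concat();
    let mut depth: i32 = 0;
    let mut valid = json.starts_with('{');
    for c in json.chars() {
        match c {
            '{' => depth += 1,
            '}' => depth -= 1,
            _ => {}
        }
        if depth < 0 {
            valid = false; // closed a brace that was never opened
        }
    }
    if valid && depth == 0 {
        Ok(json)
    } else {
        Err(json) // kept verbatim, like InvalidToolUse
    }
}

fn main() {
    let ok = reassemble(&["{\"pattern\":", "\"TODO\"}"]);
    assert_eq!(ok.unwrap(), "{\"pattern\":\"TODO\"}");
    assert!(reassemble(&["{\"pattern\":"]).is_err()); // truncated stream
}
```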

4. Execution States

The loop is a state machine with six states:

State                  Meaning
Idle                   Waiting for a prompt
SendingRequest         Packaging and sending to the model
ConsumingResponse      Streaming tokens from the model
PendingToolUseResults  Model asked for tools; waiting for results
UserTurnEnded          Model finished; no pending work
Errored                Something broke; waiting for retry or close

Every state transition emits a LoopStateChange event so the outer Agent (and ultimately the TUI) can update in real time.
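The table above can be sketched as an enum plus the decision the loop makes when a stream ends (`StreamOutcome` and `after_stream` are invented names for illustration):

```rust
// The six loop states from the table above.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum LoopState {
    Idle,
    SendingRequest,
    ConsumingResponse,
    PendingToolUseResults,
    UserTurnEnded,
    Errored,
}

// What the finished stream reported.
enum StreamOutcome {
    ToolUseRequested,
    EndTurn,
    Error,
}

// The transition taken when the response stream ends.
fn after_stream(outcome: StreamOutcome) -> LoopState {
    match outcome {
        StreamOutcome::ToolUseRequested => LoopState::PendingToolUseResults,
        StreamOutcome::EndTurn => LoopState::UserTurnEnded,
        StreamOutcome::Error => LoopState::Errored,
    }
}

fn main() {
    assert_eq!(
        after_stream(StreamOutcome::ToolUseRequested),
        LoopState::PendingToolUseResults
    );
    assert_eq!(after_stream(StreamOutcome::EndTurn), LoopState::UserTurnEnded);
    assert_eq!(after_stream(StreamOutcome::Error), LoopState::Errored);
}
```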

5. Cancellation

When you press Ctrl+C, a CancellationToken fires. The loop drains any remaining stream events, finalizes the parse state, and transitions to UserTurnEnded with end_reason: Cancelled. No zombie streams, no leaked futures.

// crates/agent/src/agent/agent_loop/mod.rs (line ~220, simplified)
AgentLoopRequest::Cancel => {
    self.cancel_token.cancel();
    // drain remaining stream events...
    self.set_execution_state(LoopState::UserTurnEnded);
}
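A minimal std-only sketch of the same shutdown sequence, with an AtomicBool standing in for tokio_util's CancellationToken (`handle_cancel` is an invented helper):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Fire the token, drain whatever the stream still holds, and report
// the terminal state (end_reason would be Cancelled in the real loop).
fn handle_cancel(token: &AtomicBool, pending: Vec<&str>) -> (usize, &'static str) {
    token.store(true, Ordering::SeqCst);
    let drained = pending.len(); // nothing left mid-parse
    (drained, "UserTurnEnded")
}

fn main() {
    let token = AtomicBool::new(false);
    let (drained, state) = handle_cancel(&token, vec!["delta", "stop"]);
    assert_eq!(drained, 2);
    assert_eq!(state, "UserTurnEnded");
    assert!(token.load(Ordering::SeqCst)); // token fired
}
```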

6. Error Recovery

The loop handles several error classes from StreamErrorKind:

  • ContextWindowOverflow — the conversation is too long; the outer Agent triggers compaction (summarization) and retries
  • Throttling — the service is busy; retryable after backoff
  • StreamTimeout — the model took too long; the Agent injects a "try smaller steps" message and retries
  • Interrupted — user cancelled; clean shutdown
  • InvalidJson — the model produced broken tool-call JSON; retry with a "split up the work" hint
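The recovery policy boils down to: everything except a user interrupt gets retried in some form. A hypothetical classification helper (`is_retryable` is invented; the real Agent's per-class policies are richer than a boolean):

```rust
// Mirrors the StreamErrorKind classes listed above.
#[derive(Debug, PartialEq)]
enum StreamErrorKind {
    ContextWindowOverflow,
    Throttling,
    StreamTimeout,
    Interrupted,
    InvalidJson,
}

// Only a user interrupt ends the turn without any retry.
fn is_retryable(kind: &StreamErrorKind) -> bool {
    !matches!(kind, StreamErrorKind::Interrupted)
}

fn main() {
    assert!(!is_retryable(&StreamErrorKind::Interrupted));
    assert!(is_retryable(&StreamErrorKind::ContextWindowOverflow));
    assert!(is_retryable(&StreamErrorKind::Throttling));
    assert!(is_retryable(&StreamErrorKind::StreamTimeout));
    assert!(is_retryable(&StreamErrorKind::InvalidJson));
}
```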

How It Fits Together (Sequence Diagram)

sequenceDiagram
    participant User
    participant Loop as Agent Loop
    participant LLM as Model (LLM)
    participant Tools as Tool System

    User->>Loop: prompt("find my TODOs")
    Loop->>LLM: stream(messages, tool_specs)
    LLM-->>Loop: ContentBlockStart (tool_use: grep)
    Loop-->>Loop: detect tool call, state → PendingToolUseResults
    Loop->>Tools: execute grep {pattern: "TODO"}
    Tools-->>Loop: ToolResult (12 matches)
    Loop->>LLM: stream(messages + tool_result)
    LLM-->>Loop: ContentBlockDelta ("Found 12 TODOs…")
    Loop-->>User: UserTurnEnd (summary)

The loop sits at the center — it's the only component that talks to both the model and the tool system. The User and Tools never interact directly.


Internal Implementation

The Agent Loop lives in crates/agent/src/agent/agent_loop/ and is split across four files:

File         Purpose
mod.rs       The AgentLoop struct, main_loop, stream parsing, AgentLoopHandle
protocol.rs  Request/response/event types (AgentLoopRequest, AgentLoopEventKind, UserTurnMetadata)
types.rs     Wire types (Message, StreamEvent, ContentBlock, ToolUseBlock, StreamError)
model.rs     The Model trait — the abstraction over any LLM backend

The Actor Pattern

The AgentLoop is a Tokio actor. It spawns onto its own task and communicates through channels:

// crates/agent/src/agent/agent_loop/mod.rs (line ~155)
pub fn spawn(mut self) -> AgentLoopHandle {
    let handle = tokio::spawn(async move {
        self.main_loop().await;
    });
    AgentLoopHandle::new(id, loop_req_tx, loop_event_rx, handle)
}

The caller gets an AgentLoopHandle with three capabilities: send_request(), recv() (events), and cancel(). The handle's Drop implementation aborts the task — no orphaned loops.
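The actor shape can be sketched with std threads and channels in place of Tokio tasks. All names here are simplified stand-ins for the real types:

```rust
use std::sync::mpsc;
use std::thread;

enum Request {
    Prompt(String),
    Cancel,
}

enum Event {
    StateChange(String),
}

// The caller-facing handle: a sender for requests, a receiver for events.
struct Handle {
    req_tx: mpsc::Sender<Request>,
    event_rx: mpsc::Receiver<Event>,
}

// Spawn the actor on its own thread; it owns the receiving ends.
fn spawn() -> Handle {
    let (req_tx, req_rx) = mpsc::channel::<Request>();
    let (event_tx, event_rx) = mpsc::channel::<Event>();
    thread::spawn(move || {
        while let Ok(req) = req_rx.recv() {
            match req {
                Request::Prompt(_) => {
                    // A real loop would stream from the model here.
                    let _ = event_tx.send(Event::StateChange("SendingRequest".into()));
                }
                Request::Cancel => break,
            }
        }
    });
    Handle { req_tx, event_rx }
}

fn main() {
    let h = spawn();
    h.req_tx.send(Request::Prompt("find TODOs".into())).unwrap();
    let Event::StateChange(s) = h.event_rx.recv().unwrap();
    assert_eq!(s, "SendingRequest");
    h.req_tx.send(Request::Cancel).unwrap();
}
```

Dropping both channel ends also stops the thread's `recv()` loop, which is the same "no orphaned loops" guarantee the handle's Drop provides in the real code.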

The Main Select Loop

Inside main_loop() (line ~170), a tokio::select! multiplexes two branches:

  1. Request branch — receives AgentLoopRequest messages (new prompt, cancel, get state)
  2. Stream branch — pulls the next StreamResult from the current model response stream

// crates/agent/src/agent/agent_loop/mod.rs (line ~172, simplified)
loop {
    tokio::select! {
        req = self.loop_req_rx.recv() => {
            self.handle_agent_loop_request(req.payload).await;
        },
        event = self.curr_stream.next() => {
            self.curr_stream_state.next(event, &mut loop_events);
            // transition state based on what the parser found
        }
    }
}

When the stream branch detects the response has ended, it checks: did the model ask for tools? If yes → PendingToolUseResults. If no → UserTurnEnded. If error → Errored.

The Model Trait

The Model trait (in model.rs) is the loop's only dependency on the outside world:

// crates/agent/src/agent/agent_loop/model.rs (line ~27)
pub trait Model: Debug + Send + Sync + 'static {
    fn stream(
        &self, messages: Vec<Message>,
        tool_specs: Option<Vec<ToolSpec>>,
        system_prompt: Option<String>,
        cancel_token: CancellationToken,
    ) -> Pin<Box<dyn Stream<Item = StreamResult> + Send>>;
}

This trait is implemented by the real backend client (which talks to the LLM service) and by MockModel (used in tests). The loop doesn't know or care which model it's talking to — it just consumes the stream.
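The testability benefit is easy to sketch with a synchronous stand-in: a trait over an iterator instead of an async Stream, and a mock that replays canned output. This is a simplification; the real trait is async, streaming, and cancellable:

```rust
// Synchronous stand-in for the Model trait.
trait Model {
    fn stream(&self, prompt: &str) -> Box<dyn Iterator<Item = String>>;
}

// Replays pre-recorded chunks, like MockModel does in tests.
struct MockModel {
    canned: Vec<String>,
}

impl Model for MockModel {
    fn stream(&self, _prompt: &str) -> Box<dyn Iterator<Item = String>> {
        Box::new(self.canned.clone().into_iter())
    }
}

fn main() {
    let model = MockModel {
        canned: vec!["Found 12 TODOs".into()],
    };
    // The loop just consumes whatever the trait yields.
    let out: Vec<String> = model.stream("find TODOs").collect();
    assert_eq!(out, vec!["Found 12 TODOs".to_string()]);
}
```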

The Outer Agent Drives the Loop

The AgentLoop itself doesn't execute tools or manage conversation history. That's the outer Agent in crates/agent/src/agent/mod.rs. The Agent owns the loop handle and reacts to its events:

// crates/agent/src/agent/mod.rs (line ~835, simplified)
tokio::select! {
    evt = self.agent_loop.recv() => {
        self.handle_agent_loop_event(evt).await;
    },
    // ... other branches for tool results, MCP events, etc.
}

When the Agent receives ResponseStreamEnd with tool uses, it dispatches them to the Tool System. When tool results come back, it packages them into a new SendRequestArgs and calls send_request() on the loop handle — starting the next cycle.

This separation means the loop is pure protocol: stream in, events out. All the messy business logic (permissions, hooks, compaction, tool execution) lives in the Agent.


Key Takeaways

  • The Agent Loop is the innermost execution loop shared by both V1 and V2 of kiro-cli
  • It's a Tokio actor that multiplexes incoming requests with an outgoing model stream
  • A single user turn may involve multiple cycles (prompt → tool → result → tool → result → answer)
  • The loop is a state machine (Idle → SendingRequest → ConsumingResponse → PendingToolUseResults or UserTurnEnded)
  • It handles cancellation, malformed JSON, and stream errors gracefully
  • The loop doesn't execute tools itself — it signals PendingToolUseResults and the outer Agent takes over
  • The Model trait abstracts the LLM backend, making the loop testable with MockModel

What's Next

The loop dispatches tools — but what is a tool? How does grep become something the LLM can call? How are permissions checked, results formatted, and timeouts enforced?

Next up: Chapter 7: Tool System — the agent's hands.