Chapter 9: Code Intelligence

In Chapter 8: MCP Manager, we saw how kiro-cli discovers and connects to MCP servers — including one called code-agent-sdk. The MCP Manager treats it as just another server with tools. But what's inside that server? How does it actually understand your code?

That's what this chapter is about. The code-agent-sdk is the brain behind every "find the definition of X" or "rename Y across the whole project" operation. It combines two complementary technologies — Tree-sitter for fast AST parsing and LSP (Language Server Protocol) for deep semantic analysis — into a single, unified toolkit that the LLM calls through MCP.

Motivation: Why Text Search Isn't Enough

Imagine you ask the LLM: "Find all callers of the process function."

A naive grep "process" would return:

- The function definition itself
- Comments mentioning "process"
- A variable named process_id
- A string literal "process complete"
- An entirely different process function in another module

None of that is what you wanted. You wanted semantic results — only the places where that specific process function is actually called.

Analogy: Think of Code Intelligence as a librarian with the Dewey Decimal System versus a pile of books on the floor. Text search is like scanning every page for the word "history" — you'll find cookbooks, novels, and textbooks. Code Intelligence is like looking up "History → American → Civil War" in the catalog and going straight to the right shelf.

Code Intelligence gives the LLM three superpowers that text search can't provide:

- Symbol resolution: "This process on line 42 is the function defined in src/pipeline.rs:15, not the one in src/os.rs:8"
- Type-aware navigation: "Jump to where this struct is defined, show me its fields"
- Safe refactoring: "Rename getUserName to getUsername in all 47 call sites without touching the string "getUserName" in a log message"

Use Case: "Find All Callers of `foo`"

Let's trace a concrete request end-to-end. The LLM asks:

"Find all references to the spawn_session function."

Here's what happens under the hood:

sequenceDiagram
    participant LLM as LLM Agent
    participant MCP as MCP Server
    participant SDK as CodeIntelligence
    participant LSP as Language Server
    participant TS as Tree-sitter

    LLM->>MCP: search_references({symbol_name: "spawn_session"})
    MCP->>SDK: find_references_by_name(request)
    SDK->>TS: find_symbols("spawn_session") → location
    SDK->>LSP: textDocument/references(file, line, col)
    LSP-->>SDK: [{uri, range}, {uri, range}, ...]
    SDK-->>MCP: ReferencesResult
    MCP-->>LLM: JSON with file paths, lines, context

The SDK first uses Tree-sitter to locate the symbol (fast, no server needed), then hands the precise location to the LSP server for a full cross-file reference search. This two-phase strategy is a recurring pattern throughout the codebase.

Key Concepts

1. Tree-sitter: Fast, Offline AST Parsing

Tree-sitter is an incremental parser that builds a concrete syntax tree (CST) from source code. The code-agent-sdk uses it via the ast-grep library for two things:

- Symbol extraction: walk the AST and pull out function names, class names, struct definitions, etc.
- Pattern matching: search for structural code patterns like $X.unwrap() or async fn $NAME($$$)

Tree-sitter works without a running language server. It parses files directly, making it the fast fallback when LSP isn't available or hasn't initialized yet.

Supported languages are defined in an embedded JSON config:

// crates/code-agent-sdk/src/tree_sitter/config.rs
const LANGUAGES_JSON: &str = include_str!("languages/languages.json");

pub fn get_symbol_def(lang: &str, node_kind: &str) -> Option<&'static SymbolDef> {
    get_config()
        .get(lang)?
        .symbols
        .iter()
        .find(|s| s.node_kind == node_kind)
}

Each language entry maps AST node kinds to symbol types. For example, Rust's function_item node becomes a Function symbol, and its name is found in the identifier child node.
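
To make the mapping concrete, here is a minimal sketch of a table-driven lookup. It is an illustration under assumptions, not the SDK's code: the real data is loaded from languages.json, whose schema isn't shown in this chapter. The `symbol_type` field, the `RUST_SYMBOLS` table, and the `"rust"` language key are hypothetical.

```rust
/// Hypothetical, simplified stand-in for one entry of the language config.
#[derive(Debug, PartialEq)]
pub struct SymbolDef {
    pub node_kind: &'static str,   // AST node kind, e.g. "function_item"
    pub symbol_type: &'static str, // symbol type reported to the caller (assumed field)
    pub name_child: &'static str,  // kind of the child node that holds the name
}

/// Illustrative table for Rust; the SDK reads the equivalent from JSON.
static RUST_SYMBOLS: &[SymbolDef] = &[
    SymbolDef { node_kind: "function_item", symbol_type: "Function", name_child: "identifier" },
    SymbolDef { node_kind: "struct_item", symbol_type: "Struct", name_child: "type_identifier" },
];

/// Same signature as the function above, backed by the static table.
pub fn get_symbol_def(lang: &str, node_kind: &str) -> Option<&'static SymbolDef> {
    let table: &'static [SymbolDef] = match lang {
        "rust" => RUST_SYMBOLS, // assumed language key
        _ => return None,
    };
    table.iter().find(|s| s.node_kind == node_kind)
}
```

Node kinds that have no entry, such as comments, simply return None, so the extractor skips them.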

2. LSP: Deep Semantic Analysis

The Language Server Protocol provides compiler-grade intelligence — type checking, cross-file references, rename refactoring, and diagnostics. The SDK manages LSP servers through an LspRegistry:

// crates/code-agent-sdk/src/lsp/lsp_registry.rs
pub struct LspRegistry {
    clients: HashMap<String, LspClient>,
    configs: HashMap<String, LanguageServerConfig>,
}

The registry lazily starts language servers on demand. When a tool needs LSP capabilities for a .rs file, the registry looks up which server handles that extension, starts it if needed, and returns the client.
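
The lazy-start behavior can be sketched with two maps and a "start on first use" check. Everything below is hypothetical (`LazyRegistry`, `FakeClient`, the `starts` counter, keying configs by extension); a real registry would spawn a server process where this sketch only records that a start happened.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a running language server connection.
#[derive(Debug)]
pub struct FakeClient {
    pub command: String,
}

pub struct LazyRegistry {
    /// File extension -> server command, e.g. "rs" -> "rust-analyzer".
    configs: HashMap<String, String>,
    /// Server command -> client that has already been started.
    clients: HashMap<String, FakeClient>,
    /// Number of servers started so far; only here to make laziness observable.
    pub starts: usize,
}

impl LazyRegistry {
    pub fn new(configs: HashMap<String, String>) -> Self {
        Self { configs, clients: HashMap::new(), starts: 0 }
    }

    /// Returns the client for a file extension, starting it on first use.
    pub fn client_for_extension(&mut self, ext: &str) -> Option<&FakeClient> {
        let command = self.configs.get(ext)?.clone();
        if !self.clients.contains_key(&command) {
            // A real registry would spawn and initialize the server here.
            self.starts += 1;
            self.clients.insert(command.clone(), FakeClient { command: command.clone() });
        }
        self.clients.get(&command)
    }
}
```

Asking for the same extension twice starts the server only once, and an extension with no configured server yields None instead of an error.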

3. Fuzzy Symbol Search

When the LLM searches for "calcSum", it should find calculate_sum, calcSumTotal, and calc_sum. The SDK uses a multi-layered scoring algorithm:

// crates/code-agent-sdk/src/utils/scoring.rs
pub fn calculate_fuzzy_score(
    filter_lower: &str,
    label_lower: &str,
    filter_original: &str,
    label_original: &str,
) -> f64 {
    // 1.0  — exact match (case-sensitive)
    // 0.95 — exact match (case-insensitive)
    // 0.9  — prefix match (case-sensitive)
    // 0.85 — prefix match (case-insensitive)
    // 0.8  — contains match
    // <0.8 — Jaro-Winkler fuzzy distance
}

This means the LLM doesn't need to know the exact casing or naming convention. It can pass an approximate name, and the fuzzy matcher ranks the closest symbols first.
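
The tiers in the comments above can be sketched in plain Rust. This is a reconstruction under assumptions, not the SDK's implementation: the Jaro-Winkler helper is hand-written here to keep the example dependency-free, and scaling the fuzzy result by 0.8 is a guess at how it is kept below the "contains" tier.

```rust
/// Jaro similarity in [0, 1]; 1.0 means identical.
fn jaro(a: &[char], b: &[char]) -> f64 {
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters match only if they are equal and close enough in position.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Count matched characters that appear in a different order.
    let (mut k, mut out_of_order) = (0usize, 0usize);
    for i in 0..a.len() {
        if a_matched[i] {
            while !b_matched[k] {
                k += 1;
            }
            if a[i] != b[k] {
                out_of_order += 1;
            }
            k += 1;
        }
    }
    let m = matches as f64;
    (m / a.len() as f64 + m / b.len() as f64 + (m - out_of_order as f64 / 2.0) / m) / 3.0
}

/// Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to 4 characters.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let j = jaro(&a, &b);
    let prefix = a.iter().zip(b.iter()).take(4).take_while(|(x, y)| x == y).count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}

pub fn calculate_fuzzy_score(
    filter_lower: &str,
    label_lower: &str,
    filter_original: &str,
    label_original: &str,
) -> f64 {
    if filter_original == label_original {
        return 1.0; // exact match (case-sensitive)
    }
    if filter_lower == label_lower {
        return 0.95; // exact match (case-insensitive)
    }
    if label_original.starts_with(filter_original) {
        return 0.9; // prefix match (case-sensitive)
    }
    if label_lower.starts_with(filter_lower) {
        return 0.85; // prefix match (case-insensitive)
    }
    if label_lower.contains(filter_lower) {
        return 0.8; // contains match
    }
    // Assumption: scale so fuzzy results always rank below the tiers above.
    jaro_winkler(filter_lower, label_lower) * 0.8
}
```

With this sketch, a query of calcSum scores 0.9 against calcSumTotal (prefix) and falls through to the fuzzy tier for calculate_sum, which still scores above zero because the two names share most of their characters in order.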

4. AST Pattern Search

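Beyond symbol names, the SDK supports structural queries using ast-grep pattern syntax. $VAR matches a single AST node; $$$ matches zero or more nodes. For example, $X.unwrap() finds every .unwrap() call, and async fn $NAME($$$) matches any async function definition. This lets the LLM ask "find every place where we call .unwrap()" and get back only real method calls, not comments or string literals that happen to contain the text. The pattern is purely syntactic, though: it can't tell an unwrap on a Result from one on an Option, because that takes type information from LSP.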

5. Language Detection

The SDK auto-detects languages from file extensions using the embedded languages.json config. It currently supports: Rust, TypeScript, JavaScript, Python, Java, Go, C, C++, C#, Ruby, Kotlin, Swift, and more.

// crates/code-agent-sdk/src/tree_sitter/config.rs
pub fn lang_from_extension(ext: &str) -> Option<&'static str> { ... }
pub fn get_extensions(lang: &str) -> Option<&'static [&'static str]> { ... }
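
As a rough sketch of what these two lookups might do, here is a version backed by a small static table. The table contents, the language keys, and the `lang_for_path` helper are assumptions for illustration; the SDK derives its real table from languages.json.

```rust
use std::path::Path;

/// Illustrative extension table (assumed contents and language keys).
static EXTENSIONS: &[(&str, &[&str])] = &[
    ("rust", &["rs"]),
    ("typescript", &["ts", "tsx"]),
    ("python", &["py"]),
];

/// Maps a bare extension such as "rs" to a language name.
pub fn lang_from_extension(ext: &str) -> Option<&'static str> {
    EXTENSIONS
        .iter()
        .find(|(_, exts)| exts.iter().any(|e| *e == ext))
        .map(|(lang, _)| *lang)
}

/// Lists the extensions registered for a language.
pub fn get_extensions(lang: &str) -> Option<&'static [&'static str]> {
    EXTENSIONS
        .iter()
        .find(|(l, _)| *l == lang)
        .map(|(_, exts)| *exts)
}

/// Hypothetical helper: detect the language of a path from its extension.
pub fn lang_for_path(path: &str) -> Option<&'static str> {
    let ext = Path::new(path).extension()?.to_str()?;
    lang_from_extension(ext)
}
```

Files without an extension, or with one that isn't in the table, return None, which a caller can treat as "no Tree-sitter support for this file."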

The MCP Tool Surface

The code-agent-sdk exposes 10 tools through its MCP server. Here's the complete set, grouped by capability:

Workspace Management

| Tool | Purpose |
|------|---------|
| workspace_status | Detect languages and available LSP servers |
| initialize_workspace | Start language servers (auto-called on first use) |

Symbol Discovery

| Tool | Purpose |
|------|---------|
| search_symbols | Fuzzy search for symbols by name |
| lookup_symbols | Batch exact-name lookup |
| get_document_symbols | List all symbols in a file |

Navigation & References

| Tool | Purpose |
|------|---------|
| goto_definition | Jump to where a symbol is defined |
| find_references | Find all uses of a symbol at a location |
| search_references | Find all uses of a symbol by name |

Code Manipulation

| Tool | Purpose |
|------|---------|
| rename_symbol | Rename across the workspace (with dry-run) |
| format_code | Format a file using the language server |

Incoming tool calls are dispatched by name in the MCP server's call_tool handler:

// crates/code-agent-sdk/src/mcp/server.rs
async fn call_tool(&self, request: CallToolRequestParams, ...) -> Result<CallToolResult, ErrorData> {
    match request.name.as_ref() {
        "search_symbols"  => self.find_symbols_tool(request.arguments).await,
        "goto_definition" => self.goto_definition_tool(request.arguments).await,
        "find_references" => self.find_references_by_location_tool(request.arguments).await,
        // ... 7 more tools, plus a catch-all arm (a match on a string needs one to compile)
    }
}
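
Name-based dispatch is easy to try in isolation. The sketch below is a hypothetical, synchronous stand-in for the async handler: arguments and results are plain strings rather than the MCP request and result types, and the handler bodies only echo their input.

```rust
/// Hypothetical dispatcher: routes a tool name to a handler, or reports an error.
pub fn call_tool(name: &str, arguments: &str) -> Result<String, String> {
    match name {
        "search_symbols" => Ok(format!("search_symbols({arguments})")),
        "goto_definition" => Ok(format!("goto_definition({arguments})")),
        "find_references" => Ok(format!("find_references({arguments})")),
        // Unknown names are returned to the caller as an error instead of panicking.
        other => Err(format!("unknown tool: {other}")),
    }
}
```

Returning an error for unknown names matters here because the caller is an LLM, which can misspell a tool name and needs a readable message to recover from.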

Internal Implementation

The Dual-Strategy Architecture

The most important design decision in code-agent-sdk is the dual-strategy pattern. Every symbol operation has two implementations:

| Layer | Technology | Strengths | Limitations |
|-------|------------|-----------|-------------|
| `TreeSitterSymbolService` | ast-grep | Fast, no server, works offline, pattern matching | No cross-file type resolution |
| `LspSymbolService` | Language servers | Precise types, cross-file refs, rename | Requires running server, slower startup |

The CodeIntelligence client composes both:

// crates/code-agent-sdk/src/sdk/client.rs
pub struct CodeIntelligence {
    lsp_symbol_service: LspSymbolService,
    tree_sitter_symbol_service: TreeSitterSymbolService,
    lsp_coding_service: LspCodingService,
    tree_sitter_coding_service: TreeSitterCodingService,
    lsp_workspace_service: LspWorkspaceService,
    pub workspace_manager: WorkspaceManager,
}

For operations like find_symbols, Tree-sitter handles the search (it's faster for scanning thousands of files). For operations like goto_definition and find_references, the SDK delegates to LSP because those require type-system knowledge that only a compiler can provide.

For search_references (find by name), the SDK uses the two-phase approach from our sequence diagram: Tree-sitter locates the symbol first, then LSP resolves all references from that precise location.
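
The two-phase approach boils down to "locate by name, then resolve from that location." Here is a minimal sketch with both phases behind traits. All names in it are hypothetical (`SymbolLocator`, `ReferenceResolver`, the stub types, and the file paths); in the SDK the first role is played by Tree-sitter and the second by an LSP server.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Location {
    pub file: String,
    pub line: u32,
    pub column: u32,
}

/// Phase 1: find where a symbol is defined, given only its name.
pub trait SymbolLocator {
    fn locate(&self, name: &str) -> Option<Location>;
}

/// Phase 2: find every reference to the symbol at a precise location.
pub trait ReferenceResolver {
    fn references(&self, definition: &Location) -> Vec<Location>;
}

pub fn find_references_by_name(
    locator: &dyn SymbolLocator,
    resolver: &dyn ReferenceResolver,
    name: &str,
) -> Vec<Location> {
    match locator.locate(name) {
        Some(definition) => resolver.references(&definition),
        None => Vec::new(), // nothing to resolve if the name is unknown
    }
}

/// In-memory stubs so the flow can be exercised without a parser or server.
pub struct StubLocator;
impl SymbolLocator for StubLocator {
    fn locate(&self, name: &str) -> Option<Location> {
        (name == "spawn_session")
            .then(|| Location { file: "src/session.rs".into(), line: 10, column: 7 })
    }
}

pub struct StubResolver;
impl ReferenceResolver for StubResolver {
    fn references(&self, definition: &Location) -> Vec<Location> {
        vec![
            definition.clone(),
            Location { file: "src/main.rs".into(), line: 42, column: 5 },
        ]
    }
}
```

Keeping the phases behind separate interfaces means the cheap name lookup never needs a running server, and the expensive resolver is only asked about one exact position.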

Symbol Extraction Pipeline

After Tree-sitter parses a file, the SDK walks the AST iteratively rather than recursively, which avoids stack overflows on deeply nested code:

// crates/code-agent-sdk/src/tree_sitter/symbol_extractor.rs
fn extract_symbols<D: ast_grep_core::Doc>(
    root: &ast_grep_core::Node<'_, D>,
    lang_name: &str,
    relative_path: &str,
    symbols: &mut Vec<SymbolInfo>,
) {
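    let mut stack = vec![(root.clone(), None)];
    while let Some((node, container)) = stack.pop() {
        // Children inherit the enclosing container unless this node is itself
        // a symbol, in which case it presumably becomes their container.
        let mut current_container = container;
        if let Some(def) = get_symbol_def(lang_name, &node.kind())
            && let Some(name) = find_name(&node, &def.name_child)
        {
            current_container = Some(name.clone());
            symbols.push(SymbolInfo { name, ... });
        }
        for child in node.children() {
            stack.push((child, current_container.clone()));
        }
    }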
    }
}

For each AST node, it checks the languages.json config: "Is this node kind a symbol definition in this language?" If yes, it extracts the name from the designated child node and records the location.
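
The explicit-stack traversal is easier to see on a toy tree. The `Node` and `Symbol` types below are hypothetical stand-ins for the ast-grep node and `SymbolInfo`, and the set of symbol kinds is hard-coded instead of coming from languages.json.

```rust
/// Toy AST node; a stand-in for the real parser's node type.
pub struct Node {
    pub kind: &'static str,
    pub name: Option<&'static str>,
    pub children: Vec<Node>,
}

#[derive(Debug, PartialEq)]
pub struct Symbol {
    pub name: String,
    pub container: Option<String>,
}

pub fn extract_symbols(root: &Node) -> Vec<Symbol> {
    let mut symbols = Vec::new();
    // Each stack entry carries the node plus the name of its enclosing symbol.
    let mut stack: Vec<(&Node, Option<String>)> = vec![(root, None)];
    while let Some((node, container)) = stack.pop() {
        let mut child_container = container.clone();
        if matches!(node.kind, "mod_item" | "function_item") {
            if let Some(name) = node.name {
                symbols.push(Symbol { name: name.to_string(), container });
                // Anything nested below is contained in this symbol.
                child_container = Some(name.to_string());
            }
        }
        // Push in reverse so children are popped in source order.
        for child in node.children.iter().rev() {
            stack.push((child, child_container.clone()));
        }
    }
    symbols
}
```

Because the pending work lives in a heap-allocated Vec, nesting depth is limited by available memory rather than by the thread's call stack.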

LSP Client Lifecycle

The LspClient manages a language server as a child process, communicating over stdin/stdout with JSON-RPC:

// crates/code-agent-sdk/src/lsp/client.rs
pub struct LspClient {
    stdin: Arc<Mutex<tokio::process::ChildStdin>>,
    pending_requests: Arc<Mutex<HashMap<String, ResponseCallback>>>,
    config: LanguageServerConfig,
    status: Arc<Mutex<LspStatus>>,
    // ...
}

The lifecycle follows the LSP specification:

1. Spawn the server process (e.g., rust-analyzer, typescript-language-server)
2. Initialize with workspace root, capabilities, and project patterns
3. Open files as the LLM navigates the codebase
4. Request definitions, references, completions, diagnostics
5. Shut down gracefully when the session ends
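
Every message in steps 2 through 5 travels over stdin/stdout in the same envelope: the LSP base protocol puts a Content-Length header (a byte count) and a blank line in front of each JSON-RPC body. The helper names below are hypothetical, and this sketch reads only that one header; it shows the framing, not how LspClient itself is written.

```rust
/// Frames a JSON-RPC body: Content-Length header, blank line, then the body.
pub fn frame_message(body: &str) -> String {
    // body.len() is the length in bytes, which is what the header expects.
    format!("Content-Length: {}\r\n\r\n{}", body.len(), body)
}

/// Parses one framed message from the front of `input`.
/// Returns the body and whatever input is left after it.
pub fn parse_message(input: &str) -> Option<(&str, &str)> {
    let (headers, rest) = input.split_once("\r\n\r\n")?;
    let length = headers
        .split("\r\n")
        .find_map(|line| line.strip_prefix("Content-Length:"))?
        .trim()
        .parse::<usize>()
        .ok()?;
    // Wait for more data if the body is incomplete.
    if rest.len() < length || !rest.is_char_boundary(length) {
        return None;
    }
    Some(rest.split_at(length))
}
```

For example, the graceful shutdown in step 5 starts with a request body such as `{"jsonrpc":"2.0","id":1,"method":"shutdown"}`, passed through `frame_message` before it is written to the server's stdin.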

Workspace-Wide Search with Timeouts

Large codebases can have millions of files. The SDK protects against runaway searches:

// crates/code-agent-sdk/src/tree_sitter/workspace_analyzer.rs
pub async fn find_symbols_with_timeout(
    workspace_manager: &mut WorkspaceManager,
    request: &FindSymbolsRequest,
    timeout_secs: u64,
) -> Result<Vec<SymbolInfo>> {
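    // Elided here: workspace_root, code_store, and an owned copy of the request
    // are prepared first so the closure can move them onto the blocking pool.
    tokio::select! {
        result = tokio::task::spawn_blocking(move || {
            find_symbols_sync(&workspace_root, &code_store, &request, timeout_secs)
        }) => result?,
        // If the timer wins, the caller gets an error right away. A blocking task
        // cannot be aborted once it is running, which is presumably why
        // timeout_secs is also passed into find_symbols_sync.
        _ = tokio::time::sleep(Duration::from_secs(timeout_secs)) => {
            Err(anyhow::anyhow!("Symbol search timed out"))
        }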
    }
}

The search runs in a blocking thread pool (to avoid starving the async runtime) with a configurable timeout. It also respects .gitignore via the ignore crate, so directories such as node_modules and target/ are skipped automatically when they are gitignored.
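
One detail worth spelling out: a timer in the async layer stops the caller from waiting, but it cannot stop a blocking thread that is already running. A synchronous scan therefore usually checks a deadline itself. The sketch below shows that cooperative pattern using only the standard library; `scan_with_deadline` is a hypothetical helper, and whether find_symbols_sync works this way is an assumption.

```rust
use std::time::{Duration, Instant};

/// Visits items until done or until the deadline passes, whichever comes first.
/// Returns the number of items visited, or an error describing the timeout.
pub fn scan_with_deadline<T>(
    items: &[T],
    timeout: Duration,
    mut visit: impl FnMut(&T),
) -> Result<usize, String> {
    let deadline = Instant::now() + timeout;
    for (index, item) in items.iter().enumerate() {
        // Check between items so a long scan stops soon after the deadline.
        if Instant::now() >= deadline {
            return Err(format!("Symbol search timed out after {index} items"));
        }
        visit(item);
    }
    Ok(items.len())
}
```

Checking once per file keeps the overhead negligible while bounding how long the thread keeps working after the caller has given up.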

Putting It All Together

Here's the full picture of how a single tool call flows through the system:

LLM: "Find where spawn_session is defined"
 │
 ▼
MCP Server (server.rs)
 │  Deserializes tool call, routes to handler
 ▼
CodeIntelligence (client.rs)
 │  Chooses strategy: Tree-sitter for search, LSP for definition
 ▼
┌─────────────────────┬───────────────────────────┐
│  Tree-sitter path   │     LSP path              │
│                     │                           │
│  workspace_analyzer │  LspClient                │
│  ↓                  │  ↓                        │
│  symbol_extractor   │  JSON-RPC to server       │
│  ↓                  │  ↓                        │
│  languages.json     │  rust-analyzer / tsserver │
│  (AST node → type)  │  (compiler-grade)         │
└─────────────────────┴───────────────────────────┘
 │
 ▼
Result: SymbolInfo { name, file_path, line, column, type }

The LLM never needs to know which path was taken. It calls search_symbols or goto_definition, and the SDK picks the right strategy internally.

Summary

| Concept | Role |
|---------|------|
| Tree-sitter | Fast AST parsing: symbol extraction, pattern search, no server needed |
| LSP | Deep semantic analysis: definitions, references, rename, diagnostics |
| Dual strategy | Tree-sitter for speed, LSP for precision; SDK picks automatically |
| Fuzzy scoring | Ranks approximate names against real symbol names, tolerating differences in casing and naming convention |
| AST patterns | Structural code search (`$X.unwrap()`) beyond what text search can do |
| MCP surface | 10 tools exposed to the LLM through the standard MCP protocol |

Code Intelligence transforms the LLM from a text-pattern matcher into a tool that understands code structure. It knows that process on line 42 is a function call, not a comment — and it can find every other place that function is called, across every file in the project.

What's Next?

Code Intelligence is great for structured queries — "find this symbol," "go to that definition," "rename this function." But what if you want meaning-based search? What if you want to ask "find the code that handles authentication" without knowing any function names?

That's where semantic search comes in. In Chapter 10: Semantic Search, we'll see how kiro-cli uses embeddings and vector similarity to search code by meaning, not just by name.