Agent Runtime Design — Custom Agent Frameworks (Chapter 2)

An agent runtime has four core components that interact in a defined order. Understanding this topology is essential before writing any code.

Components:

Orchestrator — Controls flow. Decides what happens next, when to loop, when to stop.
LLM Interface — Wraps the model API. Handles prompt construction, response parsing, token counting.
Tool Registry — Maps function names to callable implementations with schema definitions.
Memory System — Manages state across the agent's lifetime (detailed in Chapters 6-9).

The orchestrator holds references to the others. It queries the LLM, parses tool calls, dispatches through the registry, updates memory, and repeats until completion.

class AgentRuntime:
    def __init__(
        self,
        llm: LLMInterface,
        tools: ToolRegistry,
        memory: MemorySystem,
        max_iterations: int = 20
    ):
        self.llm = llm
        self.tools = tools
        self.memory = memory
        self.max_iterations = max_iterations

    async def run(self, prompt: str) -> AgentResponse:
        self.memory.add_message(role="user", content=prompt)
        
        for iteration in range(self.max_iterations):
            response = await self.llm.chat(
                messages=self.memory.get_context(),
                tools=self.tools.schemas()
            )
            
            if response.finish_reason == "stop":
                return AgentResponse(finish_reason="stop", content=response.content)
            
            if response.finish_reason == "tool_use":
                results = await self.tools.execute(response.tool_calls)
                self.memory.add_tool_results(response.tool_calls, results)
                continue
            
            # Handle unexpected finish reasons
            return AgentResponse(
                finish_reason="max_iterations" if iteration == self.max_iterations - 1 
                else response.finish_reason,
                content=response.content
            )

Failure mode: unbounded iteration. Without max_iterations, a buggy tool that always calls itself creates an infinite loop. Production systems must cap iterations and emit alerts when the cap is hit.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.