02. Agent Runtime Design
An agent runtime has four core components that interact in a defined order. Understanding this topology is essential before writing any code.
Components:
- Orchestrator — Controls flow. Decides what happens next, when to loop, when to stop.
- LLM Interface — Wraps the model API. Handles prompt construction, response parsing, token counting.
- Tool Registry — Maps function names to callable implementations with schema definitions.
- Memory System — Manages state across the agent's lifetime (detailed in Chapters 6-9).
The orchestrator holds references to the others. It queries the LLM, parses tool calls, dispatches through the registry, updates memory, and repeats until completion.
class AgentRuntime:
def __init__(
self,
llm: LLMInterface,
tools: ToolRegistry,
memory: MemorySystem,
max_iterations: int = 20
):
self.llm = llm
self.tools = tools
self.memory = memory
self.max_iterations = max_iterations
async def run(self, prompt: str) -> AgentResponse:
self.memory.add_message(role="user", content=prompt)
for iteration in range(self.max_iterations):
response = await self.llm.chat(
messages=self.memory.get_context(),
tools=self.tools.schemas()
)
if response.finish_reason == "stop":
return AgentResponse(finish_reason="stop", content=response.content)
if response.finish_reason == "tool_use":
results = await self.tools.execute(response.tool_calls)
self.memory.add_tool_results(response.tool_calls, results)
continue
# Handle unexpected finish reasons
return AgentResponse(
finish_reason="max_iterations" if iteration == self.max_iterations - 1
else response.finish_reason,
content=response.content
)
Failure mode: unbounded iteration. Without max_iterations, a buggy tool that always calls itself creates an infinite loop. Production systems must cap iterations and emit alerts when the cap is hit.
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Draw a diagram of your target agent's runtime topology. Label which components you need to build versus reuse. Note any integration points that could become bottlenecks.