RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
Custom Agent Frameworks

03. Agent Loop

Chapter 3 of 24 · 15 min
KEY INSIGHT

The loop is a state machine. Each iteration transitions between phases, and your job is to ensure every transition is well-defined—including the error transitions.

The agent loop is the heart of the runtime. It's deceptively simple: generate, execute, evaluate, repeat. But the details—how you handle partial outputs, tool failures, context overflow—determine whether your agent is reliable or brittle.
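The key insight above can be made concrete with an explicit transition table. The sketch below is an illustration, not part of the chapter's runtime: the Phase names (GENERATE, EXECUTE, EVALUATE, DONE, FAILED) and the transition/is_terminal helpers are hypothetical. Every phase declares every phase it may move to, including the error exit, so an undeclared move fails loudly instead of leaving the agent in an undefined state.

```python
from enum import Enum, auto


class Phase(Enum):
    GENERATE = auto()
    EXECUTE = auto()
    EVALUATE = auto()
    DONE = auto()
    FAILED = auto()


# Hypothetical transition table: each phase lists every legal next phase,
# including the error exit. Terminal phases have no outgoing transitions.
TRANSITIONS: dict[Phase, frozenset[Phase]] = {
    Phase.GENERATE: frozenset({Phase.EXECUTE, Phase.EVALUATE, Phase.FAILED}),
    Phase.EXECUTE: frozenset({Phase.GENERATE, Phase.FAILED}),
    Phase.EVALUATE: frozenset({Phase.DONE, Phase.FAILED}),
    Phase.DONE: frozenset(),
    Phase.FAILED: frozenset(),
}


def transition(current: Phase, nxt: Phase) -> Phase:
    """Return nxt if the move is declared; raise instead of drifting silently."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt


def is_terminal(phase: Phase) -> bool:
    """A phase with no outgoing transitions ends the loop."""
    return not TRANSITIONS[phase]
```

Routing every phase change through transition() means a forgotten error path shows up as a ValueError during testing rather than as an agent that hangs or loops forever.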

Loop phases:

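from enum import Enum, auto

class LoopPhase(Enum):
    TOOLS_EXECUTED = auto()  # tools ran; the caller should call step() again
    COMPLETED = auto()       # the model produced a final answer
    ERROR = auto()           # the response ended in any other way

async def step(self, iteration: int) -> LoopPhase:
    # Phase 1: Generate. Leave a 500-token margin below the context window.
    response = await self.llm.chat(
        messages=self.memory.get_recent(max_tokens=self.llm.context_window - 500),
        tools=self.tools.schemas(),
    )

    # Phase 2: Execute (if tool calls are present). Record the results in
    # memory so the next generation can see them.
    if response.tool_calls:
        results = await self.tools.execute_batch(response.tool_calls)
        self.memory.add_tool_results(response.tool_calls, results)
        return LoopPhase.TOOLS_EXECUTED

    # Phase 3: Evaluate (no tools = final response or error).
    # A "stop" with empty content is treated as an error, not a final answer.
    if response.finish_reason == "stop" and response.content:
        final = response.content
        self.memory.add_message(role="assistant", content=final)
        return LoopPhase.COMPLETED

    # Any other outcome (for example, output cut off mid-answer) is an error transition.
    return LoopPhase.ERROR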

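Each call to step ends in one of three phases: COMPLETED when the model signals it is finished (finish_reason="stop"), TOOLS_EXECUTED when tools ran and the caller should loop again, or ERROR when the response ended any other way. Only COMPLETED and ERROR end the loop. An API failure or an oversized request inside self.llm.chat typically raises an exception instead of returning ERROR, so whatever drives step needs its own error handling.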
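The step method handles a single iteration; something still has to call it repeatedly. Below is a minimal driver sketch, assuming step is passed in as an async callable (the chapter shows it as a method) and using a hypothetical LoopAborted exception plus a max_iterations cap. The fake_step stand-in exists only so the sketch runs on its own.

```python
import asyncio
from enum import Enum, auto
from typing import Awaitable, Callable


class LoopPhase(Enum):
    TOOLS_EXECUTED = auto()  # tools ran; generate again
    COMPLETED = auto()       # final answer produced
    ERROR = auto()           # response ended some other way


class LoopAborted(Exception):
    """Hypothetical: raised when the loop stops without reaching COMPLETED."""


async def run(step: Callable[[int], Awaitable[LoopPhase]], max_iterations: int = 10) -> int:
    """Drive step() until COMPLETED; return how many iterations it took."""
    for iteration in range(max_iterations):
        try:
            phase = await step(iteration)
        except Exception as e:
            # step() may raise, e.g. when the LLM API call itself fails.
            raise LoopAborted(f"step {iteration} raised {type(e).__name__}: {e}") from e
        if phase is LoopPhase.COMPLETED:
            return iteration + 1
        if phase is LoopPhase.ERROR:
            raise LoopAborted(f"step {iteration} returned ERROR")
        # TOOLS_EXECUTED: tool results are in memory, so loop and generate again.
    # Cap the loop so a model that keeps calling tools cannot spin forever.
    raise LoopAborted(f"no final answer after {max_iterations} iterations")


async def fake_step(iteration: int) -> LoopPhase:
    # Stand-in for the agent's step(): two rounds of tools, then a final answer.
    return LoopPhase.TOOLS_EXECUTED if iteration < 2 else LoopPhase.COMPLETED


print(asyncio.run(run(fake_step)))  # → 3
```

The iteration cap is the error transition for the TOOLS_EXECUTED state: without it, a model that never stops requesting tools keeps the loop alive indefinitely.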

Failure mode: context overflow. If memory accumulates too much history, the LLM input exceeds the model's context window. You need a truncation strategy: typically keep the system prompt plus the most recent messages, or compress older turns semantically. The code above calls get_recent with a token budget (the context window minus a 500-token margin), but plain recency truncation can drop important earlier context. Chapter 9 covers smarter approaches.
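The chapter does not show how get_recent works. Here is a minimal standalone sketch of the "system prompt plus most recent messages" strategy, assuming a hypothetical Message type and a crude approx_tokens estimate of about four characters per token in place of a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def approx_tokens(msg: Message) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(msg.content) // 4)


def get_recent(messages: list[Message], max_tokens: int) -> list[Message]:
    """Keep all system messages, then as many recent messages as fit in max_tokens."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    budget = max_tokens - sum(approx_tokens(m) for m in system)
    kept: list[Message] = []
    # Walk backwards from the newest message and stop at the first one that
    # no longer fits, so the kept history stays contiguous.
    for m in reversed(rest):
        cost = approx_tokens(m)
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))


history = [
    Message("system", "You are helpful."),
    Message("user", "a" * 40),
    Message("assistant", "b" * 40),
    Message("user", "c" * 40),
]
print([m.role for m in get_recent(history, max_tokens=25)])  # → ['system', 'assistant', 'user']
```

A production version should also avoid separating an assistant message that requested tools from the tool results that answer it, since chat APIs that support tool calling generally expect them to appear together; this sketch does not handle that.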

Failure mode: tool execution failures. If a tool times out or raises an exception, report the failure back to the model so the next generation can react to it, for example by retrying, choosing another tool, or explaining the problem. Don't silently swallow failures:

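import asyncio

TOOL_TIMEOUT_S = 30.0

async def execute_batch(self, calls: list[ToolCall]) -> list[ToolResult]:
    results = []
    # Calls run one at a time; results are appended in the same order as `calls`.
    for call in calls:
        tool = self.tools.get(call.name)
        if tool is None:
            # Report unknown tool names explicitly instead of letting
            # None(...) raise a confusing TypeError.
            results.append(ToolResult(success=False, error=f"Unknown tool: {call.name}"))
            continue
        try:
            result = await asyncio.wait_for(tool(**call.arguments), timeout=TOOL_TIMEOUT_S)
            results.append(ToolResult(success=True, output=result))
        except asyncio.TimeoutError:
            results.append(ToolResult(
                success=False,
                error=f"Tool execution timed out after {TOOL_TIMEOUT_S:.0f} seconds",
            ))
        except Exception as e:
            # Surface the exception type and message so the model can react.
            results.append(ToolResult(
                success=False,
                error=f"Tool execution failed: {type(e).__name__}: {e}",
            ))
    return results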
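A failed ToolResult only informs the next generation once it is written into the conversation. The sketch below shows one way that might look. Its ToolResult fields mirror the ones used in execute_batch (success, output, error). The message shape assumes an OpenAI-style chat API with a "tool" role and a tool_call_id. The to_tool_message name and the "ERROR:" prefix are hypothetical; the chapter's add_tool_results may format results differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    success: bool
    output: Any = None
    error: Optional[str] = None


def to_tool_message(call_id: str, result: ToolResult) -> dict[str, str]:
    """Render one ToolResult as a chat message the model reads on the next turn."""
    if result.success:
        content = str(result.output)
    else:
        # Prefix failures so the model can tell them apart from real output
        # and decide whether to retry, pick another tool, or explain the problem.
        content = f"ERROR: {result.error}"
    return {"role": "tool", "tool_call_id": call_id, "content": content}


print(to_tool_message("call_1", ToolResult(success=False, error="Tool execution timed out after 30 seconds")))
# → {'role': 'tool', 'tool_call_id': 'call_1', 'content': 'ERROR: Tool execution timed out after 30 seconds'}
```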

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
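One way to capture the runtime details the checkpoint asks for is a short report script using only the standard library. The environment_report helper is hypothetical, and the package names passed to it ("openai", "httpx") are placeholders; list whatever your agent actually imports.

```python
import platform
from importlib import metadata


def environment_report(packages: list[str]) -> dict[str, str]:
    """Collect interpreter, OS, and package versions for the checkpoint notes."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Placeholder package list: replace with the libraries your agent uses.
for key, value in environment_report(["openai", "httpx"]).items():
    print(f"{key}: {value}")
```

Model size, GPU backend, and available memory are not visible to this script, so record those by hand next to its output.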

EXERCISE

Implement the agent loop for your target agent. Add a phase that handles a specific error condition (network timeout, rate limit, malformed tool response). Write the transition logic explicitly.
