RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Custom Agent Frameworks

13. Multi-Agent Protocols

Chapter 13 of 24 · 15 min
KEY INSIGHT

Multi-agent protocols fail silently until they don't. Explicit message contracts and state machines catch 90% of integration bugs before they hit production.

Multi-agent systems fail in predictable ways when protocols aren't explicitly defined. Most developers assume agents will "figure it out," then spend weeks debugging mysterious deadlocks or infinite loops.

The Core Problem

When two or more agents interact, you need a contract that specifies message ordering, termination conditions, and error propagation. Without this, you're building on quicksand.
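
One way to make such a contract concrete is a typed message envelope plus a validator. The following is a minimal sketch, not a prescribed format: the `Message` fields, the `ALLOWED_TYPES` set, and the rule that an error must carry a `reason` are all assumptions chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical message vocabulary; a real protocol would define its own.
ALLOWED_TYPES = {"request", "response", "error", "done"}


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    type: str
    payload: dict = field(default_factory=dict)
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    seq: int = 0  # per-sender sequence number, used for ordering checks


def validate(msg: Message, last_seq: int) -> None:
    """Raise ValueError if the message breaks the contract."""
    if msg.type not in ALLOWED_TYPES:
        raise ValueError(f"unknown message type: {msg.type!r}")
    # Ordering: each sender numbers its messages consecutively.
    if msg.seq != last_seq + 1:
        raise ValueError(f"expected seq {last_seq + 1}, got {msg.seq}")
    # Error propagation: an error must say what went wrong.
    if msg.type == "error" and "reason" not in msg.payload:
        raise ValueError("error messages must include a 'reason'")


def is_terminal(msg: Message) -> bool:
    """Termination: 'done' and 'error' end the conversation."""
    return msg.type in {"done", "error"}
```

Validating at the boundary of every agent means a contract violation surfaces as an immediate exception at the receiver rather than as a stalled workflow later.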

Protocol Design Patterns

The two most reliable patterns are request-response and publish-subscribe. Request-response works for synchronous operations where the caller needs an immediate result:

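import asyncio
import uuid


class RequestResponseProtocol:
    def __init__(self, timeout_seconds: float = 30.0):
        self.timeout = timeout_seconds
        # Maps correlation_id -> future awaiting the matching response.
        self.pending: dict[str, asyncio.Future] = {}

    async def _deliver(self, agent_id: str, message: dict) -> None:
        # Transport hook (placeholder): override with a queue, socket, or in-process call.
        raise NotImplementedError

    async def send_request(self, agent_id: str, payload: dict) -> dict:
        correlation_id = uuid.uuid4().hex
        future = asyncio.get_running_loop().create_future()
        self.pending[correlation_id] = future

        try:
            await self._deliver(agent_id, {
                "type": "request",
                "correlation_id": correlation_id,
                "payload": payload
            })
            # Raises asyncio.TimeoutError if no response arrives in time.
            return await asyncio.wait_for(future, timeout=self.timeout)
        finally:
            # Always drop the entry so late responses are ignored.
            self.pending.pop(correlation_id, None)

    async def handle_response(self, correlation_id: str, result: dict) -> None:
        future = self.pending.get(correlation_id)
        # Skip unknown IDs and duplicates; set_result on a finished future raises.
        if future is not None and not future.done():
            future.set_result(result)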
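
Publish-subscribe, the other pattern named above, suits cases where the sender does not need a reply and several agents may care about the same event. A minimal in-process sketch follows; the `PubSubBus` class and the `"results"` topic are hypothetical names, and a real deployment would likely use a message broker rather than `asyncio.Queue`.

```python
import asyncio
from collections import defaultdict


class PubSubBus:
    def __init__(self):
        # Maps topic -> one queue per subscriber.
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> asyncio.Queue:
        """Register a new subscriber and return its private inbox."""
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    def publish(self, topic: str, message: dict) -> int:
        """Copy the message to every subscriber; return how many received it."""
        queues = self._subscribers.get(topic, [])
        for queue in queues:
            queue.put_nowait(message)
        return len(queues)


async def demo() -> list[dict]:
    bus = PubSubBus()
    inbox_b = bus.subscribe("results")
    inbox_c = bus.subscribe("results")
    bus.publish("results", {"value": 1})
    # Each subscriber gets its own copy of the published message.
    return [await inbox_b.get(), await inbox_c.get()]
```

Returning the delivery count from `publish` gives the sender a cheap way to notice that nobody is listening, which is one of the quieter ways a publish-subscribe protocol can fail.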

Failure Mode: Race Conditions

The most common failure happens when responses arrive out of order or get duplicated. Always match responses to requests by correlation ID, and never assume that message arrival order matches dispatch order.
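
The effect can be sketched without any network: two requests are dispatched, the responses arrive in reverse order, and one of them arrives twice. The `on_response` helper and the result strings below are illustrative assumptions.

```python
import asyncio
import uuid


async def demo() -> dict:
    loop = asyncio.get_running_loop()
    pending: dict[str, asyncio.Future] = {}

    def on_response(correlation_id: str, result: str) -> bool:
        """Resolve the matching request; return False for unknown or duplicate IDs."""
        future = pending.get(correlation_id)
        if future is None or future.done():
            return False
        future.set_result(result)
        return True

    # Dispatch order: slow first, then fast.
    id_slow, id_fast = uuid.uuid4().hex, uuid.uuid4().hex
    pending[id_slow] = loop.create_future()
    pending[id_fast] = loop.create_future()

    # Arrival order is reversed, and the fast response is delivered twice.
    accepted = [
        on_response(id_fast, "fast result"),
        on_response(id_slow, "slow result"),
        on_response(id_fast, "duplicate"),
    ]
    return {
        "slow": await pending[id_slow],
        "fast": await pending[id_fast],
        "accepted": accepted,
    }
```

Because each result is looked up by ID rather than by position, the reversed arrival order does not matter, and the duplicate is rejected instead of overwriting the first result.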

State Machine Approach

For complex multi-agent workflows, model each agent as a state machine with explicit transitions:

class AgentState:
    IDLE = "idle"
    WAITING = "waiting"
    PROCESSING = "processing"
    ERROR = "error"
    TERMINATED = "terminated"

Define allowed transitions explicitly. This makes the protocol auditable and testable.
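
A transition table is one simple way to do this. The sketch below repeats the `AgentState` constants so that it runs on its own; the specific transitions in `ALLOWED_TRANSITIONS`, the `Agent` class, and the `InvalidTransition` exception are assumptions for illustration, not a required design.

```python
class AgentState:
    IDLE = "idle"
    WAITING = "waiting"
    PROCESSING = "processing"
    ERROR = "error"
    TERMINATED = "terminated"


# Hypothetical transition table: current state -> states it may move to.
ALLOWED_TRANSITIONS = {
    AgentState.IDLE: {AgentState.WAITING, AgentState.PROCESSING, AgentState.TERMINATED},
    AgentState.WAITING: {AgentState.PROCESSING, AgentState.ERROR},
    AgentState.PROCESSING: {AgentState.IDLE, AgentState.WAITING, AgentState.ERROR},
    AgentState.ERROR: {AgentState.IDLE, AgentState.TERMINATED},
    AgentState.TERMINATED: set(),  # terminal: no way out
}


class InvalidTransition(Exception):
    pass


class Agent:
    def __init__(self, name: str):
        self.name = name
        self.state = AgentState.IDLE
        self.history = [AgentState.IDLE]  # audit trail of every state visited

    def transition(self, new_state: str) -> None:
        """Move to new_state, or raise if the table does not allow it."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.name}: {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)
```

With the table in one place, a unit test can walk every pair of states and assert which moves are rejected, and the `history` list gives a record to inspect when a workflow stalls.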

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
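
Recording the runtime can be scripted. The following sketch uses only the standard library; the `environment_record` function name and the shape of the returned dictionary are assumptions, and the data path and observed output still need to be noted by hand.

```python
import platform
from importlib import metadata


def environment_record(packages: list[str]) -> dict:
    """Collect interpreter, OS, and installed package versions for a run log."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }
```

Calling `environment_record([...])` with the names of the packages an exercise depends on, and saving the result next to the observed output, keeps the constraint attached to the result.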

EXERCISE

Design a protocol for three agents where Agent A must collect results from B and C before producing output. Write the state transitions in code and identify failure points where messages could be lost.
