RUNLOCALAI v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Agents & agentic AI

Planning (in agents)

Planning in agents refers to the process where an LLM decomposes a complex goal into a sequence of sub-steps or actions before executing them, rather than generating a single response. This enables multi-step reasoning, tool use, and error recovery. In local AI, planning is typically implemented via prompting techniques (e.g., ReAct, Chain-of-Thought) or frameworks like LangChain. The agent generates a plan, executes steps (often calling external tools or APIs), and may revise the plan based on intermediate results. Planning consumes additional context window and inference time, which matters on constrained hardware.

Deeper dive

Planning is the capability that separates simple chatbots from autonomous systems. Instead of a single input-output pass, the agent iterates: (1) analyze the goal, (2) break it into steps, (3) execute each step (possibly using tools like web search, calculators, or code interpreters), (4) observe results, and (5) adjust the plan if needed. Common patterns include ReAct (Reason+Act), where the model interleaves reasoning traces with actions, and Plan-and-Solve, where the model first drafts a full plan and then executes it. On local hardware, planning is expensive because each step requires at least one full inference pass, and the plan itself occupies context tokens. Operators often limit plan depth or use smaller models for planning to keep latency acceptable. Frameworks like LangChain and CrewAI provide built-in planning loops, but they can be replicated with careful prompt engineering in llama.cpp or Ollama.
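The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular framework implements it: `stub_planner` stands in for the LLM call that drafts the plan, the `TOOLS` table and the "fall back to search" revision rule are hypothetical, and `max_steps` models the plan-depth limit operators use to bound latency.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real agent would wrap web search, a calculator, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q!r}",
    "summarize": lambda text: f"summary of {len(text)} chars",
}

def stub_planner(goal: str) -> List[Tuple[str, str]]:
    """Stand-in for an LLM call that drafts a plan as (tool, argument) steps.

    It deliberately includes a tool that does not exist ("lookup") so the
    loop below has something to revise.
    """
    return [("search", goal), ("lookup", goal), ("summarize", goal)]

def run_agent(goal: str,
              planner: Callable[[str], List[Tuple[str, str]]] = stub_planner,
              tools: Dict[str, Callable[[str], str]] = TOOLS,
              max_steps: int = 5) -> List[str]:
    queue = deque(planner(goal))      # (1)-(2) analyze the goal, break it into steps
    observations: List[str] = []
    iterations = 0
    while queue and iterations < max_steps:   # depth cap bounds total work
        iterations += 1
        tool_name, arg = queue.popleft()
        if tool_name not in tools:
            # (5) adjust the plan: swap the unusable step for a search fallback
            queue.appendleft(("search", arg))
            observations.append(f"revised: {tool_name} -> search")
            continue
        observations.append(tools[tool_name](arg))   # (3) execute, (4) observe
    return observations

if __name__ == "__main__":
    for line in run_agent("gpu benchmarks"):
        print(line)
```

In a real setup each pass through the loop would be one or more inference calls to a local model, so lowering `max_steps` directly caps both latency and how much context the plan can consume.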

Practical example

An agent tasked with 'Research the latest GPU benchmarks and summarize them' might plan: (1) search the web for 'RTX 5090 benchmarks', (2) extract key numbers, (3) search for 'RX 7900 XTX benchmarks', (4) compare, (5) write a summary. Each step is a separate LLM call. On a 16 GB VRAM rig running Llama 3.1 8B at Q4, each step takes ~1-2 seconds of inference, so a 5-step plan adds ~5-10 seconds to response time, not counting the time the tools themselves take. Longer plans or larger models (e.g., 70B, which does not fit in 16 GB of VRAM at Q4 and has to offload layers to system RAM) may cause timeouts.
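The arithmetic behind that estimate can be written out as a small budget check. This is a rough sketch under stated assumptions: the function names are made up for illustration, the per-step token counts are hypothetical placeholders, and it assumes every step's output is appended to the prompt for the next step.

```python
from typing import Tuple

def plan_latency_seconds(steps: int,
                         seconds_per_step: Tuple[float, float]) -> Tuple[float, float]:
    """Return the (low, high) added inference latency for a plan of `steps` LLM calls."""
    low, high = seconds_per_step
    return steps * low, steps * high

def plan_context_tokens(steps: int, prompt_tokens: int, tokens_per_step: int) -> int:
    """Rough context size after all steps, if each step's output stays in the prompt."""
    return prompt_tokens + steps * tokens_per_step

if __name__ == "__main__":
    # Figures from the example above: 5 steps at ~1-2 s each.
    print(plan_latency_seconds(5, (1.0, 2.0)))   # (5.0, 10.0)
    # Hypothetical token counts, chosen only to show the calculation.
    print(plan_context_tokens(5, prompt_tokens=500, tokens_per_step=300))   # 2000
```

Comparing the second number against the model's configured context window gives a quick signal for when a plan needs to be shortened or its intermediate results summarized.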


Reviewed by Fredoline Eruo. See our editorial policy.
