RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Implement Agent Reflection and Self-Correction

advanced · 30 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

An agent that supports multi-turn reasoning and a feedback loop; Python 3.10+
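
To confirm the prerequisites on the target machine, a short check like the one below can help. It is a minimal sketch: `environment_report` is a hypothetical helper name, and it only assumes that the Ollama binary, if installed, is called `ollama` and is on `PATH`.

```python
import shutil
import sys


def environment_report() -> dict:
    """Collect the interpreter path, Python version, and Ollama binary path."""
    return {
        "python_executable": sys.executable,
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "meets_python_3_10": sys.version_info >= (3, 10),
        "ollama_path": shutil.which("ollama"),  # None if not found on PATH
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If `ollama_path` prints `None`, or the interpreter path points at an unexpected virtual environment, fix that before working through the steps.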

What this does

Reflection lets an agent critique its own output, identify errors or gaps, and retry with improved reasoning. Self-correction can reduce hallucinations and improve answer quality without human intervention, at the cost of extra model calls per answer.

Steps

  • Add a reflection step after the initial answer. Ask the LLM to critique its own response.
def reflect(answer: str, context: str, llm) -> str:
    """Run one critique pass; returns the model's critique followed by its revised answer."""
    prompt = f"""Context: {context}

Initial answer: {answer}

Critique this answer. Identify any:
1. Factual errors or hallucinations
2. Missing information
3. Unclear reasoning
4. Unsupported claims

Provide a revised, improved answer:"""
    return llm.invoke(prompt).content
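
The snippets in this guide call `llm.invoke(prompt).content`. To try the reflection step without a running model, a scripted stand-in can be useful. The sketch below is an assumption-heavy illustration: `Reply` and `ScriptedLLM` are hypothetical test helpers, not part of any library, and `reflect` is a shortened copy of the function above.

```python
from dataclasses import dataclass, field


@dataclass
class Reply:
    """Mimics the `.content` attribute of a chat-model response."""
    content: str


@dataclass
class ScriptedLLM:
    """Hypothetical test double: returns canned replies in order."""
    replies: list
    prompts: list = field(default_factory=list)

    def invoke(self, prompt: str) -> Reply:
        self.prompts.append(prompt)        # record what the agent asked
        return Reply(self.replies.pop(0))  # raises IndexError if the script runs out


def reflect(answer: str, context: str, llm) -> str:
    """Single critique-and-revise pass (shortened prompt)."""
    prompt = (
        f"Context: {context}\n\n"
        f"Initial answer: {answer}\n\n"
        "Critique this answer, then provide a revised, improved answer:"
    )
    return llm.invoke(prompt).content


llm = ScriptedLLM(replies=["Revised: Ollama listens on port 11434 by default."])
revised = reflect(
    answer="Ollama listens on port 8080.",
    context="Ollama's API is served on localhost:11434 by default.",
    llm=llm,
)
print(revised)
```

Recording the prompts makes it easy to assert that the context and the initial answer actually reach the model.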
  • Implement the reflect-retry loop. Repeat until the critique score reaches the threshold or the retry limit is hit.
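# extract_score and extract_improved are defined in the next step.
def reflect_and_correct(question: str, context: str, llm, max_reflections: int = 3) -> str:
    answer = llm.invoke(f"Context: {context}\nQuestion: {question}\nAnswer:").content

    for _ in range(max_reflections):
        # Give the critic the question and context so it can check the answer against them.
        critique = llm.invoke(f"""Context: {context}
Question: {question}
Answer: {answer}

Critique this answer. Rate it 1-10. If < 8, explain what's wrong and provide an improved version
after the marker 'Improved answer:'.
Start your response with 'SCORE: N'.""").content

        score = extract_score(critique)
        if score is not None and score >= 8:
            return answer

        # Stop if no improved answer was found, rather than replacing
        # the answer with the raw critique text.
        improved = extract_improved(critique)
        if not improved:
            break
        answer = improved

    return answer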
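
To see how the loop terminates, it can be dry-run against scripted replies. The following is a sketch under stated assumptions: `ScriptedLLM`, `parse`, and `reflect_with_trace` are hypothetical names, and the loop is a compact variant that also records the score and answer for each round.

```python
import re


class ScriptedLLM:
    """Hypothetical stand-in for a chat model: replays canned replies."""

    def __init__(self, replies):
        self._replies = iter(replies)

    def invoke(self, prompt):
        # Return an object exposing `.content`, like the `llm` used in this guide.
        return type("Reply", (), {"content": next(self._replies)})()


def parse(critique):
    """Return (score, improved_answer); either may be None if not found."""
    m = re.search(r"SCORE:\s*(\d+)", critique)
    score = int(m.group(1)) if m else None
    _, sep, tail = critique.partition("Improved answer:")
    return score, (tail.strip() if sep else None)


def reflect_with_trace(question, llm, max_reflections=3, threshold=8):
    """Reflect-retry loop that also records (score, answer) for each round."""
    answer = llm.invoke(f"Question: {question}\nAnswer:").content
    trace = []
    for _ in range(max_reflections):
        critique = llm.invoke(
            f"Question: {question}\nAnswer: {answer}\n"
            "Rate 1-10. Start with 'SCORE: N'. If below 8, add 'Improved answer:'."
        ).content
        score, improved = parse(critique)
        trace.append((score, answer))
        if score is not None and score >= threshold:
            break
        if improved is None:  # nothing usable to retry with, so stop early
            break
        answer = improved
    return answer, trace


llm = ScriptedLLM([
    "Python 3.10 added the walrus operator.",  # first draft (wrong release)
    "SCORE: 4\nWrong release.\nImproved answer: Python 3.8 added the walrus operator.",
    "SCORE: 9\nAccurate and concise.",
])
final, trace = reflect_with_trace("Which Python release added ':='?", llm)
for score, ans in trace:
    print(score, "|", ans)
print("final:", final)
```

Keeping a trace makes it easier to spot score inflation or a loop that never converges.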
  • Extract scores and improved answers from reflection output.
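import re

def extract_score(reflection: str) -> int | None:
    """Return the integer after 'SCORE:', or None if the marker is missing."""
    match = re.search(r'SCORE:\s*(\d+)', reflection, re.IGNORECASE)
    return int(match.group(1)) if match else None

def extract_improved(reflection: str) -> str | None:
    """Return the text after the first 'Improved answer:' marker, or None if absent or empty."""
    # partition splits on the first marker only, so later occurrences stay in the answer
    _, marker, improved = reflection.partition("Improved answer:")
    return (improved.strip() or None) if marker else None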
  • Add self-verification of tool results. After calling a tool, verify the result is reasonable.
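def verify_tool_result(tool_name: str, args: dict, result: str, llm) -> bool:
    # Only the first 500 characters of the result are sent, to keep the prompt short.
    prompt = f"""Tool: {tool_name}
Arguments: {args}
Result: {result[:500]}

Is this result reasonable? Answer only YES or NO."""
    response = llm.invoke(prompt).content.strip()
    # Models often reply "Yes" or "YES." so compare case-insensitively on the prefix.
    return response.upper().startswith("YES")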
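
A slightly more defensive variant might skip the model call for empty results and treat ambiguous replies as unverified. This is a sketch under assumptions: `StubLLM` and `parse_yes_no` are hypothetical helpers introduced here for illustration.

```python
import re


class StubLLM:
    """Hypothetical stand-in that always gives the same reply."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls = 0

    def invoke(self, prompt: str):
        self.calls += 1
        return type("Reply", (), {"content": self.reply})()


def parse_yes_no(reply: str):
    """Return True for YES, False for NO, None for anything else."""
    m = re.match(r"\W*(yes|no)\b", reply, re.IGNORECASE)
    return None if m is None else m.group(1).lower() == "yes"


def verify_tool_result(tool_name: str, args: dict, result: str, llm) -> bool:
    if not result.strip():
        return False  # empty output: no need to spend a model call
    prompt = (
        f"Tool: {tool_name}\nArguments: {args}\nResult: {result[:500]}\n\n"
        "Is this result reasonable? Answer only YES or NO."
    )
    # Ambiguous replies (None) count as "not verified".
    return parse_yes_no(llm.invoke(prompt).content) is True


print(verify_tool_result("calculator", {"expr": "2+2"}, "4", StubLLM("Yes.")))      # True
print(verify_tool_result("calculator", {"expr": "2+2"}, "", StubLLM("YES")))        # False
print(verify_tool_result("calculator", {"expr": "2+2"}, "4", StubLLM("Probably")))  # False
```

Failing closed on unclear replies is a design choice: it costs an occasional unnecessary retry but avoids accepting a result the critic never actually approved.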
  • Create a reflection tool the agent can call autonomously.
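# Assumption: the guide does not name its framework; this import is LangChain's decorator.
from langchain_core.tools import tool

# `llm` must already be defined at module scope (any chat model exposing .invoke).
@tool
def reflect_on_work(work_product: str, criteria: str = "") -> str:
    """Reflect on and critique your own work product."""
    prompt = f"""Work product: {work_product}
Criteria: {criteria}

Identify issues and suggest improvements. Provide a revised version."""
    return llm.invoke(prompt).content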
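
The tool above relies on an `llm` variable existing at module scope. One way to avoid that, sketched here as an assumption rather than the guide's method, is a small factory that binds the model explicitly. `make_reflect_fn`, `max_chars`, and `EchoLLM` are hypothetical names.

```python
from typing import Callable


class EchoLLM:
    """Hypothetical stand-in: returns the prompt it was given."""

    def invoke(self, prompt: str):
        return type("Reply", (), {"content": prompt})()


def make_reflect_fn(llm, max_chars: int = 4000) -> Callable[..., str]:
    """Build a reflection function bound to a specific `llm`."""

    def reflect_on_work(work_product: str, criteria: str = "") -> str:
        """Reflect on and critique your own work product."""
        prompt = (
            f"Work product: {work_product[:max_chars]}\n"  # cap very long inputs
            f"Criteria: {criteria or 'general correctness and clarity'}\n\n"
            "Identify issues and suggest improvements. Provide a revised version."
        )
        return llm.invoke(prompt).content

    return reflect_on_work


reflect_on_work = make_reflect_fn(EchoLLM(), max_chars=20)
print(reflect_on_work("def add(a, b): return a - b  # bug", criteria="correctness"))
```

The returned function keeps its name and docstring, so it can then be wrapped with whatever tool decorator the agent framework provides.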

Verification

python3 -c "
import re
def extract_score(text):
    m = re.search(r'SCORE:\s*(\d+)', text)
    return int(m.group(1)) if m else None
print(extract_score('SCORE: 8'))
# Expected: 8
"

Common failures

  • Reflection confirms incorrect answers. The LLM may agree with its own mistake instead of critiquing it. Use a separate "critic" model, or run the critique pass at a different sampling temperature than the one used to generate the answer.
  • Infinite correction loop. The agent keeps finding new issues and never finalizes. Cap reflections at a small number (2-3).
  • Score inflation. The LLM consistently rates itself 9/10 regardless of quality. Use absolute criteria (e.g., "Does the answer cite sources?") instead of subjective scoring.
  • Version mismatch. The installed package or runtime differs from the command shown. Check the version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
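
The first and third failures above both point toward a separate critic and absolute criteria. A minimal sketch of that idea follows, assuming a critic object with the same `.invoke(...).content` interface. `CRITERIA`, `StubCritic`, and `checklist_review` are hypothetical names, and the criteria themselves are only examples.

```python
CRITERIA = [
    "Does the answer directly address the question?",
    "Is every claim in the answer supported by the context?",
    "Does the answer cite or quote the context where relevant?",
]


class StubCritic:
    """Hypothetical critic model: fails any check that mentions citing."""

    def invoke(self, prompt: str):
        verdict = "NO" if "cite" in prompt else "YES"
        return type("Reply", (), {"content": verdict})()


def checklist_review(question, context, answer, critic_llm, criteria=CRITERIA):
    """Ask one YES/NO question per criterion and return those that failed."""
    failed = []
    for criterion in criteria:
        prompt = (
            f"Context: {context}\nQuestion: {question}\nAnswer: {answer}\n\n"
            f"{criterion} Answer only YES or NO."
        )
        reply = critic_llm.invoke(prompt).content.strip().upper()
        if not reply.startswith("YES"):
            failed.append(criterion)
    return failed


failed = checklist_review(
    question="Which port does Ollama use by default?",
    context="Ollama serves its API on localhost:11434 by default.",
    answer="Port 11434.",
    critic_llm=StubCritic(),
)
print("accept" if not failed else f"retry, failed: {failed}")
```

An empty list means the answer passed every check; otherwise the failed criteria can be fed back into the retry prompt instead of a subjective 1-10 score.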

Related guides

  • How to Create Agent Decision-Making Logic
  • How to Build Error Handling in Agent Pipelines