RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
  1. >
  2. Home
  3. /Learn
  4. /How-to
  5. /How to verify chain-of-thought reasoning in R1 models
HOW-TO · INF

How to verify chain-of-thought reasoning in R1 models

intermediate·10 min·By Fredoline Eruo
PREREQUISITES

DeepSeek-R1 model running

What this does

DeepSeek-R1 generates a chain-of-thought (CoT) before producing a final answer. This guide shows how to inspect, validate, and extract reasoning traces for correctness.

Steps

  1. Capture the full response including reasoning tags.

    curl -s http://localhost:11434/api/generate \
      -d '{"model": "deepseek-r1:32b", "prompt": "Solve: 3x + 7 = 22", "stream": false, "raw": true}' \
      | jq -r '.response' > response.txt
    
  2. Extract the reasoning part between tags.

    import re
    with open("response.txt") as f:
        text = f.read()
    reasoning = re.search(r'\[REASONING\](.*?)\[/REASONING\]', text, re.DOTALL)
    answer = re.search(r'\[/REASONING\]\s*(.*)', text, re.DOTALL)
    print("Reasoning steps:\n", reasoning.group(1) if reasoning else "Not found")
    print("Final answer:\n", answer.group(1).strip() if answer else "Not found")
    
  3. Verify logical consistency. Check that each reasoning step follows from the previous one. For "3x + 7 = 22", valid steps are:

    • Subtract 7 from both sides → 3x = 15
    • Divide by 3 → x = 5
  4. Run a counterfactual test. Give an intentionally flawed premise to see if the model catches the issue:

    "All birds can fly. Penguins are birds. Can penguins fly?"
    

    A well-reasoned response should note the contradiction.

Verification

# Expected: reasoning contains numbered steps, final answer matches ground truth
python extract_reasoning.py
# Output: Reasoning steps: 1. Subtract 7... 2. Divide by 3... | Final answer: x = 5

Common failures

  • Missing reasoning tags: Older or distilled R1 variants may not output structured tags. Use raw: true in the API call to see the full output.
  • Reasoning contradicts answer: Indicates model confusion. Re-run with temperature: 0 for deterministic behavior.
  • Truncated reasoning: Increase num_ctx to 16384 to accommodate long chains.

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.

Related guides

  • How to run DeepSeek-R1 reasoning models for complex problem-solving
  • How to tune reasoning depth in R1 models using parameters
RELATED GUIDES
INF
How to run DeepSeek-R1 reasoning models for complex problem-solving
INF
How to tune reasoning depth in R1 models using parameters
← All how-to guidesCourses →