What this does

DeepSeek-R1 generates a chain of thought (CoT) before producing its final answer. This guide shows how to capture that reasoning trace, extract it, and check it for correctness.

Steps

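Capture the full response, including the reasoning tags. Leave "raw" unset: in Ollama, "raw": true sends the prompt without the model's chat template, which can keep R1 from emitting its reasoning block.

curl -s http://localhost:11434/api/generate \
  -d '{"model": "deepseek-r1:32b", "prompt": "Solve: 3x + 7 = 22", "stream": false}' \
  | jq -r '.response' > response.txt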

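Extract the reasoning between the <think> and </think> tags that R1 wraps around its chain of thought. Save the script as extract_reasoning.py.

import re

# Read the model output captured in the previous step.
with open("response.txt") as f:
    text = f.read()

# The chain of thought sits inside <think>...</think>; the answer follows the closing tag.
reasoning = re.search(r'<think>(.*?)</think>', text, re.DOTALL)
answer = re.search(r'</think>\s*(.*)', text, re.DOTALL)
print("Reasoning steps:\n", reasoning.group(1).strip() if reasoning else "Not found")
print("Final answer:\n", answer.group(1).strip() if answer else "Not found")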

Verify logical consistency. Check that each reasoning step follows from the previous one. For "3x + 7 = 22", valid steps are:
- Subtract 7 from both sides → 3x = 15
- Divide by 3 → x = 5
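
The step-by-step check above can be partly mechanized for simple linear equations. The sketch below is a minimal illustration, not a general proof checker: it assumes each step has been transcribed by hand into a hypothetical `(a, b, c)` tuple meaning `a*x + b = c`, and it treats a chain as consistent when every step has the same solution as the original equation.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Solve a*x + b = c exactly; a must be non-zero."""
    return Fraction(c - b, a)

def steps_consistent(steps):
    """Return True if every (a, b, c) step has the same solution as the first."""
    solutions = [solve_linear(a, b, c) for a, b, c in steps]
    return all(s == solutions[0] for s in solutions)

# Hand-transcribed trace (an assumption): 3x + 7 = 22 -> 3x = 15 -> x = 5
trace = [(3, 7, 22), (3, 0, 15), (1, 0, 5)]
print(steps_consistent(trace))      # True

# A trace with an arithmetic slip (3x = 16) fails the check.
bad_trace = [(3, 7, 22), (3, 0, 16), (1, 0, 5)]
print(steps_consistent(bad_trace))  # False
```

Exact fractions are used instead of floats so that equal solutions compare equal without rounding tolerance.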
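
Run a counterfactual test. Give the model an intentionally flawed premise to see whether it catches the problem:
```
"All birds can fly. Penguins are birds. Can penguins fly?"
```
A well-reasoned response should point out that the first premise is false (penguins are flightless) instead of accepting it and concluding that penguins can fly.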
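
The counterfactual check can be roughly automated with a keyword heuristic. The sketch below is only a first-pass filter under stated assumptions: the cue phrases in `PREMISE_CUES` and the function name `flags_flawed_premise` are hypothetical, and a match does not prove the reasoning is sound, so a manual read of the trace is still needed.

```python
import re

# Hypothetical cue phrases suggesting the model questioned the premise.
PREMISE_CUES = [
    r"not all birds",
    r"premise (is|was) (false|incorrect|flawed|wrong)",
    r"penguins (cannot|can't|can not|do not|don't) fly",
    r"flightless",
]

def flags_flawed_premise(response: str) -> bool:
    """Return True if the response contains any cue that the premise was challenged."""
    text = response.lower()
    return any(re.search(pattern, text) for pattern in PREMISE_CUES)

accepted = "Given the premises, yes, penguins can fly."
challenged = "The premise is false: penguins are flightless birds."
print(flags_flawed_premise(accepted))    # False
print(flags_flawed_premise(challenged))  # True
```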

Verification

# Expected: a non-empty reasoning trace and a final answer that matches the ground truth (x = 5)
python extract_reasoning.py
# Example output (wording varies between runs):
# Reasoning steps: ... subtract 7 from both sides ... divide by 3 ...
# Final answer: x = 5
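
Comparing the final answer with the ground truth can also be scripted. The sketch below assumes the answer text contains the result in the form `x = <number>`; the helper names `extract_x` and `matches_ground_truth` are hypothetical, and answers formatted differently (for example in LaTeX) would need another pattern.

```python
import re
from fractions import Fraction

def extract_x(answer: str):
    """Return the last 'x = <number>' value in the answer text, or None."""
    matches = re.findall(r"\bx\s*=\s*(-?\d+(?:\.\d+)?)", answer)
    return Fraction(matches[-1]) if matches else None

def matches_ground_truth(answer: str, expected) -> bool:
    """Compare the extracted value with the expected solution exactly."""
    value = extract_x(answer)
    return value is not None and value == Fraction(expected)

print(matches_ground_truth("So the solution is x = 5.", 5))  # True
print(matches_ground_truth("Therefore x = 4", 5))            # False
print(matches_ground_truth("No value stated", 5))            # False
```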

Common failures

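Missing reasoning tags: Older or distilled R1 variants may not output inline <think> tags, and newer Ollama releases may return the trace in a separate "thinking" field of the JSON response. Inspect the full JSON before concluding that the reasoning is absent. Setting "raw": true does not help here, since it only disables the prompt template.
Reasoning contradicts answer: The trace and the final answer disagree, so neither can be trusted. Re-run with temperature: 0 to reduce run-to-run variation and check whether the mismatch persists.
Truncated reasoning: The output ends before the closing </think> tag. Increase num_ctx (for example to 16384) to accommodate long chains.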
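
The failure modes above can be told apart with a small triage helper before re-running. This is a sketch under assumptions: `diagnose_trace` and `retry_payload` are hypothetical names, the tag check assumes inline `<think>` tags, and the payload only builds the JSON body for `/api/generate` with the `temperature` and `num_ctx` options mentioned above; it does not send the request.

```python
import json

def diagnose_trace(text: str) -> str:
    """Classify a captured response as 'ok', 'truncated', or 'missing_tags'."""
    has_open = "<think>" in text
    has_close = "</think>" in text
    if has_open and not has_close:
        return "truncated"     # reasoning started but never closed
    if not has_close:
        return "missing_tags"  # no reasoning block at all
    return "ok"

def retry_payload(prompt: str, model: str = "deepseek-r1:32b") -> str:
    """Build a JSON request body with the settings suggested for a retry."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "num_ctx": 16384},
    }
    return json.dumps(body)

print(diagnose_trace("<think>Subtract 7 from both"))         # truncated
print(diagnose_trace("x = 5"))                               # missing_tags
print(diagnose_trace("<think>3x = 15, x = 5</think>x = 5"))  # ok
```

The resulting string can be passed to curl with `-d` in place of the inline JSON used in the capture step.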

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
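
One way to keep that record is a small JSON note written alongside the verification output. The sketch below is illustrative only: the field names and `build_run_record` are hypothetical, and the runtime version is left as a placeholder to be filled in by hand (for example from `ollama --version`).

```python
import json
import platform
import sys
from datetime import datetime, timezone

def build_run_record(model, runtime_version, verification_output, backend="unknown"):
    """Collect the details worth writing down for a single verification run."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "runtime_version": runtime_version,
        "backend": backend,
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "verification_output": verification_output,
    }

record = build_run_record(
    model="deepseek-r1:32b",
    runtime_version="<fill in from your runtime>",  # placeholder, not a real value
    verification_output="Final answer: x = 5",
)
print(json.dumps(record, indent=2))
```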
