
Chain-of-Thought Reasoning

Explicit step-by-step reasoning with visible intermediate steps. Useful for transparency and debuggability in agentic workflows.

Setup walkthrough

  1. Install Ollama, then pull the model: ollama pull deepseek-r1:14b (~9 GB — a distilled reasoning model with explicit CoT).
  2. For explicit chain-of-thought: use the model's native thinking mode. In Ollama, run with ollama run deepseek-r1:14b — the model outputs a <think> block with its reasoning trace, then the final answer.
  3. Prompt: "A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. How much does the ball cost?" The model's trace: "Let x = ball cost. Bat = x + 1.00. Total = x + (x + 1.00) = 2x + 1.00 = 1.10 → 2x = 0.10 → x = 0.05. The intuitive answer is 0.10 but that's wrong because bat would be 1.10, total 1.20. The correct answer is $0.05." Output: "The ball costs $0.05."
  4. First CoT response in 5-15 seconds. The thinking trace is visible — evidence the model reasoned rather than guessed.
  5. For non-reasoning models: use prompt engineering — "Think step by step." or "Let's work through this problem carefully." Standard chat models then simulate CoT (less reliable than native reasoning models).
  6. For self-consistency: run the same prompt 5 times → majority vote on the answer. Improves accuracy 5-15% on complex problems.
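
Self-consistency (step 6) is easy to script against a local Ollama server. The sketch below is illustrative: the endpoint and request fields follow Ollama's REST API (POST /api/generate with model, prompt, stream), but the "ANSWER:" convention and the helper names are assumptions of this example, not part of any API.

```python
import json
from collections import Counter
from urllib import request

def extract_answer(text):
    """Pull this run's vote from the last 'ANSWER:' line, if any."""
    votes = [line.split("ANSWER:", 1)[1].strip()
             for line in text.splitlines() if "ANSWER:" in line]
    return votes[-1] if votes else None

def majority_vote(answers):
    """Majority vote across runs; ties break toward the first-seen answer."""
    answers = [a for a in answers if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None

def ask(prompt, model="deepseek-r1:14b"):
    """One non-streaming completion from a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = request.Request("http://localhost:11434/api/generate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server):
# prompt = "A bat and a ball cost $1.10 total. The bat costs $1.00 more " \
#          "than the ball. How much does the ball cost? End with 'ANSWER: $X.XX'."
# final = majority_vote(extract_answer(ask(prompt)) for _ in range(5))
```

Instructing the model to end with a fixed "ANSWER:" marker makes the vote extraction trivial; without it you would need fuzzier parsing of the final sentence.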

The cheap setup

A used RTX 3060 12 GB (~$200-250, see /hardware/rtx-3060-12gb). Runs DeepSeek R1 Distill Llama 8B at 50-80 tok/s or Qwen 7B at 40-60 tok/s. These 7-8B reasoning models handle the "bat and ball" class of trick problems and multi-step arithmetic reliably. For high-school math (GSM8K): 85-90% accuracy. Pair with a Ryzen 5 5600 + 32 GB DDR4 + 512 GB NVMe. Total: ~$400-480. At $400, you get reliable chain-of-thought reasoning for everyday problems. For AIME-level competition math, 32B+ is needed.

The serious setup

A used RTX 3090 24 GB (~$700-900, see /hardware/rtx-3090). Runs DeepSeek R1 Distill Qwen 32B at 15-25 tok/s — AIME 50-70% accuracy with visible reasoning traces. For research-grade CoT: Qwen 3 235B MoE on dual RTX 3090 (48 GB, ~$1,600) at 5-10 tok/s — near-frontier reasoning with full transparency. Total: ~$1,800-2,500. Chain-of-thought at the 32B level is transformative: the model catches its own mistakes, backtracks, and explores alternatives in the thinking trace. The 7B→32B jump is the largest qualitative improvement in reasoning.

Common beginner mistake

The mistake: Hiding the thinking trace from users (or not reading it yourself) because "the answer is what matters."

Why it fails: The thinking trace IS the value. A correct answer with garbage reasoning is a hallucination that happened to be right. On the next problem, the same model gives a wrong answer, and you have no way to know why. The CoT trace shows you whether the model (a) correctly identified the problem type, (b) applied the right formula, (c) made arithmetic errors, (d) caught and fixed its own mistakes.

The fix: Always read CoT traces for important problems. Build applications that display the thinking trace alongside the answer. For automated workflows, log the trace for audit. If the model says the ball costs $0.05 with correct algebra → trust. If it says $0.05 because "I recall this is a trick question" → don't trust (it pattern-matched from training data rather than reasoning). CoT enables trust calibration: you can assess when to trust the model by reading its reasoning. Without CoT, every answer is a coin flip between "reasoned correctly" and "lucky pattern match."
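
A minimal sketch of the "display and log the trace" pattern. It assumes a DeepSeek-R1-style raw response where the reasoning arrives inside a <think>…</think> block before the final answer; the function names here are this example's own, not a library API.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_trace(raw):
    """Split a DeepSeek-R1-style response into (thinking trace, final answer)."""
    m = THINK_RE.search(raw)
    if not m:
        return None, raw.strip()   # non-reasoning model: nothing to audit
    trace = m.group(1).strip()
    answer = raw[m.end():].strip()
    return trace, answer

def audit(raw, log=print):
    """Log the trace for later review; return only the answer to the caller."""
    trace, answer = split_trace(raw)
    if trace is None:
        log("WARNING: no thinking trace -- treat this answer as unaudited")
    else:
        log(f"--- CoT trace ({len(trace.split())} words) ---\n{trace}")
    return answer
```

Keeping the trace in your logs (even when the UI hides it) is what makes after-the-fact trust calibration possible: when an answer turns out wrong, the logged trace tells you whether the model misidentified the problem or merely slipped on arithmetic.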

Recommended setup for chain-of-thought reasoning

Recommended runtimes

Browse all tools for runtimes that fit this workload.

Reality check

Local AI workloads have real hardware constraints that vary by task type. VRAM ceiling decides what model fits; bandwidth decides decode speed; compute decides prefill speed. Pick the GPU tier that fits your actual workload, not the spec sheet.

Common mistakes

  • Buying for spec-sheet VRAM without modeling KV cache + activation overhead
  • Underestimating quantization quality loss below Q4
  • Skipping flash-attention support (real perf gap on long context)
  • Ignoring sustained-load thermals (laptops thermal-throttle within 30 min)
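
The first bullet — modeling KV-cache overhead instead of trusting spec-sheet VRAM — can be sketched as a back-of-envelope formula. This is a rough estimate assuming a llama-style transformer with grouped-query attention; the example dimensions (32 layers, 8 KV heads, head_dim 128) are illustrative of an 8B-class model, and activation overhead is not included.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """KV-cache size on top of model weights, batch size 1.
    2x covers the separate K and V tensors; fp16 = 2 bytes/element."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Illustrative 8B-class model with GQA at a 32k context:
gib = kv_cache_bytes(32, 8, 128, ctx_len=32_768) / 2**30  # -> 4.0 GiB
```

Four extra GiB at a 32k context is why an "8 GB model" does not actually fit in 12 GB once you use long CoT traces, which reasoning models routinely generate.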

What breaks first

The errors most operators hit when running chain-of-thought reasoning locally. Each links to a diagnose+fix walkthrough.

Before you buy

Verify your specific hardware can handle chain-of-thought reasoning before committing money.
