RUNLOCALAI · v38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
DeepSeek R1 and Reasoning Models

04. Chain-of-Thought in Reasoning

Chapter 4 of 18 · 15 min
KEY INSIGHT

Chain-of-thought (CoT) is no longer just a prompting technique; it is R1's native mode. Your job shifts from triggering reasoning to verifying it. Build verification into your pipeline for high-stakes use cases.

Chain-of-thought prompting showed that language models reason more accurately when asked to work step by step. Reasoning models like R1 have internalized this capability, but operators still influence how well it manifests. This chapter covers the mechanics of CoT and how to work with R1's native reasoning.

The Discovery of CoT

Standard prompting asks models to produce answers directly. Chain-of-thought prompting asks models to "think step by step" first. The technique emerged empirically: researchers noticed that eliciting intermediate steps improved accuracy on math and logic problems. The working hypothesis is that explicit steps externalize the reasoning process. Each intermediate result is written into the context, where later steps can build on it, instead of having to be held implicitly.

# Direct prompting (baseline)
prompt = "What is 17 * 23?"
# A model without built-in reasoning may answer incorrectly here

# CoT prompting
prompt = """What is 17 * 23?
Think step by step."""
# Model shows its work: 17 * 20 = 340, 17 * 3 = 51, total = 391

From Prompting to Internalized Behavior

R1 was trained to internalize CoT behavior through reinforcement learning (RL). Rather than relying on a prompt to trigger step-by-step reasoning, the model has learned to do it autonomously. When you send a complex problem to R1, it generates reasoning tokens without being explicitly asked to "think step by step."

This has practical implications:

  • Short prompts work; you don't need elaborate CoT scaffolding
  • Excessive prompting can interfere with native reasoning
  • You can still guide reasoning direction through prompt structure

Verifying Reasoning Quality

Because R1 exposes its reasoning chains, you can verify correctness before accepting outputs. This is valuable for high-stakes applications where wrong answers have real costs.
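Before a chain can be checked, it has to be separated from the final answer. A minimal sketch follows, assuming the reasoning arrives inline and is wrapped in `<think>...</think>` tags ahead of the answer; confirm this against the raw output of your own runtime. `split_reasoning` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think> ahead of the final answer.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(completion):
    """Return (reasoning, answer) from a raw completion string.

    If no think block is present, reasoning is an empty string and the
    whole completion is treated as the answer.
    """
    match = THINK_BLOCK.search(completion)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer


raw = "<think>17 * 20 = 340, 17 * 3 = 51, 340 + 51 = 391</think>\nThe answer is 391."
reasoning, answer = split_reasoning(raw)
print(reasoning)
print(answer)
```

Some runtimes return the reasoning in a separate response field, or place the opening tag in the prompt template so that only the closing tag appears in the completion. In those cases the split has to be adapted.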

import re

# extract_math_expressions, verify_arithmetic, and has_contradiction are
# helpers you must supply; they are not defined in this chapter.

def verify_reasoning_chain(chain, problem_type):
    """Check a reasoning chain for common failure modes; return a list of issues."""
    issues = []

    # Conclusion asserted as obvious rather than checked (crude text heuristic)
    if re.search(r"Therefore,.*obviously", chain, re.IGNORECASE):
        issues.append("Skipped verification step")

    # Recompute arithmetic steps (only when the problem involves math)
    if problem_type == "math":
        for step in extract_math_expressions(chain):
            if not verify_arithmetic(step):
                issues.append(f"Arithmetic error in: {step}")

    # Check for self-contradiction
    if has_contradiction(chain):
        issues.append("Reasoning chain self-contradicts")

    return issues
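The function above calls `extract_math_expressions`, `verify_arithmetic`, and `has_contradiction` without defining them. One possible sketch of these helpers follows. The implementations are hypothetical. The regex recognizes only simple binary statements of the form `a op b = c`, so it will misread longer expressions that rely on operator precedence. `has_contradiction` is a narrow heuristic that flags only the case where the same expression is given two different results.

```python
import operator
import re

# Assumption: only simple binary statements like "17 * 20 = 340" are checked.
EXPR = re.compile(
    r"(\d+(?:\.\d+)?)\s*([-+*/])\s*(\d+(?:\.\d+)?)\s*=\s*(\d+(?:\.\d+)?)"
)
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def extract_math_expressions(chain):
    """Return every 'a op b = c' statement found in the chain, as text."""
    return [m.group(0) for m in EXPR.finditer(chain)]


def verify_arithmetic(step):
    """Recompute one 'a op b = c' statement and compare with the claim."""
    match = EXPR.fullmatch(step.strip())
    if match is None:
        return False  # not a statement this sketch understands
    a, op, b, claimed = match.groups()
    try:
        actual = OPS[op](float(a), float(b))
    except ZeroDivisionError:
        return False
    return abs(actual - float(claimed)) < 1e-9


def has_contradiction(chain):
    """Flag chains that give two different results for the same expression."""
    seen = {}
    for a, op, b, claimed in EXPR.findall(chain):
        key = (float(a), op, float(b))
        if key in seen and seen[key] != float(claimed):
            return True
        seen[key] = float(claimed)
    return False


chain = "17 * 20 = 340, 17 * 3 = 51, 340 + 51 = 391. Later: 17 * 3 = 52."
steps = extract_math_expressions(chain)
print([s for s in steps if not verify_arithmetic(s)])
print(has_contradiction(chain))
```

Logical contradictions in prose are out of reach for a regex. Catching those would likely need a second model pass or a human reviewer, which is a design decision this sketch leaves open.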
EXERCISE

Process ten complex queries through R1 and manually inspect the reasoning chains. Categorize the failure modes you observe. Are there patterns that suggest specific prompting adjustments?
