Chain-of-Thought in Reasoning — DeepSeek R1 and Reasoning Models (Chapter 4)

Chain-of-thought (CoT) prompting showed that language models could reason step by step when asked. Reasoning models like R1 have internalized this capability, but operators still influence how well it manifests. This chapter covers the mechanics of CoT and how to work with R1's native reasoning.

The Discovery of CoT

Standard prompting asks models to produce answers directly. Chain-of-thought prompting asks models to "think step by step" first. The technique emerged empirically: researchers noticed that eliciting intermediate steps improved accuracy on math and logic problems. The hypothesis is that explicit steps reduce the burden on working memory by externalizing the reasoning process.

# Direct prompting (baseline)
direct_prompt = "What is 17 * 23?"
# Without intermediate steps, a model is more likely to give a wrong answer

# CoT prompting
cot_prompt = """
What is 17 * 23?
Think step by step.
"""
# Model shows its work: 17 * 20 = 340, 17 * 3 = 51, 340 + 51 = 391

From Prompting to Internalized Behavior

R1 was trained to internalize CoT behavior through reinforcement learning (RL). Rather than relying on a prompt to trigger step-by-step reasoning, the model has learned to produce it on its own. When you send a complex problem to R1, it generates reasoning tokens without being explicitly asked to "think step by step."

This has practical implications:

Short prompts work; you don't need elaborate CoT scaffolding
Excessive prompting can interfere with native reasoning
You can still guide reasoning direction through prompt structure
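
As a rough illustration of the points above, the sketch below builds a short prompt and optionally adds a single line that steers the direction of the reasoning. The helper name build_r1_prompt and the "Focus on:" wording are assumptions made for this example, not part of any R1 interface.

```python
from typing import Optional


def build_r1_prompt(problem: str, focus: Optional[str] = None) -> str:
    """Return a short prompt, optionally with one line that steers the reasoning.

    Hypothetical helper: it deliberately adds no "think step by step" scaffolding,
    since the model is expected to reason on its own.
    """
    parts = [problem.strip()]
    if focus:
        # A single hint about direction, not a scripted list of steps.
        parts.append(f"Focus on: {focus.strip()}")
    return "\n\n".join(parts)


plain = build_r1_prompt("What is 17 * 23?")
guided = build_r1_prompt(
    "Is 391 a prime number?",
    focus="divisibility by small primes",
)
```

The design choice here is to keep guidance to one line so that the prompt shapes what the model reasons about without dictating how it reasons.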

Verifying Reasoning Quality

Because R1 exposes its reasoning chains, you can check them for errors before accepting outputs. This is valuable for high-stakes applications where wrong answers have real costs.
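
Before a chain can be checked, it has to be separated from the final answer. The sketch below assumes the raw output wraps its reasoning in <think>...</think> tags, as the open-weights R1 release commonly does; hosted APIs may instead return the reasoning in a separate field, in which case no parsing is needed. The function name split_reasoning is an assumption made for this example.

```python
import re
from typing import Tuple

# Assumption: reasoning is delimited by <think>...</think> in the raw text.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw_output: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw model output.

    If no <think> block is found, the reasoning is an empty string and the
    whole output is treated as the answer.
    """
    match = THINK_BLOCK.search(raw_output)
    if match is None:
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = raw_output[match.end():].strip()
    return reasoning, answer


raw = "<think>17 * 20 = 340, 17 * 3 = 51, total = 391</think>\nThe answer is 391."
reasoning, answer = split_reasoning(raw)
```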

import re

# NOTE: extract_math_expressions, verify_arithmetic, and has_contradiction are
# assumed helpers. They must be defined elsewhere before this function is called.

def verify_reasoning_chain(chain, problem_type):
    """Check a reasoning chain for common failure modes; return a list of issues."""
    issues = []

    # Crude heuristic: a conclusion asserted as "obvious" instead of being checked
    if re.search(r"Therefore,.*obviously", chain):
        issues.append("Skipped verification step")

    # Check each arithmetic step (only if the problem involves math)
    if problem_type == "math":
        arithmetic_steps = extract_math_expressions(chain)
        for step in arithmetic_steps:
            if not verify_arithmetic(step):
                issues.append(f"Arithmetic error in: {step}")

    # Check for self-contradiction
    if has_contradiction(chain):
        issues.append("Reasoning chain self-contradicts")

    return issues
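
The function above depends on helpers that the chapter does not define. One possible sketch of extract_math_expressions and verify_arithmetic follows. It is an assumption about what they might do, not the original implementation. It handles only single integer steps of the form "a op b = c" with +, -, or *. Decimals are skipped, and chained expressions such as "17 * 20 + 17 * 3 = 391" would be misread, so treat it as a starting point. has_contradiction is left out because detecting contradictions generally needs more than pattern matching.

```python
import re
from typing import List

# Matches one integer step such as "17 * 20 = 340".
# The lookarounds keep the pattern from matching inside decimals like "1.5".
STEP_PATTERN = re.compile(
    r"(?<![\d.])(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)(?!\.?\d)"
)


def extract_math_expressions(chain: str) -> List[str]:
    """Return every 'a op b = c' step found in the chain, in order."""
    return [m.group(0) for m in STEP_PATTERN.finditer(chain)]


def verify_arithmetic(step: str) -> bool:
    """Recompute a single step and compare it with the claimed result."""
    m = STEP_PATTERN.fullmatch(step.strip())
    if m is None:
        return False  # steps that cannot be parsed are treated as failures
    left, op, right, claimed = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
    actual = {"+": left + right, "-": left - right, "*": left * right}[op]
    return actual == claimed


chain = "17 * 20 = 340, 17 * 3 = 51, 340 + 51 = 391"
steps = extract_math_expressions(chain)
all_ok = all(verify_arithmetic(s) for s in steps)
```

Recomputing each step independently means an early mistake is reported where it occurs rather than only at the final total.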