R1 Prompting Quirks — DeepSeek R1 and Reasoning Models (Chapter 5)

DeepSeek R1's reinforcement learning (RL) training produced a model that behaves differently from standard instruction-tuned models. Prompting it well means working with these quirks rather than applying generic instruction-following techniques unchanged.

Verbosity Management

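R1 tends toward verbose responses and often gives detailed explanations even for simple queries. A plausible cause is its RL training: the model learned to produce long, thorough reasoning on its way to higher reward, and that habit tends to carry over into the final answer. Treat it as learned behavior to steer through the prompt rather than as a bug.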

To control verbosity, be specific about output format expectations:

# Default behavior: a bare request tends to produce a long answer
verbose_prompt = "Explain why the sky is blue"
# R1 typically returns a multi-paragraph explanation of scattering physics

# Desired behavior: state length, register, and scope explicitly
concise_prompt = """
Explain why the sky is blue in 2-3 sentences.
Use plain language. No examples or tangential information.
"""
# R1 typically returns a short, direct explanation
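Length constraints are easier to keep consistent when the template is generated and the response is checked afterwards. The sketch below uses a hypothetical helper, `build_concise_prompt`, plus a rough sentence counter. It assumes raw output from the open weights, where R1 places its reasoning in a `<think>...</think>` block before the final answer, so only the text after that block is measured. The sentence splitter is a naive regex heuristic, not a real tokenizer.

```python
import re

def build_concise_prompt(task: str, min_sentences: int = 2, max_sentences: int = 3) -> str:
    """Hypothetical helper: append explicit length, register, and scope limits to a task."""
    return (
        f"{task} in {min_sentences}-{max_sentences} sentences.\n"
        "Use plain language. No examples or tangential information."
    )

def final_answer_text(raw_output: str) -> str:
    """Drop the <think>...</think> block, assuming R1's raw open-weights output format."""
    return re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()

def sentence_count(text: str) -> int:
    """Naive count: split on runs of '.', '!' or '?' followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for part in parts if part.strip())

def within_budget(raw_output: str, max_sentences: int = 3) -> bool:
    """True if the final answer (reasoning excluded) respects the sentence limit."""
    return sentence_count(final_answer_text(raw_output)) <= max_sentences

# Usage with a made-up response string (not real model output)
prompt = build_concise_prompt("Explain why the sky is blue")
sample = (
    "<think>Rayleigh scattering... keep it short.</think>\n"
    "Sunlight is scattered by air molecules. Shorter blue wavelengths scatter much more "
    "than red ones, so blue light reaches your eyes from all over the sky."
)
print(prompt)
print(within_budget(sample))  # True: the final answer has 2 sentences
```

Excluding the reasoning block matters: R1's thinking is usually far longer than its answer, and counting it would make every response look over budget.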

Prefix Injection and Reasoning Direction

R1 is sensitive to prompt prefixes that hint at an expected line of reasoning. If a prompt opens with something like "My reasoning:" followed by flawed logic, R1 often continues from that flawed starting point instead of correcting it.

# Problematic: the prompt embeds a flawed premise (51 = 3 * 17, so it is not prime)
leading_prompt = """
My reasoning: 51 is prime, so we should...
"""
# R1 often continues from this premise and carries the error forward

# Better: state the problem without embedded reasoning
neutral_prompt = """
Problem: Find all prime factors of 51.
Verify your work by checking divisibility up to sqrt(n).
"""
# R1 builds an independent reasoning chain and can find 51 = 3 * 17
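The "check divisibility up to sqrt(n)" instruction describes ordinary trial division: any composite n has a factor no larger than its square root. Running the same check yourself is a cheap way to audit R1's final answer instead of trusting its reasoning chain; for example, 51 looks prime at a glance but equals 3 × 17. A minimal sketch; `prime_factors` and `is_prime` are illustrative helper names, not part of any library.

```python
import math

def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (with multiplicity) using trial division."""
    if n < 2:
        return []
    factors = []
    d = 2
    while d * d <= n:        # only candidate divisors up to sqrt(n) need checking
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                # leftover n has no divisor up to its square root, so it is prime
        factors.append(n)
    return factors

def is_prime(n: int) -> bool:
    """Trial-division primality test over 2..isqrt(n)."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

print(prime_factors(51))           # [3, 17]
print(is_prime(17), is_prime(51))  # True False
```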

The XML Structure Preference

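R1's raw output already has an XML-like shape: the open-weights model places its reasoning inside <think>...</think> tags, and the training template for its predecessor, DeepSeek-R1-Zero, also wrapped the final answer in <answer>...</answer> tags. Partly as a result, the model seems comfortable following XML-style structure, especially when the prompt sketches the layout it should use.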

You can exploit this for parsing:

prompt = """
<problem>Find the largest prime factor of 60</problem>
<output_format>
<step number="1">...</step>
<step number="2">...</step>
<final_answer>...</final_answer>
</output_format>
"""
# R1 responds in structured format, easier to parse
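Because the format is requested rather than enforced, it is worth parsing the response defensively. The sketch below is a hypothetical helper, `parse_structured_answer`, that assumes the response uses the `<step number="...">` and `<final_answer>` tags from the prompt above. It strips any `<think>...</think>` block first, so draft tags inside the reasoning are ignored, and it uses regular expressions rather than a strict XML parser, since surrounding prose would make `xml.etree.ElementTree` reject the text.

```python
import re

def parse_structured_answer(response: str) -> dict:
    """Extract numbered steps and the final answer from an XML-style response."""
    # Remove the reasoning block so draft tags inside <think> are not picked up
    body = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    steps = {
        int(num): text.strip()
        for num, text in re.findall(r'<step number="(\d+)">(.*?)</step>', body, flags=re.DOTALL)
    }
    match = re.search(r"<final_answer>(.*?)</final_answer>", body, flags=re.DOTALL)
    return {
        "steps": [steps[k] for k in sorted(steps)],  # ordered by step number
        "final_answer": match.group(1).strip() if match else None,
    }

# Made-up response in the requested format (not real model output)
sample = """<think>60 = 2 * 30 ... <final_answer>draft</final_answer></think>
Here is the solution.
<step number="1">60 = 2 * 2 * 15</step>
<step number="2">15 = 3 * 5, so the prime factors are 2, 3 and 5</step>
<final_answer>5</final_answer>"""

result = parse_structured_answer(sample)
print(result["final_answer"])  # 5
print(len(result["steps"]))    # 2
```

Returning `None` for a missing final answer, rather than raising, lets the caller decide whether to retry the request or fall back to free-text handling.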

Temperature Sensitivity

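R1's reasoning quality appears to be sensitive to temperature. DeepSeek's model card for R1 recommends a temperature between 0.5 and 0.7 (0.6 suggested) to prevent endless repetition or incoherent output, so very low settings such as 0.1-0.3 are not the safe choice for deterministic reasoning tasks that they often are with standard models. Higher temperatures produce more diverse reasoning chains but can introduce errors into the thinking process itself.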
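Temperature T divides the model's logits z before the softmax, so p_i = exp(z_i / T) / sum_j exp(z_j / T). Low T concentrates probability on the top-scoring token, approaching greedy decoding; high T flattens the distribution, so lower-ranked tokens, including wrong reasoning steps, are sampled more often. The toy logits below are made up for illustration and say nothing about R1's actual distribution; the sketch only shows the mechanism.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs: list[float]) -> float:
    """Shannon entropy in nats; higher means a flatter, more random distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

toy_logits = [4.0, 3.0, 1.0]  # hypothetical scores for three candidate next tokens
for t in (0.2, 0.6, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"T={t}: top-token p={probs[0]:.3f}, entropy={entropy(probs):.3f}")
```

As T rises, the printed top-token probability falls and the entropy grows, which is the trade-off between deterministic and diverse reasoning chains described above.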