05. R1 Prompting Quirks
R1's RL training created a model that behaves differently from standard instruction-following models. Effective prompting requires understanding these quirks rather than applying generic instruction-following techniques.
Verbosity Management
R1 tends toward verbose responses. Its RL training rewarded thoroughness, which often shows up as overly detailed explanations for simple queries. This is not a bug; it is the strategy the model learned for maximizing its reward signal.
To control verbosity, be specific about output format expectations:
# Verbose output (default behavior)
prompt = "Explain why the sky is blue"
# R1 produces 500-word explanation with scattering physics
# Concise output (desired behavior)
prompt = """
Explain why the sky is blue in 2-3 sentences.
Use plain language. No examples or tangential information.
"""
# R1 produces concise explanation
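A format constraint is easier to enforce when you also check the response against it. The sketch below wraps a question with an explicit sentence budget and counts sentences in the reply. The helper names (`build_concise_prompt`, `count_sentences`, `within_budget`) are hypothetical, and the sentence counter is a rough heuristic, not a real tokenizer.

```python
import re


def build_concise_prompt(question: str, max_sentences: int = 3) -> str:
    """Wrap a question with explicit length and style constraints."""
    return (
        f"{question.strip()}\n"
        f"Answer in at most {max_sentences} sentences.\n"
        "Use plain language. No examples or tangential information."
    )


def count_sentences(text: str) -> int:
    """Rough count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def within_budget(response: str, max_sentences: int = 3) -> bool:
    """True if the response respects the sentence budget."""
    return count_sentences(response) <= max_sentences
```

If `within_budget` returns False, one option is to re-send the prompt with a tighter constraint instead of truncating the reply yourself.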
Prefix Injection and Reasoning Direction
R1 is sensitive to prompt prefixes that hint at an expected reasoning direction. If you begin a prompt with "Here's my reasoning:" and provide flawed logic, R1 often continues from your flawed starting point rather than correcting it.
# Problematic: leading R1 down the wrong path
prompt = """
My reasoning: 17 is prime so we should...
"""
# R1 continues from here and often locks in the error
# Better: provide problem without embedded reasoning
prompt = """
Problem: Find all prime factors of 17.
Verify your work by checking divisibility up to sqrt(n).
"""
# R1 generates independent reasoning chain
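Whichever framing you use, it helps to check the model's answer against an independent computation. The following is a minimal trial-division sketch of the "check divisibility up to sqrt(n)" step named in the prompt. The function name `prime_factors` is an illustrative choice and not part of any R1 tooling.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (with multiplicity) by trial division."""
    if n < 2:
        raise ValueError("n must be >= 2")
    factors = []
    d = 2
    # Only divisors up to sqrt(n) need testing: d * d <= n
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    # Whatever remains above 1 is itself prime
    if n > 1:
        factors.append(n)
    return factors


print(prime_factors(17))  # 17 has no divisor up to sqrt(17), so it is prime
```

Comparing R1's final answer against this ground truth makes it easy to see when an embedded premise has pulled the reasoning off course.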
The XML Structure Preference
R1 often produces responses in XML-like structures even without explicit instruction. A likely explanation is that tag-delimited formats were common in its training data and that the model learned to associate them with readable output.
You can exploit this for parsing:
prompt = """
<problem>Find the largest prime factor of 60</problem>
<output_format>
<step number="1">...</step>
<step number="2">...</step>
<final_answer>...</final_answer>
</output_format>
"""
# R1 responds in structured format, easier to parse
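Once the response follows this format, extracting the steps and the final answer takes only a few lines. The sketch below uses regular expressions rather than a strict XML parser, on the assumption that model output may contain stray text around the tags or may not be well-formed. The `sample` string is a hand-written stand-in for a model response, not real R1 output.

```python
import re

STEP_RE = re.compile(r'<step number="(\d+)">(.*?)</step>', re.DOTALL)
FINAL_RE = re.compile(r"<final_answer>(.*?)</final_answer>", re.DOTALL)


def parse_structured_response(text: str) -> dict:
    """Extract ordered steps and the final answer from a tagged response."""
    steps = {int(num): body.strip() for num, body in STEP_RE.findall(text)}
    final = FINAL_RE.search(text)
    return {
        "steps": [steps[k] for k in sorted(steps)],
        "final_answer": final.group(1).strip() if final else None,
    }


# Hand-written stand-in for a model response (assumption, not real output)
sample = """
<step number="1">60 = 2 * 2 * 3 * 5</step>
<step number="2">The prime factors are 2, 3, and 5.</step>
<final_answer>5</final_answer>
"""

result = parse_structured_response(sample)
print(result["final_answer"])
```

A missing `<final_answer>` tag yields `None` instead of raising, so the caller can decide whether to retry the prompt.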
Temperature Sensitivity
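R1's reasoning quality is more sensitive to temperature than that of standard models. Low temperature (0.1-0.3) works well for deterministic reasoning tasks, although DeepSeek's own usage recommendations suggest 0.5-0.7 (0.6 recommended) to avoid endless repetition, so validate very low settings on your own workload. Higher temperatures produce more diverse reasoning chains but can introduce errors in the thinking process itself.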
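One way to measure this sensitivity is to sample the same prompt several times at each temperature and see how often the final answers agree. The sketch below assumes a caller-supplied `generate(prompt, temperature)` function that returns the final answer as a string. `fake_generate` is a deterministic stub used only so the example runs, and its behavior says nothing about how R1 actually responds.

```python
import itertools
from collections import Counter
from typing import Callable


def agreement_rate(answers: list[str]) -> float:
    """Fraction of answers that match the most common answer."""
    if not answers:
        raise ValueError("answers must not be empty")
    top_count = Counter(a.strip() for a in answers).most_common(1)[0][1]
    return top_count / len(answers)


def sweep_temperatures(
    generate: Callable[[str, float], str],
    prompt: str,
    temperatures: list[float],
    samples: int = 5,
) -> dict[float, float]:
    """Sample each temperature `samples` times and report answer agreement."""
    return {
        t: agreement_rate([generate(prompt, t) for _ in range(samples)])
        for t in temperatures
    }


# Illustrative stub: stable at low temperature, varied answers otherwise
_noisy = itertools.cycle(["5", "3", "5", "15"])


def fake_generate(prompt: str, temperature: float) -> str:
    return "5" if temperature <= 0.3 else next(_noisy)


rates = sweep_temperatures(
    fake_generate, "Find the largest prime factor of 60", [0.2, 1.0], samples=4
)
print(rates)
```

Agreement is only a proxy: a model can be consistently wrong, so pair it with a correctness check where a ground truth exists.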
Exercise
Take a complex reasoning task (multi-step math, logical deduction, code debugging) and run it with three prompting variants: (1) minimal instruction, (2) verbose guidance, (3) structured format. Compare the outputs for answer quality and for efficiency, such as response length.
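A small harness keeps the comparison repeatable. The sketch below builds the three variants for one task and records a crude correctness flag and a word count for each. The variant wording, the `generate` callable, and the substring check for correctness are all assumptions made for illustration; swap in your own client and scoring.

```python
from typing import Callable


def build_variants(task: str) -> dict[str, str]:
    """Return the three prompt variants for a single task."""
    return {
        "minimal": task,
        "verbose_guidance": (
            f"{task}\n"
            "Work through the problem step by step, state any assumptions, "
            "and double-check the result before answering."
        ),
        "structured": (
            f"<problem>{task}</problem>\n"
            "<output_format>\n"
            '<step number="1">...</step>\n'
            "<final_answer>...</final_answer>\n"
            "</output_format>"
        ),
    }


def compare_variants(
    task: str, generate: Callable[[str], str], expected: str
) -> dict[str, dict]:
    """Run every variant and record correctness and output length."""
    report = {}
    for name, prompt in build_variants(task).items():
        output = generate(prompt)
        report[name] = {
            "correct": expected in output,  # crude substring check
            "words": len(output.split()),
        }
    return report


# Stub model so the example runs without an API call (assumption)
report = compare_variants(
    "Find the largest prime factor of 60",
    lambda prompt: "The answer is 5",
    expected="5",
)
print(report)
```

Running each variant several times and averaging the word counts gives a steadier picture than a single sample.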