RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
  1. >
  2. Home
  3. /Learn
  4. /Courses
  5. /DeepSeek R1 and Reasoning Models
  6. /Ch. 5
DeepSeek R1 and Reasoning Models

05. R1 Prompting Quirks

Chapter 5 of 18 · 20 min
KEY INSIGHT

R1's RL training creates behaviors optimized for reward signals, not necessarily for user satisfaction. Prompt engineering for R1 is about directing the learned reasoning behavior, not imposing it.

R1's RL training created a model that behaves differently from standard instruction-following models. Effective prompting requires understanding these quirks rather than applying generic instruction-following techniques.

Verbosity Management

R1 tends toward verbose responses. The RL training rewarded thoroughness, which often manifests as overly detailed explanations for simple queries. This isn't a bug—it's the model's learned strategy for maximizing reward signals.

To control verbosity, be specific about output format expectations:

# Verbose output (default behavior)
prompt = "Explain why the sky is blue"
# R1 produces 500-word explanation with scattering physics

# Concise output (desired behavior)
prompt = """
Explain why the sky is blue in 2-3 sentences. 
Use plain language. No examples or tangential information.
"""
# R1 produces concise explanation

Prefix Injection and Reasoning Direction

R1 is sensitive to prompt prefixes that hint at expected reasoning direction. If you begin a prompt with "Here's my reasoning:" and provide flawed logic, R1 often continues from your flawed starting point rather than correcting it.

# Problematic: leading R1 down wrong path
prompt = """
My reasoning: 17 is prime so we should...
[R1 continues from here and often locks in error]

# Better: provide problem without embedded reasoning
prompt = """
Problem: Find all prime factors of 17.
Verify your work by checking divisibility up to sqrt(n).
"""
# R1 generates independent reasoning chain

The XML Structure Preference

R1 often produces responses in XML-like structures even without explicit instruction. The model seems to have learned that XML formatting improves readability and follows patterns seen in training data.

You can exploit this for parsing:

prompt = """
<problem>Find the largest prime factor of 60</problem>
<output_format>
<step number="1">...</step>
<step number="2">...</step>
<final_answer>...</final_answer>
</output_format>
"""
# R1 responds in structured format, easier to parse

Temperature Sensitivity

R1's reasoning quality is more sensitive to temperature than standard models. Low temperature (0.1-0.3) works well for deterministic reasoning tasks. Higher temperatures produce more diverse reasoning chains but can introduce errors in the thinking process itself.

EXERCISE

Take a complex reasoning task (multi-step math, logical deduction, code debugging) and run it with three prompting variants: (1) minimal instruction, (2) verbose guidance, (3) structured format. Compare outputs for quality and efficiency.

← Chapter 4
Chain-of-Thought in Reasoning
Chapter 6 →
Hardware Requirements