Phi-4 Reasoning 14B
Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.
Positioning
The reasoning specialist of the Phi-4 line — same architecture, tuned with chain-of-thought training. Built for math, code planning, and multi-step problem decomposition in a 14B body that fits in 12 GB VRAM.
Strengths
- Reasoning-class quality at 14B — competitive with QwQ 32B on math while using half the VRAM.
- MIT license — the same license clarity as base Phi-4.
- Visible chain-of-thought is well-formatted and useful for verification.
Limitations
- Always reasons — no toggle. Every prompt eats 2–3× the tokens.
- Verbose intermediate output dominates throughput on simple questions.
- Same narrow knowledge breadth as Phi-4 base.
Real-world performance on RTX 4090
- Q4_K_M (8.4 GB): 70–85 tok/s decode — but 2–3× tokens per answer
- Q5_K_M (9.9 GB): 60–75 tok/s
- Q8_0 (14.7 GB): 42–52 tok/s
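To see what the 2–3× token overhead means in wall-clock terms, the rough sketch below divides an answer's token count by the decode rate. The 300-token baseline answer is a made-up figure for illustration, not a measurement; the decode rates and the multiplier are the ones listed above. Prompt processing and time to first token are ignored.

```python
def answer_seconds(base_tokens: int, multiplier: float, tok_per_s: float) -> float:
    """Decode time for one answer, ignoring prompt processing and time to first token."""
    return base_tokens * multiplier / tok_per_s


# Hypothetical 300-token answer (assumed for illustration only).
BASE_TOKENS = 300

# Q4_K_M decode range from the list above: 70-85 tok/s.
best_case = answer_seconds(BASE_TOKENS, 2.0, 85.0)   # fewest extra tokens, fastest decode
worst_case = answer_seconds(BASE_TOKENS, 3.0, 70.0)  # most extra tokens, slowest decode

print(f"{best_case:.1f}s to {worst_case:.1f}s per answer")
```

The same function works for the Q5_K_M and Q8_0 rows by swapping in their decode rates.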
Should you run this locally?
Yes, for dedicated math / code-planning workflows where reasoning quality matters and you want to fit in 12 GB VRAM. No, for general chat — base Phi-4 14B is simpler. For maximum reasoning, jump to QwQ 32B if VRAM allows.
How it compares
- vs Phi-4 14B (base) → Reasoning variant wins on hard problems; base wins on throughput for simple ones. Pick by workload.
- vs QwQ 32B → QwQ has stronger absolute reasoning; Phi-4 Reasoning fits in much less VRAM.
- vs Qwen 3 14B with thinking mode → Qwen 3 has the toggle flexibility; Phi-4 Reasoning has slightly cleaner reasoning traces.
- vs DeepSeek R1 Distill Qwen 14B → R1 Distill is more aggressive on reasoning depth; Phi-4 Reasoning is steadier on consistency.
Run this yourself
ollama pull phi4-reasoning:14b-q4_K_M
ollama run phi4-reasoning:14b-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4060 Ti 16 GB / 4090
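The same settings can be applied from a script. Below is a minimal sketch, assuming an Ollama server is running locally on its default port (11434) and the model has already been pulled. It uses Ollama's `/api/chat` endpoint and sets `num_ctx` to 16384 through the `options` field. The helper names `build_chat_request` and `chat` are hypothetical and exist only for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_request(prompt: str, num_ctx: int = 16384) -> dict:
    """Build a non-streaming /api/chat payload with the context size set explicitly."""
    return {
        "model": "phi4-reasoning:14b-q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def chat(prompt: str) -> str:
    """Send the request to a locally running Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `chat("What is the sum of the first 100 positive integers?")` returns the full reply, reasoning included.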
Why this rating
8.5/10 — Phi-4 with explicit reasoning training. Closes the gap with QwQ 32B at half the VRAM, but always-on chain-of-thought eats throughput on simple prompts. Loses points to base Phi-4 14B for general use.
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Best 14B reasoner at release
- MIT license
Weaknesses
- Verbose by default
Prompting kit
Tested patterns for getting the most out of Phi-4 Reasoning 14B locally. Local models are pickier about prompt structure than cloud models — what works on Claude or GPT-5 often fails here.
Recommended system prompt
You are a careful reasoning assistant. For any non-trivial question, think step by step and show your reasoning before giving the final answer. Use <think> tags around your reasoning, then give the final answer.
Quirks to know
- Fine-tuned variant of Phi-4 14B for explicit reasoning. Per the model card, it emits <think>...</think> reasoning before answering, similar in spirit to DeepSeek R1 but in a smaller package.
- Per Microsoft's technical report, Phi-4-Reasoning matches DeepSeek R1-Distill-Llama-70B on math benchmarks despite being 5× smaller. Strong choice for math/code reasoning on 24GB rigs.
- Inherits Phi-4's 16K context window, which is short by current standards. Don't use for very long documents.
- Inherits Phi-4's heavy refusals on coding security topics. Microsoft has not loosened these in the reasoning variant.
- Uses ChatML chat template like base Phi-4.
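Because the reasoning arrives wrapped in `<think>...</think>`, a client usually wants to separate it from the final answer before showing or storing the result. The sketch below is one way to do that, assuming the reply contains at most one well-formed `<think>` block; `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a reply that may contain one <think> block."""
    match = THINK_BLOCK.search(output)
    if match is None:
        # No reasoning block found: treat the whole reply as the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer


sample = "<think>2 + 2 is 4, then times 3 is 12.</think>\nThe answer is 12."
reasoning, answer = split_reasoning(sample)
print(answer)  # prints "The answer is 12."
```

Keeping the reasoning text around, rather than discarding it, preserves the trace for verification.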
Chat template
ChatML-style with <|im_start|>{role}<|im_sep|>{content}<|im_end|>. Same as base Phi-4; ships in tokenizer_config.json.
Tool calling
No tool-calling tuning in the reasoning variant. For function calling, use Mistral Small 3.2 or Llama 3.3 instead.
Sampler settings
- temperature: 0.7
- top_p: 0.95
These are the sampler defaults from Microsoft's evaluation harness. For tight reasoning where you want minimal variance, drop temperature to 0.1–0.3.
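One way to keep both configurations at hand is a small lookup of named profiles. This is only a sketch: the profile names are hypothetical, 0.2 is picked arbitrarily from inside the 0.1–0.3 range, and leaving top_p at 0.95 for the low-variance profile is an assumption rather than a tested recommendation.

```python
# Sampler values taken from the settings above; the profile names are hypothetical.
SAMPLER_PROFILES = {
    "default": {"temperature": 0.7, "top_p": 0.95},
    "tight": {"temperature": 0.2, "top_p": 0.95},  # 0.2 sits inside the 0.1-0.3 range
}


def sampler_options(profile: str = "default") -> dict:
    """Return a copy of the sampler settings for the named profile."""
    if profile not in SAMPLER_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return dict(SAMPLER_PROFILES[profile])


print(sampler_options("tight"))
```

The keys match the `temperature` and `top_p` names that Ollama accepts in the `options` field of an API request, so the returned dict can be passed through as-is.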
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 8.4 GB | 11 GB |
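The table lists VRAM only for Q4_K_M. As a rough sketch of how one might pick a quant for a given card, the code below takes the file sizes quoted in this review and adds a flat overhead. The 2.6 GB overhead is simply the gap between Q4_K_M's 8.4 GB file and its 11 GB VRAM figure; applying it unchanged to Q5_K_M and Q8_0 is an assumption, not a measurement, and real usage also depends on context length.

```python
from typing import Optional

# File sizes (GB) as quoted in this review.
FILE_SIZE_GB = {"Q4_K_M": 8.4, "Q5_K_M": 9.9, "Q8_0": 14.7}

# Gap between Q4_K_M's 8.4 GB file and its listed 11 GB VRAM requirement.
# Reusing it for the other quants is an assumption.
OVERHEAD_GB = 2.6


def largest_fitting_quant(vram_gb: float) -> Optional[str]:
    """Return the largest quant whose estimated footprint fits, or None if none do."""
    fitting = [q for q, size in FILE_SIZE_GB.items() if size + OVERHEAD_GB <= vram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: FILE_SIZE_GB[q])


for vram in (8, 12, 16, 24):
    print(vram, "GB ->", largest_fitting_quant(vram))
```

Treat the result as a starting point and confirm with an actual load on your hardware.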
Get the model
Ollama
One-line install
ollama run phi4-reasoning:14b
HuggingFace
Original weights
Source repository — direct quantization required.
Benchmarks
Real measurements on real hardware. Numbers ship with the runner version, quant, and date.
| Hardware | Provenance | Quant | Ctx | Tokens / sec | TTFT | Date |
|---|---|---|---|---|---|---|
| NVIDIA GeForce RTX 3080 16GB (Mobile) | EditorialM | Q4_K_M | 4K | 40.4 tok/s | 226 ms | Jun 2, 26 |
Frequently asked
What's the minimum VRAM to run Phi-4 Reasoning 14B?
Can I use Phi-4 Reasoning 14B commercially?
What's the context length of Phi-4 Reasoning 14B?
How do I install Phi-4 Reasoning 14B with Ollama?
Source: huggingface.co/microsoft/phi-4-reasoning
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.