Phi-4 Reasoning 14B

Positioning

The reasoning specialist of the Phi-4 line — same architecture, tuned with chain-of-thought training. Built for math, code planning, and multi-step problem decomposition in a 14B body that fits in 12 GB VRAM.

Strengths

Reasoning-class quality at 14B — competitive with QwQ 32B on math while using half the VRAM.
MIT license — the same license clarity as base Phi-4.
Visible chain-of-thought is well-formatted and useful for verification.
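
To use the visible trace for verification, it helps to separate it from the final answer. The following is a minimal sketch, assuming the model wraps its reasoning in `<think>...</think>` tags and places the answer after the closing tag. The helper name `split_reasoning` and the sample string are hypothetical, and the tag format may differ depending on the runtime and chat template.

```python
import re

# Assumption: the reasoning trace is wrapped in <think>...</think> and the
# final answer follows the closing tag. Adjust the pattern if your runtime
# formats the output differently.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from one model response."""
    match = THINK_BLOCK.search(raw_output)
    if match is None:
        # No trace found: treat the whole response as the answer.
        return "", raw_output.strip()
    trace = match.group(1).strip()
    answer = raw_output[match.end():].strip()
    return trace, answer


# Hypothetical sample output, for illustration only.
sample = "<think>17 * 3 = 51, then 51 + 4 = 55.</think>\nThe result is 55."
trace, answer = split_reasoning(sample)
print(answer)
```

Keeping the trace and the answer apart lets you log or review the reasoning without showing it to end users.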

Limitations

Always reasons, with no toggle to turn it off. Every prompt costs roughly 2–3× the output tokens of a direct answer.
Verbose intermediate output dominates throughput on simple questions.
Same narrow knowledge breadth as Phi-4 base.

Real-world performance on RTX 4090

Q4_K_M (8.4 GB): 70–85 tok/s decode — but 2–3× tokens per answer
Q5_K_M (9.9 GB): 60–75 tok/s
Q8_0 (14.7 GB): 42–52 tok/s
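
The decode rate alone overstates how fast answers arrive, because each answer carries 2–3× the tokens. A rough sketch of the arithmetic for Q4_K_M follows. The 200-token baseline answer is a hypothetical workload, not a measurement, and prompt processing time is ignored.

```python
def seconds_per_answer(base_tokens: int, token_multiplier: float, decode_tok_s: float) -> float:
    """Decode time for one answer, ignoring prompt processing."""
    return base_tokens * token_multiplier / decode_tok_s


# Hypothetical workload: an answer that would take 200 tokens without reasoning.
BASE_TOKENS = 200

# Q4_K_M figures from the list above: 70-85 tok/s, 2-3x tokens per answer.
best_case = seconds_per_answer(BASE_TOKENS, 2.0, 85.0)
worst_case = seconds_per_answer(BASE_TOKENS, 3.0, 70.0)
print(f"Q4_K_M: {best_case:.1f}s to {worst_case:.1f}s per answer")
```

Under these assumptions the answer takes roughly 4.7 to 8.6 seconds. Swap in your own typical answer length to see whether the overhead matters for your workload.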

Should you run this locally?

Yes, for dedicated math / code-planning workflows where reasoning quality matters and you want to fit in 12 GB VRAM. No, for general chat — base Phi-4 14B is simpler. For maximum reasoning, jump to QwQ 32B if VRAM allows.

How it compares

vs Phi-4 14B (base) → Reasoning variant wins on hard problems; base wins on throughput for simple ones. Pick by workload.
vs QwQ 32B → QwQ has stronger absolute reasoning; Phi-4 Reasoning fits in much less VRAM.
vs Qwen 3 14B with thinking mode → Qwen 3 has the toggle flexibility; Phi-4 Reasoning has slightly cleaner reasoning traces.
vs DeepSeek R1 Distill Qwen 14B → R1 Distill is more aggressive on reasoning depth; Phi-4 Reasoning is steadier on consistency.

Run this yourself

ollama pull phi4-reasoning:14b-q4_K_M
ollama run phi4-reasoning:14b-q4_K_M

Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4060 Ti 16 GB / 4090
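
For scripted use, the same settings can be sent to Ollama's local HTTP API. This is a minimal sketch, assuming Ollama is serving on its default port (11434) and the tag above has already been pulled. The helper names `build_payload` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumptions: Ollama is serving on its default local port, and the model
# tag matches the one pulled above.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "phi4-reasoning:14b-q4_K_M"


def build_payload(prompt: str, num_ctx: int = 16384) -> dict:
    """Build a non-streaming generate request with the context size used here."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


def ask(prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the response text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]
```

Calling `ask("Prove that the sum of two even numbers is even.")` returns the full response text once generation finishes. Since the model always reasons, expect a noticeable wait on non-streaming calls.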

Quantization	File size	VRAM required
Q4_K_M	8.4 GB	11 GB

Frequently asked

What's the minimum VRAM to run Phi-4 Reasoning 14B?

11 GB of VRAM is enough to run Phi-4 Reasoning 14B at the Q4_K_M quantization (file size 8.4 GB). Higher-quality quantizations need more.
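
This page only states a VRAM requirement for Q4_K_M, so the figures for the other variants have to be estimated. The sketch below assumes the gap between the Q4_K_M file size and its VRAM requirement (11 GB minus 8.4 GB) is a fixed overhead that carries over to Q5_K_M and Q8_0. That is an assumption, not a measured value, and real overhead grows with context length.

```python
# File sizes (GB) quoted on this page.
FILE_SIZE_GB = {"Q4_K_M": 8.4, "Q5_K_M": 9.9, "Q8_0": 14.7}

# Assumption: the gap between the Q4_K_M file size (8.4 GB) and its stated
# VRAM requirement (11 GB) is a fixed overhead that applies to every variant.
OVERHEAD_GB = 11.0 - 8.4


def estimated_vram_gb(quant: str) -> float:
    """Estimated VRAM need: file size plus the assumed fixed overhead."""
    return round(FILE_SIZE_GB[quant] + OVERHEAD_GB, 1)


def largest_fit(vram_gb: float):
    """Largest quantization whose estimate fits in vram_gb, or None."""
    fitting = [q for q in FILE_SIZE_GB if estimated_vram_gb(q) <= vram_gb]
    return max(fitting, key=FILE_SIZE_GB.get) if fitting else None


for card_vram in (12, 16, 24):
    print(card_vram, "GB ->", largest_fit(card_vram))
```

Treat the result as a starting point and confirm actual usage with your runtime's memory report before committing to a quantization.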

Can I use Phi-4 Reasoning 14B commercially?

Yes. Phi-4 Reasoning 14B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of Phi-4 Reasoning 14B?

Phi-4 Reasoning 14B supports a context window of 32,768 tokens (32K).

How do I install Phi-4 Reasoning 14B with Ollama?

Run `ollama pull phi4-reasoning:14b` to download, then `ollama run phi4-reasoning:14b` to start a chat session. The default quantization is Q4_K_M.
