phi
14B parameters
Commercial OK

Phi-4 Reasoning 14B

Reasoning-focused fine-tune of Phi-4. Visible chain-of-thought, competitive with much larger models on math and STEM benchmarks.

License: MIT · Released Apr 30, 2025 · Context: 32,768 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
8.5/10
Positioning

The reasoning specialist of the Phi-4 line — same architecture, tuned with chain-of-thought training. Built for math, code planning, and multi-step problem decomposition in a 14B body that fits in 12 GB VRAM.

Strengths
  • Reasoning-class quality at 14B — competitive with QwQ 32B on math while using half the VRAM.
  • MIT license — the same license clarity as base Phi-4.
  • Visible chain-of-thought is well-formatted and useful for verification (see the extraction sketch after this list).
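
If you want to separate the trace from the final answer programmatically, the reasoning is emitted inline with the reply. A minimal sketch, assuming the trace is wrapped in <think>…</think> tags with the tags on their own lines, as the model card describes — verify against your build's actual output before relying on it:

ollama run phi4-reasoning:14b-q4_K_M "What is 17 * 23?" > answer.txt
# Keep only the reasoning trace for review:
sed -n '/<think>/,/<\/think>/p' answer.txt
# Or drop the trace and keep just the final answer:
sed '/<think>/,/<\/think>/d' answer.txt
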
Limitations
  • Always reasons, with no toggle: every prompt costs 2–3× the tokens (see the mitigation sketch after this list).
  • Verbose intermediate output dominates throughput on simple questions.
  • Same narrow knowledge breadth as Phi-4 base.
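
There is no way to switch the reasoning off, but you can cap how far a runaway trace goes. A minimal mitigation sketch against Ollama's REST API, assuming a local server on the default port; num_predict hard-caps generated tokens and can truncate the final answer mid-stream, so treat it as a blunt guard, not a thinking toggle:

curl http://localhost:11434/api/generate -d '{
  "model": "phi4-reasoning:14b-q4_K_M",
  "prompt": "Summarize the plot of Hamlet in two sentences.",
  "stream": false,
  "options": { "num_predict": 1024 }
}'
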
Real-world performance on RTX 4090
  • Q4_K_M (8.4 GB): 70–85 tok/s decode — but 2–3× tokens per answer
  • Q5_K_M (9.9 GB): 60–75 tok/s
  • Q8_0 (14.7 GB): 42–52 tok/s
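
To check decode speed on your own card, Ollama's --verbose flag prints timing stats after each reply, including an eval rate in tokens per second. The prompt here is illustrative:

ollama run phi4-reasoning:14b-q4_K_M --verbose "Prove that the square root of 2 is irrational."
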
Should you run this locally?

Yes, for dedicated math / code-planning workflows where reasoning quality matters and you want to fit in 12 GB VRAM. No, for general chat — base Phi-4 14B is simpler. For maximum reasoning, jump to QwQ 32B if VRAM allows.

How it compares
  • vs Phi-4 14B (base) → Reasoning variant wins on hard problems; base wins on throughput for simple ones. Pick by workload.
  • vs QwQ 32B → QwQ has stronger absolute reasoning; Phi-4 Reasoning fits in much less VRAM.
  • vs Qwen 3 14B with thinking mode → Qwen 3 has the toggle flexibility; Phi-4 Reasoning has slightly cleaner reasoning traces.
  • vs DeepSeek R1 Distill Qwen 14B → R1 Distill is more aggressive on reasoning depth; Phi-4 Reasoning is steadier on consistency.
Run this yourself
ollama pull phi4-reasoning:14b-q4_K_M
ollama run phi4-reasoning:14b-q4_K_M
Settings: Q4_K_M GGUF, 16,384-token context, fully GPU-offloaded on an RTX 4060 Ti 16 GB or RTX 4090
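
To pin the 16K context so you don't have to set it every session, you can bake it into a derived model with a Modelfile. A minimal sketch; the name phi4-reasoning-16k is our own label:

# Modelfile
FROM phi4-reasoning:14b-q4_K_M
PARAMETER num_ctx 16384

ollama create phi4-reasoning-16k -f Modelfile
ollama run phi4-reasoning-16k
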
Why this rating

8.5/10 — Phi-4 with explicit reasoning training. Closes the gap with QwQ 32B at half the VRAM, but always-on chain-of-thought eats throughput on simple prompts. Loses points to base Phi-4 14B for general use.

Strengths

  • Best 14B reasoner at release
  • MIT license

Weaknesses

  • Verbose by default

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M       | 8.4 GB    | 11 GB

Get the model

Ollama

One-line install

ollama run phi4-reasoning:14b

HuggingFace

Original weights

huggingface.co/microsoft/phi-4-reasoning

Source repository with the original weights; you'll need to quantize them yourself for local use (see the sketch below).
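
A minimal conversion-and-quantization sketch using llama.cpp, assuming you have cloned and built the repo and that the script and binary names match current releases (both have changed across versions):

huggingface-cli download microsoft/phi-4-reasoning --local-dir phi-4-reasoning
# Convert the HF checkpoint to an f16 GGUF, then quantize down to Q4_K_M
python llama.cpp/convert_hf_to_gguf.py phi-4-reasoning --outfile phi4-reasoning-f16.gguf
./llama.cpp/build/bin/llama-quantize phi4-reasoning-f16.gguf phi4-reasoning-Q4_K_M.gguf Q4_K_M
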

Hardware that runs this

Cards with enough VRAM for at least one quantization of Phi-4 Reasoning 14B.

Compare alternatives

Models worth comparing

Models in the same parameter band, plus one tier above and below, so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Phi-4 Reasoning 14B?

11 GB of VRAM is enough to run Phi-4 Reasoning 14B at the Q4_K_M quantization (8.4 GB file). Higher-quality quantizations need more.

Can I use Phi-4 Reasoning 14B commercially?

Yes — Phi-4 Reasoning 14B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of Phi-4 Reasoning 14B?

Phi-4 Reasoning 14B supports a context window of 32,768 tokens (32K).

How do I install Phi-4 Reasoning 14B with Ollama?

Run `ollama pull phi4-reasoning:14b` to download, then `ollama run phi4-reasoning:14b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/microsoft/phi-4-reasoning

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.