phi
3.8B parameters
Commercial OK
Reviewed June 2026

Phi-4 Reasoning Mini 4B

Phi-4 reasoning at the edge tier. 3.8B with reasoning-token emission. The right pick when reasoning matters AND edge deployment is required.

License: MIT·Released Apr 8, 2026·Context: 131,072 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
unrated

Positioning

Phi-4 Reasoning Mini 4B is a dense 3.8B-parameter model from Microsoft, released under the permissive MIT license. It is designed for edge-tier deployment with a focus on reasoning tasks, featuring a 131,072-token context window. This model distinguishes itself by emitting reasoning tokens, making it suitable for applications where logical inference is critical but hardware is constrained.

Strengths

  • Compact size with reasoning capability: At 3.8B parameters, it fits comfortably on edge devices while offering reasoning-token emission, a feature typically found in larger models.
  • Permissive MIT license: The MIT license allows unrestricted commercial use, modification, and redistribution, making it ideal for proprietary deployments.
  • Large context window: 131,072 tokens of context enable processing of long documents or multi-turn conversations without truncation.
  • Efficient quantized variants: Quantized versions (e.g., Q4_K_M at ~2.1 GB) allow deployment on devices with limited memory, such as phones or single-GPU setups.

Limitations

  • Small parameter count: At 3.8B, it may struggle with complex reasoning or knowledge-intensive tasks compared to larger models. We do not have benchmark scores to quantify this gap.
  • No community benchmarks available: We lack independent measurements of performance on standard reasoning benchmarks. Vendor-reported metrics should be treated as best-case.
  • KV cache overhead: With a 131K context window, the KV cache can be substantial (estimated 30-50% of model size), potentially limiting effective context length on memory-constrained hardware.
  • Edge focus may limit versatility: Optimized for reasoning, it may underperform on creative or open-ended generation tasks relative to general-purpose models of similar size.

What it takes to run this locally

At FP16, the model requires ~8 GB of disk space. Quantized versions reduce this: Q8_0 ~4 GB, Q6_K ~3.1 GB, Q5_K_M ~2.7 GB, Q4_K_M ~2.1 GB, Q3_K_M ~1.9 GB, Q2_K ~1.2 GB. Add 30-50% for KV cache and framework overhead at typical context lengths. This model is suited for edge deployment (single consumer GPU with 4-8 GB VRAM, or CPU with sufficient RAM). No specific tokens-per-second measurements are available.

Should you run this locally?

Yes if you need a reasoning-focused model with a permissive license for edge deployment, and your hardware can accommodate the model size plus KV cache overhead. No if your task requires broad knowledge or high throughput, or if you cannot tolerate the memory overhead of a large context window.

Catalog cross-links

Overview

Phi-4 reasoning at the edge tier. 3.8B with reasoning-token emission. The right pick when reasoning matters AND edge deployment is required.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Strengths

  • Reasoning at edge tier
  • MIT license

Weaknesses

  • 3.8B ceiling on reasoning depth

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M2.4 GB4 GB

Get the model

HuggingFace

Original weights

huggingface.co/microsoft/Phi-4-reasoning-mini-4B

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Phi-4 Reasoning Mini 4B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Phi-4 Reasoning Mini 4B?

4GB of VRAM is enough to run Phi-4 Reasoning Mini 4B at the Q4_K_M quantization (file size 2.4 GB). Higher-quality quantizations need more.

Can I use Phi-4 Reasoning Mini 4B commercially?

Yes — Phi-4 Reasoning Mini 4B ships under the MIT, which permits commercial use. Always read the license text before deployment.

What's the context length of Phi-4 Reasoning Mini 4B?

Phi-4 Reasoning Mini 4B supports a context window of 131,072 tokens (about 131K).

Source: huggingface.co/microsoft/Phi-4-reasoning-mini-4B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify Phi-4 Reasoning Mini 4B runs on your specific hardware before committing money.