Phi-4 Reasoning Mini 4B
Phi-4 reasoning at the edge tier. 3.8B with reasoning-token emission. The right pick when reasoning matters AND edge deployment is required.
Positioning
Phi-4 Reasoning Mini 4B is a dense 3.8B-parameter model from Microsoft, released under the permissive MIT license. It is designed for edge-tier deployment with a focus on reasoning tasks, featuring a 131,072-token context window. This model distinguishes itself by emitting reasoning tokens, making it suitable for applications where logical inference is critical but hardware is constrained.
Strengths
- Compact size with reasoning capability: At 3.8B parameters, it fits comfortably on edge devices while offering reasoning-token emission, a feature typically found in larger models.
- Permissive MIT license: The MIT license allows unrestricted commercial use, modification, and redistribution, making it ideal for proprietary deployments.
- Large context window: 131,072 tokens of context enable processing of long documents or multi-turn conversations without truncation.
- Efficient quantized variants: Quantized versions (e.g., Q4_K_M at ~2.1 GB) allow deployment on devices with limited memory, such as phones or single-GPU setups.
Limitations
- Small parameter count: At 3.8B, it may struggle with complex reasoning or knowledge-intensive tasks compared to larger models. We do not have benchmark scores to quantify this gap.
- No community benchmarks available: We lack independent measurements of performance on standard reasoning benchmarks. Vendor-reported metrics should be treated as best-case.
- KV cache overhead: The KV cache grows linearly with context length. The 30-50% overhead estimate applies at typical context lengths; filling the full 131K window needs considerably more memory, which can limit the usable context on memory-constrained hardware.
- Reasoning focus may limit versatility: Optimized for reasoning, it may underperform on creative or open-ended generation tasks relative to general-purpose models of similar size.
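To make the KV cache point concrete, here is a minimal sketch of the standard size formula for a transformer KV cache. The architecture values (`num_layers`, `num_kv_heads`, `head_dim`) are hypothetical placeholders chosen for illustration, not this model's published configuration; substitute the real values from the model's config before relying on the numbers.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Approximate KV cache size in bytes for a single sequence.

    Keys and values are both cached (the factor of 2), once per layer,
    with num_kv_heads * head_dim values stored per token.
    bytes_per_value=2 corresponds to an FP16 cache.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


# ASSUMPTION: hypothetical architecture values, for illustration only.
HYPOTHETICAL_ARCH = dict(num_layers=32, num_kv_heads=8, head_dim=128)

for tokens in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(context_tokens=tokens, **HYPOTHETICAL_ARCH) / 2**30
    print(f"{tokens:>7} tokens: {gib:.2f} GiB")
```

With these placeholder values the cache grows from 0.5 GiB at 4,096 tokens to 16 GiB at 131,072 tokens, which illustrates why the full context window may be out of reach on small devices even when the weights fit.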
What it takes to run this locally
At FP16, the model requires ~8 GB of disk space. Quantized versions reduce this: Q8_0 ~4 GB, Q6_K ~3.1 GB, Q5_K_M ~2.7 GB, Q4_K_M ~2.1 GB, Q3_K_M ~1.9 GB, Q2_K ~1.2 GB. Add 30-50% for KV cache and framework overhead at typical context lengths. This model is suited for edge deployment (single consumer GPU with 4-8 GB VRAM, or CPU with sufficient RAM). No specific tokens-per-second measurements are available.
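The arithmetic above can be sketched in a few lines. This is only a rough estimate under the stated 30-50% overhead rule, using the approximate file sizes quoted in this section; the function name `memory_range_gb` is ours, and actual usage depends on the runtime and context length.

```python
# Approximate file sizes in GB, as quoted in this section.
FILE_SIZE_GB = {
    "FP16": 8.0,
    "Q8_0": 4.0,
    "Q6_K": 3.1,
    "Q5_K_M": 2.7,
    "Q4_K_M": 2.1,
    "Q3_K_M": 1.9,
    "Q2_K": 1.2,
}


def memory_range_gb(file_size_gb, low=0.30, high=0.50):
    """Return (min, max) estimated memory in GB after adding
    30-50% for KV cache and framework overhead."""
    return file_size_gb * (1 + low), file_size_gb * (1 + high)


for name, size in FILE_SIZE_GB.items():
    lo, hi = memory_range_gb(size)
    print(f"{name:<7} {size:>4.1f} GB on disk -> {lo:.1f}-{hi:.1f} GB in memory")
```

For Q4_K_M this works out to roughly 2.7 to 3.2 GB, which is why a 4 GB card is a plausible floor at typical context lengths.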
Should you run this locally?
Yes if you need a reasoning-focused model with a permissive license for edge deployment, and your hardware can accommodate the model size plus KV cache overhead. No if your task requires broad knowledge or high throughput, or if you cannot tolerate the memory overhead of a large context window.
Catalog cross-links
- Phi-4 family overview
- Edge deployment guide
- Quantization tools
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Reasoning at edge tier
- MIT license
Weaknesses
- 3.8B ceiling on reasoning depth
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 2.4 GB | 4 GB |
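One way to act on this trade-off is to pick the highest-quality quantization that still fits a given VRAM budget. The sketch below assumes the approximate file sizes from the "What it takes to run this locally" section (which lists Q4_K_M at ~2.1 GB rather than the 2.4 GB shown in the table; the result for a 4 GB card is the same with either figure) and a conservative 50% overhead. The function name `pick_quantization` is hypothetical.

```python
# Quantizations ordered from highest to lowest expected quality,
# with approximate file sizes in GB.
QUANTS = [
    ("FP16", 8.0),
    ("Q8_0", 4.0),
    ("Q6_K", 3.1),
    ("Q5_K_M", 2.7),
    ("Q4_K_M", 2.1),
    ("Q3_K_M", 1.9),
    ("Q2_K", 1.2),
]


def pick_quantization(vram_gb, overhead=0.50):
    """Return the first (highest-quality) quantization whose file size
    plus overhead fits in vram_gb, or None if nothing fits.

    ASSUMPTION: overhead is a flat fraction of file size; real overhead
    varies with context length and runtime.
    """
    for name, size_gb in QUANTS:
        if size_gb * (1 + overhead) <= vram_gb:
            return name
    return None


for budget in (4, 8, 12):
    print(f"{budget} GB VRAM -> {pick_quantization(budget)}")
```

Under these assumptions a 4 GB card lands on Q4_K_M, in line with the table above, while an 8 GB card has room for Q8_0.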
Get the model
HuggingFace: original weights (source repository). You need to quantize these yourself.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Phi-4 Reasoning Mini 4B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Phi-4 Reasoning Mini 4B?
Can I use Phi-4 Reasoning Mini 4B commercially?
What's the context length of Phi-4 Reasoning Mini 4B?
Source: huggingface.co/microsoft/Phi-4-reasoning-mini-4B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.