phi
3.8B parameters
Commercial OK
Reviewed June 2026

Phi-4 Mini 4B

Microsoft's edge-tier Phi-4 variant: 3.8B parameters, designed for phone, tablet, and Raspberry Pi deployment. The Phi family's traditional advantage, strong reasoning per parameter, carries over to this smallest tier.

License: MIT · Released Feb 25, 2026 · Context: 131,072 tokens

Our verdict

OP · Fredoline Eruo | Verified Jun 12, 2026
unrated

Positioning

Microsoft's Phi-4 Mini 4B is a compact 3.8B-parameter dense model released under the permissive MIT license. Designed for edge deployment, it targets phones, tablets, and single-board computers like the Raspberry Pi. The Phi family is known for strong reasoning per parameter, and this smallest variant extends that advantage to resource-constrained environments. With a 131K context window, it offers long-context capability unusual for its size class.

Strengths

  • Edge-optimized size: At 3.8B parameters, the model fits comfortably on consumer hardware. Quantized to Q4_K_M (about 2.4 GB on disk) or Q3_K_M (about 1.9 GB), it can run on devices with as little as 4 GB of RAM, KV cache overhead included.
  • Permissive MIT license: No restrictions on commercial use, modification, or redistribution — ideal for embedded products and proprietary applications.
  • Long context window: 131K tokens of context length is exceptional for a sub-4B model, enabling processing of lengthy documents or multi-turn conversations on edge devices.
  • Proven architecture lineage: The Phi family's emphasis on data quality and reasoning efficiency carries through to this tier, offering reliable performance for its parameter count.

Limitations

  • Small parameter count limits complex reasoning: While efficient, 3.8B parameters cannot match the depth of larger models on tasks requiring extensive world knowledge or multi-step logic.
  • No community benchmarks available: We do not yet have independent measurements for this model. Operators should treat published vendor metrics as best-case and verify against their own workloads.
  • Edge deployment constraints: FP16 weights alone take about 8 GB, before any KV cache for the 131K context window, which already exceeds the memory of typical edge devices; quantization is essential for practical use.
  • Niche best-use case: The model excels at embedded reasoning but is not designed for general-purpose chatbot or content generation roles where larger models are preferable.
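
To make the memory constraint concrete, here is a rough sketch of the arithmetic. The parameter count and context length come from this page; the layer count, KV-head count, and head dimension are assumptions chosen for illustration and should be checked against the model's published configuration before relying on the result.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal GB for a given precision."""
    return params * bits_per_weight / 8 / 1e9


def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in decimal GB.

    The leading factor of 2 covers one key and one value vector per layer.
    """
    bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return bytes_per_token * tokens / 1e9


PARAMS = 3.8e9        # from this page
CONTEXT = 131_072     # from this page

# ASSUMPTIONS (not stated on this page): illustrative architecture values.
ASSUMED_LAYERS = 32
ASSUMED_KV_HEADS = 8
ASSUMED_HEAD_DIM = 128

fp16_weights = weights_gb(PARAMS, 16)
full_context_cache = kv_cache_gb(
    CONTEXT, ASSUMED_LAYERS, ASSUMED_KV_HEADS, ASSUMED_HEAD_DIM
)

print(f"FP16 weights:          {fp16_weights:.1f} GB")        # 7.6 GB
print(f"FP16 KV cache at 131K: {full_context_cache:.1f} GB")  # 17.2 GB
```

Under these assumptions the cache for a full window is larger than the weights themselves, which is why a shorter working context is usually the practical choice on edge devices.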

What it takes to run this locally

Weight files range from about 8 GB (FP16, unquantized) down to about 1.2 GB (Q2_K). For typical edge hardware (4–8 GB RAM), Q4_K_M (about 2.4 GB) or Q3_K_M (about 1.9 GB) is practical; budget an additional 30–50% for KV cache and framework overhead. The deployment class is edge: a single low-power device (phone, tablet, Pi). No tokens-per-second measurements are available.
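
The overhead rule of thumb can be turned into a quick fit check. This is a minimal sketch, not a measurement: it applies the 30–50% overhead range quoted on this page to a file size, here the 2.4 GB Q4_K_M figure from this page's quantization table. The function names are hypothetical.

```python
def runtime_budget_gb(file_gb: float,
                      overhead_low: float = 0.30,
                      overhead_high: float = 0.50) -> tuple[float, float]:
    """Return (low, high) memory estimates: file size plus 30-50% overhead."""
    return file_gb * (1 + overhead_low), file_gb * (1 + overhead_high)


def fits_in_ram(ram_gb: float, file_gb: float) -> bool:
    """Conservative check: the high-end estimate must fit in available RAM."""
    _, high = runtime_budget_gb(file_gb)
    return high <= ram_gb


low, high = runtime_budget_gb(2.4)
print(f"Q4_K_M budget: {low:.2f} to {high:.2f} GB")
print("Fits in 4 GB:", fits_in_ram(4.0, 2.4))
print("FP16 (8 GB file) fits in 8 GB:", fits_in_ram(8.0, 8.0))
```

The check deliberately uses the high end of the range, and it ignores memory taken by the operating system and other processes, so treat a narrow pass as a fail.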

Should you run this locally?

Yes if you need a permissively licensed, small-footprint model for on-device reasoning with long-context support, and your hardware can accommodate the quantized size plus overhead.

No if your task demands deep reasoning or broad knowledge that only larger models can provide, or if you cannot tolerate the memory constraints of edge deployment.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model
Phi-4 14B · 14B parameters · Consumer

Strengths

  • Edge-tier deployment (phone, Pi)
  • MIT license
  • Strong reasoning per parameter

Weaknesses

  • Small parameter count limits open-ended generation

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          2.4 GB       4 GB

Get the model

HuggingFace

Original weights

huggingface.co/microsoft/Phi-4-mini

Source repository: original weights only, so you need to quantize them yourself.
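
One common way to quantize original weights is the llama.cpp toolchain, which this page does not name, so treat it as an assumption. The sketch below only builds the two command lines (convert to GGUF, then quantize) without running them. It assumes a local llama.cpp checkout providing `convert_hf_to_gguf.py` and a built `llama-quantize` binary, that the weights are already downloaded to a local directory, and that the toolchain supports this architecture. The helper name and output file names are hypothetical.

```python
import shlex
from pathlib import Path


def quantize_commands(model_dir: str, out_dir: str,
                      quant: str = "Q4_K_M") -> list[list[str]]:
    """Build (but do not run) the convert and quantize commands."""
    out = Path(out_dir)
    f16_path = out / "model-f16.gguf"          # intermediate FP16 GGUF
    quant_path = out / f"model-{quant}.gguf"   # final quantized GGUF

    # Step 1: convert the Hugging Face checkpoint to an FP16 GGUF file.
    convert = ["python", "convert_hf_to_gguf.py", str(model_dir),
               "--outfile", str(f16_path), "--outtype", "f16"]
    # Step 2: quantize the FP16 GGUF to the requested type.
    quantize = ["llama-quantize", str(f16_path), str(quant_path), quant]
    return [convert, quantize]


# "Phi-4-mini" and "out" are placeholder local directories.
for cmd in quantize_commands("Phi-4-mini", "out"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review paths before committing to a conversion that writes several gigabytes to disk.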

Frequently asked

What's the minimum VRAM to run Phi-4 Mini 4B?

4 GB of VRAM is enough to run Phi-4 Mini 4B at the Q4_K_M quantization (file size 2.4 GB). Higher-precision quantizations need more.

Can I use Phi-4 Mini 4B commercially?

Yes. Phi-4 Mini 4B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of Phi-4 Mini 4B?

Phi-4 Mini 4B supports a context window of 131,072 tokens (about 131K).

Source: huggingface.co/microsoft/Phi-4-mini

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
