Text & Reasoning · Open-weight · MIT (most variants)

Phi

by Microsoft Research

Microsoft's small-but-strong family, spanning the Phi-3.5 and Phi-4 lineage. Trained heavily on synthetic data for reasoning, it is the canonical "punching above its weight class" family: Phi-4 14B competes with 70B-class models on reasoning benchmarks.

Best entry point for local use

Start with Phi-4 14B at Q4_K_M via Ollama; it fits on a single RTX 3060 12GB at roughly 9 GB of VRAM. Phi-4 delivers disproportionate reasoning quality for its size (MMLU ~85%, MATH ~80%) because Microsoft trained it primarily on synthetic reasoning data rather than web-scale general text. That makes Phi-4 exceptional at math, logic, and code reasoning, but weaker on world knowledge and creative writing than similarly sized Llama or Qwen models. For minimum VRAM (<6 GB), use Phi-3.5 Mini 3.8B at Q4 (~3 GB); it uses a 32K Llama-compatible vocabulary and handles basic assistant workloads competently. Skip Phi-3 Vision unless you specifically need on-device vision reasoning; the text models are more robust. Phi models ship their own chat templates (the Phi-3.5 generation uses <|user|> / <|assistant|> / <|end|> tokens), which Ollama and llama.cpp apply automatically.
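A minimal quickstart under the setup above, assuming the phi4 and phi3.5 entries in the Ollama model library (the explicit tag name is an assumption; check ollama.com/library for the exact quantization tags before pinning one):

  # Phi-4 14B, Q4_K_M by default (~9 GB VRAM on an RTX 3060 12GB)
  ollama run phi4

  # Pin the quant explicitly (assumed tag naming; verify in the library)
  ollama pull phi4:14b-q4_K_M

  # Low-VRAM fallback: Phi-3.5 Mini 3.8B Q4 (~3 GB)
  ollama run phi3.5

Ollama applies the family's chat template for you, so a plain prompt works without any manual <|user|> / <|end|> formatting.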

Deployment guidance

  • Single-user local: Ollama + phi4:14b Q4_K_M on an RTX 3060 12GB, or Apple M3 via llama.cpp. Phi-4 is a standard dense transformer with GQA, so every Llama-compatible engine works.
  • Windows-first users: ONNX Runtime with DirectML on the AMD Ryzen AI 9 HX 370 NPU; the Phi-Silica variant runs at ~20 tok/s with 4-bit NPU offload.
  • Multi-user serving: vLLM 0.6.0+ with AWQ 4-bit on an L4 24 GB; the 14B model serves ~500 concurrent requests at ~30 tok/s/user (see the vLLM sketch below).
  • Mobile: ONNX Runtime Mobile on Snapdragon X Elite; Phi-3.5 Mini runs on-device via the Qualcomm AI Engine.
  • Licensing: Phi-4 is MIT-licensed, with no commercial restrictions and no MAU cap, making it one of the most permissively licensed frontier-quality models.
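A serving sketch for the multi-user case; the AWQ repository name is a placeholder (Microsoft publishes microsoft/phi-4, while 4-bit AWQ quants come from the community), and the flags are standard vLLM CLI options:

  # OpenAI-compatible endpoint for Phi-4 AWQ on a single 24 GB card
  # <community-org>/phi-4-AWQ is a placeholder; substitute a quant you trust
  vllm serve <community-org>/phi-4-AWQ \
    --quantization awq \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.90 \
    --port 8000

On a 40 GB-class card you can skip quantization and point vllm serve at microsoft/phi-4 directly.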

Featured models

Models in this family with our verdicts

Phi-4 14B

Recommended runtimes

llama.cpp · Ollama · vLLM
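For llama.cpp, a single-command sketch; the GGUF filename is a placeholder for whichever Q4_K_M conversion of Phi-4 you download:

  # Serve Phi-4 locally with full GPU offload and an 8K context
  llama-server -m phi-4-Q4_K_M.gguf -ngl 99 -c 8192 --port 8080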

Related families

Gemma · Qwen

Related — keep moving

Compare hardware
  • RTX 3090 vs RTX 4090 →
  • RTX 4090 vs RTX 5090 →
Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
  • Will it run on my hardware? →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →
Runtimes that fit
  • llama.cpp →
  • Ollama →
  • vLLM →
Alternatives
Gemma · Qwen
Before you buy

Verify Phi runs on your specific hardware before committing money.

  • Will it run on my hardware? →
  • Custom hardware comparison →
  • GPU recommender (4 questions) →