minicpm
4B parameters
Commercial OK
Reviewed June 2026

MiniCPM 3 4B

OpenBMB's edge-optimized 4B. MIT license; designed for phone deployment. Strong reasoning per parameter.

License: MIT·Released Sep 12, 2024·Context: 32,768 tokens

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
unrated

Positioning

MiniCPM 3 4B is a dense 4-billion-parameter language model released by OpenBMB under the permissive MIT license. Designed specifically for edge deployment, it targets phone and embedded inference with a 32,768-token context window. Its main distinction among open-weight models is strong reasoning for its parameter count, which matters most in resource-constrained environments.

Strengths

  • MIT license for commercial use: The MIT license permits commercial deployment, modification, and redistribution, requiring only that the copyright and license notice be retained, which makes it well suited to proprietary applications.
  • Edge-optimized architecture: At 4B parameters, the model is purpose-built for phone and embedded hardware, enabling local inference without cloud dependency.
  • Compact quantized sizes: Q4_K_M at ~2.3 GB and Q3_K_M at ~1.9 GB allow the model to fit within the memory constraints of many mobile devices, even with KV cache overhead.
  • Long context for its class: A 32K context window is generous for a 4B model, supporting document analysis and extended conversations on device.

Limitations

  • Limited community benchmarks: We do not yet have independently verified benchmark scores for this model. Operators should treat published vendor metrics as best-case until third-party validation appears.
  • Small parameter count limits complex reasoning: While optimized for its size, a 4B dense model cannot match the depth of larger models on tasks requiring extensive world knowledge or multi-step logic.
  • No MoE efficiency: As a dense model, all 4B parameters are active per forward pass, unlike mixture-of-experts architectures that can offer lower effective compute per token.
  • Edge hardware variability: Performance on phones depends heavily on chipset, RAM, and software stack; results may vary significantly across devices.

What it takes to run this locally

At FP16, the model requires ~8 GB of disk space. Quantized versions reduce this significantly: Q8_0 ~4 GB, Q6_K ~3.3 GB, Q5_K_M ~2.9 GB, Q4_K_M ~2.3 GB, Q3_K_M ~1.9 GB, Q2_K ~1.3 GB. Add ~30-50% for KV cache and framework overhead at typical context lengths. This places the model firmly in the edge deployment class: it can run on modern smartphones with 6-8 GB RAM using Q4_K_M or lower quantizations, and on embedded devices with sufficient memory.
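The arithmetic above can be sketched in a few lines of Python. This is a rough estimate, not a measurement: the file sizes are the approximate figures quoted in this review, the 30-50% overhead is the rule of thumb stated above, and the function names and the `budget_gb` parameter (memory actually available to the model, which on a phone is less than total RAM) are hypothetical.

```python
# Approximate file sizes in GB, as quoted in this review.
QUANT_SIZES_GB = {
    "FP16": 8.0,
    "Q8_0": 4.0,
    "Q6_K": 3.3,
    "Q5_K_M": 2.9,
    "Q4_K_M": 2.3,
    "Q3_K_M": 1.9,
    "Q2_K": 1.3,
}


def memory_range_gb(file_size_gb, low=0.30, high=0.50):
    """Return (min, max) estimated runtime memory: weights plus 30-50% overhead."""
    return (file_size_gb * (1 + low), file_size_gb * (1 + high))


def largest_quant_that_fits(budget_gb):
    """Pick the largest quantization whose worst-case estimate fits the budget."""
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if memory_range_gb(size)[1] <= budget_gb
    ]
    # max() compares by size first, so the biggest fitting file wins.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for name, size in QUANT_SIZES_GB.items():
        lo, hi = memory_range_gb(size)
        print(f"{name:7s} file ~{size:.1f} GB -> ~{lo:.1f} to {hi:.1f} GB in memory")
    print("Best fit for a 4 GB budget:", largest_quant_that_fits(4.0))
```

Real usage depends on context length, runtime, and device, so treat the output as a starting point for testing rather than a guarantee.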

Should you run this locally?

Yes if you need a permissively licensed, edge-deployable model for on-device inference on phones or embedded systems, and your tasks fit within the capabilities of a 4B dense model. No if your use case demands the reasoning depth of larger models (e.g., 7B+ dense or MoE) or if you require validated benchmark performance before adoption.

Catalog cross-links

  • OpenBMB
  • Edge deployment
  • MIT license

Overview

Strengths

  • MIT license
  • Phone-deployable
  • Strong reasoning per param

Weaknesses

  • 4B ceiling limits open-ended generation depth

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          2.4 GB       4 GB

Get the model

HuggingFace

Original weights

huggingface.co/openbmb/MiniCPM3-4B

Source repository. It hosts the original weights, so you need to quantize them yourself.
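One way to fetch the original weights is the `huggingface_hub` Python package, sketched below. This assumes the package is installed (`pip install huggingface_hub`) and that you have room for the full-precision download, roughly 8 GB going by the FP16 figure in this review. The function name `download_weights` and the `local_dir` default are illustrative choices, not part of the model's own instructions.

```python
REPO_ID = "openbmb/MiniCPM3-4B"  # repository named in this review


def download_weights(local_dir="MiniCPM3-4B"):
    """Download the original (unquantized) weights and return the local path."""
    # Imported lazily so this module loads even if the package is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(download_weights())
```

Quantizing the downloaded weights is a separate step that depends on your inference runtime.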

Frequently asked

What's the minimum VRAM to run MiniCPM 3 4B?

4 GB of VRAM is enough to run MiniCPM 3 4B at the Q4_K_M quantization (file size 2.4 GB). Higher-quality quantizations need more.

Can I use MiniCPM 3 4B commercially?

Yes. MiniCPM 3 4B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of MiniCPM 3 4B?

MiniCPM 3 4B supports a context window of 32,768 tokens (32K).

Source: huggingface.co/openbmb/MiniCPM3-4B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Before you buy

Verify MiniCPM 3 4B runs on your specific hardware before committing money.