qwen
4B parameters
Commercial OK
Reviewed June 2026

Qwen 3 4B

Compact Qwen 3 for edge and laptop deployment. Outperforms many 7B models from prior generations.

License: Apache 2.0·Released Apr 29, 2025·Context: 131,072 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
unrated

Positioning

Qwen 3 4B is a compact dense model from Alibaba's Qwen family, released under the permissive Apache 2.0 license. With 4 billion parameters and a 131,072-token context window, it is designed for edge deployment and Apple Silicon laptops. The vendor claims it outperforms many 7B models from prior generations, though independent benchmarks are not yet widely available.

Strengths

  • Edge-friendly size: At 4B parameters, the model fits easily on consumer hardware. Even at FP16 it requires only ~8 GB, and quantized versions drop to as low as ~1.3 GB (Q2_K), making it viable for phones, tablets, and laptops.
  • Permissive Apache 2.0 license: No restrictions on commercial use, modification, or redistribution — ideal for integrating into proprietary products.
  • Long 128K context window: Matches the full Qwen 3 family capability, enabling processing of large documents or extended conversations without truncation.
  • Modern architecture: As a dense model from the Qwen 3 lineage, it benefits from architectural improvements over earlier Qwen generations, though specific gains are not independently measured here.

Limitations

  • We lack independent benchmark results: The claim of outperforming prior 7B models comes from the vendor. Operators should treat published metrics as best-case until community validation appears.
  • Small parameter count limits raw capability: While efficient, 4B parameters may struggle with complex reasoning, coding, or knowledge-intensive tasks compared to larger models.
  • Quantization trade-offs: At aggressive quants (Q2_K, Q3_K_M), quality degradation is expected. The listed disk sizes do not account for KV cache overhead, which can add 30–50% at full context.
  • No MoE efficiency: Unlike some Qwen 3 variants, this is a dense model — every token uses all 4B parameters, so inference cost scales linearly with parameter count.

What it takes to run this locally

Quantized sizes range from ~8 GB (FP16) down to ~1.3 GB (Q2_K). Add 30–50% for KV cache and framework overhead at typical context lengths. This model is firmly in the consumer/edge deployment class: it runs on a single 12–24 GB GPU, Apple Silicon (M-series) laptops, or even CPU with sufficient RAM. No specific tok/s figures are available.

Should you run this locally?

Yes if you need a permissively licensed, compact model for on-device or laptop use, especially with long-context requirements. No if you require top-tier reasoning or coding performance — consider a larger Qwen 3 variant or a frontier model via API.

Catalog cross-links

Overview

Compact Qwen 3 for edge and laptop deployment. Outperforms many 7B models from prior generations.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model
Qwen 3 8B8B
Consumer

Strengths

  • Edge-class footprint
  • Apache 2.0

Weaknesses

  • Reasoning weaker than 8B+

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M2.5 GB4 GB
Q8_04.4 GB6 GB

Get the model

Ollama

One-line install

ollama run qwen3:4bRead our Ollama review →

HuggingFace

Original weights

huggingface.co/Qwen/Qwen3-4B

Source repository — direct quantization required.

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

1 run on record
HardwareProvenanceQuantCtxTokens / secTTFTDate
NVIDIA GeForce RTX 3080 16GB (Mobile)
EditorialM
Q4_K_M4K
103.7tok/s
303 msJun 2, 26

What to do next

Got this model running on real hardware? Share what you measured — the form arrives with the model pre-selected.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Qwen 3 4B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Qwen 3 4B?

4GB of VRAM is enough to run Qwen 3 4B at the Q4_K_M quantization (file size 2.5 GB). Higher-quality quantizations need more.

Can I use Qwen 3 4B commercially?

Yes — Qwen 3 4B ships under the Apache 2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of Qwen 3 4B?

Qwen 3 4B supports a context window of 131,072 tokens (about 131K).

How do I install Qwen 3 4B with Ollama?

Run `ollama pull qwen3:4b` to download, then `ollama run qwen3:4b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/Qwen/Qwen3-4B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify Qwen 3 4B runs on your specific hardware before committing money.