Qwen 3 4B
Compact Qwen 3 for edge and laptop deployment. Vendor-reported to outperform many 7B models from prior generations.
Positioning
Qwen 3 4B is a compact dense model from Alibaba's Qwen family, released under the permissive Apache 2.0 license. It has 4 billion parameters and a 131,072-token (128K) context window: 32,768 tokens natively, extended with YaRN per the model card. It is designed for edge deployment and Apple Silicon laptops. The vendor claims it outperforms many 7B models from prior generations, though independent benchmarks are not yet widely available.
Strengths
- Edge-friendly size: At 4B parameters, the model fits easily on consumer hardware. Even at FP16 it requires only ~8 GB, and quantized versions drop to as low as ~1.3 GB (Q2_K), making it viable for phones, tablets, and laptops.
- Permissive Apache 2.0 license: No restrictions on commercial use, modification, or redistribution — ideal for integrating into proprietary products.
- Long 128K context window: Up to 131,072 tokens (32,768 natively, extended with YaRN rope scaling), the same ceiling as the larger dense Qwen 3 models. This is enough to process large documents or extended conversations without truncation.
- Modern architecture: As a dense model from the Qwen 3 lineage, it benefits from architectural improvements over earlier Qwen generations, though specific gains are not independently measured here.
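The ~8 GB FP16 figure follows from simple arithmetic: parameter count times bytes per weight. The sketch below assumes a round 4.0 billion parameters. It ignores file metadata and any tensors stored at a different precision, so real files run slightly larger. The 8.5 bits per weight used for Q8_0 comes from llama.cpp's block layout: 32 int8 values plus one 16-bit scale per block.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8.
# Assumption: a round 4.0e9 parameters; real GGUF files add metadata and may
# keep some tensors at higher precision, so actual files are a bit larger.

PARAMS = 4.0e9

# Bits per weight for two formats with a fixed layout:
# FP16 stores every weight in 16 bits; llama.cpp's Q8_0 stores blocks of
# 32 int8 weights plus one 16-bit scale, i.e. (32*8 + 16) / 32 = 8.5 bits.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,
}


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weights_gb(PARAMS, bpw):.2f} GB")
```

Under these assumptions FP16 comes to about 8.0 GB and Q8_0 to about 4.25 GB. Both are slightly below the file sizes quoted on this page, which is the expected direction given the overhead the sketch ignores.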
Limitations
- Limited independent benchmarks: The claim of outperforming prior 7B models comes from the vendor. Operators should treat published metrics as best-case until community validation appears.
- Small parameter count limits raw capability: While efficient, 4B parameters may struggle with complex reasoning, coding, or knowledge-intensive tasks compared to larger models.
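- Quantization trade-offs: At aggressive quants (Q2_K, Q3_K_M), quality degradation is expected. The listed disk sizes also exclude the KV cache, which grows linearly with context length. Budget 30–50% extra at typical contexts and far more near the full 128K window, where the cache alone can exceed the size of the weights.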
- No MoE efficiency: Unlike the MoE variants of Qwen 3, this is a dense model. Every token passes through all 4B parameters, so per-token compute cannot be cut by activating only a subset of experts.
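The KV cache grows per token, so its footprint at long context can be estimated directly. The architecture values below (36 layers, 8 key/value heads, head dimension 128) are assumptions based on our reading of the published Qwen3-4B config and should be checked against the model's config.json. The sketch also assumes an FP16 cache with no KV-cache quantization.

```python
# KV cache size estimate for a dense transformer with grouped-query attention.
# Assumed Qwen3-4B config values (verify against the model's config.json):
NUM_LAYERS = 36
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEMENT = 2  # FP16 cache; a quantized KV cache would be smaller


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2: one tensor for keys and one for values, in every layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEMENT
    return per_token * num_tokens


GIB = 1024 ** 3
for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / GIB:.2f} GiB")
```

Under these assumptions the cache is about 0.56 GiB at 4K tokens but 18 GiB at the full 131,072-token window. That is far more than the 2.5 GB Q4_K_M weights, so percentage-of-file-size rules of thumb only hold at modest context lengths.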
What it takes to run this locally
File sizes range from ~8 GB (FP16, unquantized) down to ~1.3 GB (Q2_K). Add 30–50% for KV cache and framework overhead at typical context lengths. This model is firmly in the consumer/edge deployment class. The Q4_K_M build needs about 4 GB of VRAM, and a 12–24 GB GPU can hold even the FP16 weights. It also runs on Apple Silicon (M-series) laptops, or even on CPU with sufficient RAM. Measured throughput is sparse; the Benchmarks section lists a single run.
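The 30–50% rule of thumb can be applied directly to the file sizes quoted on this page. This is a minimal sketch of that arithmetic only. It inherits the rule's assumption of a typical context length, not the full 128K window.

```python
# Apply the rule of thumb: total memory ~= file size x (1.3 to 1.5)
# to cover KV cache and framework overhead at typical context lengths.
from typing import Tuple


def memory_range_gb(file_size_gb: float,
                    low_overhead: float = 0.30,
                    high_overhead: float = 0.50) -> Tuple[float, float]:
    """Return (low, high) estimated memory in GB for a model file."""
    return (file_size_gb * (1 + low_overhead),
            file_size_gb * (1 + high_overhead))


# File sizes quoted on this page.
for quant, size_gb in {"Q4_K_M": 2.5, "Q8_0": 4.4, "FP16": 8.0}.items():
    low, high = memory_range_gb(size_gb)
    print(f"{quant}: {low:.1f}-{high:.1f} GB")
```

The Q4_K_M result (roughly 3.3 to 3.8 GB) is consistent with the 4 GB VRAM figure in the quantization table.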
Should you run this locally?
Yes if you need a permissively licensed, compact model for on-device or laptop use, especially with long-context requirements. No if you require top-tier reasoning or coding performance — consider a larger Qwen 3 variant or a frontier model via API.
Catalog cross-links
- Qwen 3 8B
- Qwen 3 32B
- Apple Silicon
Strengths
- Edge-class footprint
- Apache 2.0
Weaknesses
- Reasoning weaker than 8B+
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 2.5 GB | 4 GB |
| Q8_0 | 4.4 GB | 6 GB |
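One simple policy for choosing between the listed variants is to take the highest-precision quantization whose VRAM requirement fits your card. The sketch below hard-codes the two rows of the table above. It ignores long-context KV cache growth and other processes sharing the GPU.

```python
# Pick the highest-precision quantization whose listed VRAM requirement fits.
from typing import Optional

# (name, file size GB, VRAM required GB) copied from the quantization table,
# ordered from lowest to highest precision.
QUANTS = [
    ("Q4_K_M", 2.5, 4.0),
    ("Q8_0", 4.4, 6.0),
]


def best_quant(available_vram_gb: float) -> Optional[str]:
    """Return the best-fitting quantization name, or None if nothing fits."""
    best = None
    for name, _file_gb, vram_gb in QUANTS:
        if vram_gb <= available_vram_gb:
            best = name  # later entries are higher precision, so keep overwriting
    return best


print(best_quant(4))  # Q4_K_M
print(best_quant(8))  # Q8_0
print(best_quant(3))  # None
```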
Get the model
Ollama
One-line install
ollama run qwen3:4b
HuggingFace
Original weights
Source repository with the original unquantized weights; quantize them yourself to produce GGUF variants.
Benchmarks
Real measurements on real hardware. Each result records the quant, context length, and date.
| Hardware | Provenance | Quant | Ctx | Tokens / sec | TTFT | Date |
|---|---|---|---|---|---|---|
| NVIDIA GeForce RTX 3080 16GB (Mobile) | EditorialM | Q4_K_M | 4K | 103.7 | 303 ms | Jun 2, 26 |
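Ollama's local REST API reports token counts and timings with each response, which makes it easy to measure your own numbers. The sketch below assumes an Ollama server on its default address (http://localhost:11434) with qwen3:4b already pulled. It uses the non-streaming /api/generate endpoint, whose eval_count and eval_duration (in nanoseconds) fields give decode throughput. The prompt in the usage comment is an arbitrary placeholder, and TTFT is not computed here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def generate(prompt: str, model: str = "qwen3:4b") -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)


def decode_tokens_per_second(result: dict) -> float:
    """Generated tokens divided by generation time (eval_duration is in ns)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


# Usage (requires a running Ollama server):
# result = generate("Explain KV caching in two sentences.")
# print(f"{decode_tokens_per_second(result):.1f} tok/s")
```

Record the quant, context length, and date alongside the figure, as the table above does, so results stay comparable.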
What to do next
Got this model running on real hardware? Share what you measured — the form arrives with the model pre-selected.
Frequently asked
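What's the minimum VRAM to run Qwen 3 4B?
About 4 GB for the Q4_K_M build, per the quantization table. Long contexts need more, because the KV cache grows with context length.
Can I use Qwen 3 4B commercially?
Yes. It is released under Apache 2.0, which permits commercial use, modification, and redistribution.
What's the context length of Qwen 3 4B?
Up to 131,072 tokens (128K).
How do I install Qwen 3 4B with Ollama?
Run ollama run qwen3:4b; Ollama downloads the model on first use.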
Source: huggingface.co/Qwen/Qwen3-4B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.