Fits comfortably
Running DeepSeek R1 Distill Qwen 7B on NVIDIA GeForce RTX 3080 16GB (Mobile)
The NVIDIA GeForce RTX 3080 16GB (Mobile) runs DeepSeek R1 Distill Qwen 7B comfortably at Q8_0: the model needs about 10 GB of VRAM, which leaves 6 GB of headroom for context.
Model size
7B params
Memory available
16 GB VRAM
Recommended quant
Q8_0
Highest quality that fits
Quick start with Ollama
1. Install
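ollama pull deepseek-r1:7b
2. Run
ollama run deepseek-r1:7b
The default quant in Ollama is Q4_K_M. To use a different quant, append it to the tag: deepseek-r1:7b-q5_K_M. Tag names vary by model, so check the model's page in the Ollama library for the exact tag.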
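Once the model is pulled, it can also be queried from a script rather than the interactive prompt. The following is a minimal sketch, assuming Ollama is serving on its default local address (http://localhost:11434) and that the non-streaming /api/generate endpoint is used. The helper names build_request and generate are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="deepseek-r1:7b"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="deepseek-r1:7b"):
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("Explain quantization in one sentence.") should return the model's reply as a string. Passing a different tag as model selects another quant, provided that tag has been pulled.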
Variants and what fits
| Quantization | File size | VRAM required | Fits on NVIDIA GeForce RTX 3080 16GB (Mobile)? |
|---|---|---|---|
| Q4_K_M | 4.7 GB | 6 GB | Yes |
| Q8_0 | 8.1 GB | 10 GB | Yes |
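The fit check in the table is simple arithmetic: a variant fits when its required VRAM is at most the GPU's VRAM, and the difference is the headroom left for context. A small sketch using the table's figures (the function names are illustrative):

```python
GPU_VRAM_GB = 16  # NVIDIA GeForce RTX 3080 16GB (Mobile)

# (quantization, file size in GB, VRAM required in GB), taken from the table
# and ordered from lowest to highest quality.
VARIANTS = [("Q4_K_M", 4.7, 6), ("Q8_0", 8.1, 10)]


def fit_report(vram_gb, variants):
    """Return (quant, fits, headroom_gb) for each variant."""
    return [(q, need <= vram_gb, vram_gb - need) for q, _size, need in variants]


def best_fit(vram_gb, variants):
    """Return the highest-quality variant that fits, or None if none fit."""
    fitting = [q for q, _size, need in variants if need <= vram_gb]
    return fitting[-1] if fitting else None


for quant, fits, headroom in fit_report(GPU_VRAM_GB, VARIANTS):
    print(f"{quant}: fits={fits}, headroom={headroom} GB")
print("Recommended:", best_fit(GPU_VRAM_GB, VARIANTS))
```

On 16 GB this reports 10 GB of headroom for Q4_K_M and 6 GB for Q8_0, and picks Q8_0 as the highest-quality variant that fits.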
Frequently asked
Can NVIDIA GeForce RTX 3080 16GB (Mobile) run DeepSeek R1 Distill Qwen 7B?
Yes. The NVIDIA GeForce RTX 3080 16GB (Mobile) runs DeepSeek R1 Distill Qwen 7B comfortably at Q8_0 with 6 GB of headroom for context.
What quantization should I use?
Q8_0 is the highest-quality variant of DeepSeek R1 Distill Qwen 7B that fits in 16 GB of VRAM. Lower-bit quants such as Q4_K_M are smaller and leave more VRAM for context, but lose some quality.
How fast will it be?
We measured 80.3 tok/s for this model and GPU combination in our testing.
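To compare your own setup against that figure, you can compute throughput from the eval_count (generated tokens) and eval_duration (nanoseconds) fields that Ollama includes in a non-streaming /api/generate response. The sketch below shows the arithmetic only. The sample numbers in it are illustrative, not measurements, and the function names are hypothetical.

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation throughput from Ollama's eval_count and eval_duration (ns)."""
    return eval_count / (eval_duration_ns / 1e9)


def seconds_for(tokens, tok_per_s):
    """Estimated time to generate a given number of tokens at a given rate."""
    return tokens / tok_per_s


# Illustrative values only: 803 tokens generated in 10 seconds.
rate = tokens_per_second(eval_count=803, eval_duration_ns=10_000_000_000)
print(f"{rate:.1f} tok/s")
print(f"{seconds_for(1000, 80.3):.1f} s for 1000 tokens at 80.3 tok/s")
```

At 80.3 tok/s, a 1000-token response takes about 12.5 seconds. Reasoning models often emit long thinking traces before the final answer, so total response time depends heavily on how many tokens are generated.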
Reviewed by RunLocalAI Editorial. See our editorial policy.