Fits comfortably
Running DeepSeek Coder V2 Lite (16B) on NVIDIA GeForce RTX 3080 16GB (Mobile)
The NVIDIA GeForce RTX 3080 16GB (Mobile) runs DeepSeek Coder V2 Lite (16B) comfortably at Q4_K_M: the model needs about 12 GB of the card's 16 GB of VRAM, leaving 4 GB of headroom for context.
Model size
16B params
DeepSeek Coder V2 Lite (16B)
Memory available
16 GB VRAM
Recommended quant
Q4_K_M
Highest quality that fits
Quick start with Ollama
1. Install
ollama pull deepseek-coder-v2:16b
2. Run
ollama run deepseek-coder-v2:16b
Default quant in Ollama is Q4_K_M. To use a different quant, append it to the tag, for example: deepseek-coder-v2:16b-q5_K_M.
Variants and what fits
| Quantization | File size | VRAM required | Fits on NVIDIA GeForce RTX 3080 16GB (Mobile)? |
|---|---|---|---|
| Q4_K_M | 9.5 GB | 12 GB | Yes |
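The "fits" verdict and the headroom figure follow from simple subtraction. A minimal sketch of that check, using the 16 GB card and the 12 GB requirement from the table; the function names are hypothetical and chosen only for illustration:

```python
def headroom_gb(total_vram_gb: float, required_vram_gb: float) -> float:
    """VRAM left after loading the model; a negative value means it does not fit."""
    return total_vram_gb - required_vram_gb


def fits(total_vram_gb: float, required_vram_gb: float) -> bool:
    """True when the quantized model's VRAM requirement is within the card's VRAM."""
    return headroom_gb(total_vram_gb, required_vram_gb) >= 0


# Figures taken from the table: 16 GB card, Q4_K_M needs 12 GB.
TOTAL_VRAM_GB = 16.0
Q4_K_M_REQUIRED_GB = 12.0

print(
    f"Q4_K_M: fits={fits(TOTAL_VRAM_GB, Q4_K_M_REQUIRED_GB)}, "
    f"headroom={headroom_gb(TOTAL_VRAM_GB, Q4_K_M_REQUIRED_GB)} GB"
)
# prints: Q4_K_M: fits=True, headroom=4.0 GB
```

The headroom is what remains for the context (KV cache), so longer contexts consume it first.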
Frequently asked
Can NVIDIA GeForce RTX 3080 16GB (Mobile) run DeepSeek Coder V2 Lite (16B)?
Yes. The NVIDIA GeForce RTX 3080 16GB (Mobile) runs DeepSeek Coder V2 Lite (16B) comfortably at Q4_K_M with 4 GB of headroom for context.
What quantization should I use?
Q4_K_M is the highest-quality variant of DeepSeek Coder V2 Lite (16B) that fits in 16 GB of VRAM. Lower-bit quants are smaller but lose some quality.
How fast will it be?
We measured 152.0 tok/s on this combination in our testing.
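To check throughput on your own machine, one option is to compute tokens per second from the timing fields Ollama returns with a generation. The sketch below assumes a response shaped like the Ollama REST API's, with `eval_count` (generated tokens) and `eval_duration` (nanoseconds); the sample values are hypothetical and are not a measurement from this page.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama-style response dict.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical sample: 300 tokens generated in 2 seconds.
sample = {"eval_count": 300, "eval_duration": 2_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
# prints: 150.0 tok/s
```

Results vary with prompt length, context size, and power limits on mobile GPUs, so expect some spread around any single figure.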
See also: DeepSeek Coder V2 Lite (16B), NVIDIA GeForce RTX 3080 16GB (Mobile), all benchmarks.
Reviewed by RunLocalAI Editorial. See our editorial policy.