HOW-TO · INF

How to pull and run DeepSeek MoE models efficiently

Intermediate · 15 min · By Fredoline Eruo
PREREQUISITES

Ollama installed, 16GB+ RAM

What this does

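DeepSeek MoE models use a Mixture-of-Experts (MoE) architecture that activates only a subset of parameters per token. That lowers the compute needed per token, but every expert must still be loaded, so memory use tracks the total parameter count rather than the active one. This guide shows how to pull and run these models efficiently on consumer hardware.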
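To make the gap between total and active parameters concrete, here is a rough back-of-the-envelope sketch. The helper name weights_gb and the 4.5 bits-per-weight figure are assumptions for illustration (roughly what a 4-bit quantization averages); the parameter counts are the published figures for the full DeepSeek-R1 (671B total, about 37B active per token). It ignores the KV cache and runtime overhead.

```shell
#!/bin/sh
# Rough weight-memory estimate: params (billions) * bits per weight / 8 = GB.
# weights_gb is a hypothetical helper, not part of Ollama.
weights_gb() {
    # $1 = parameter count in billions, $2 = average bits per weight
    awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Memory to load is set by the TOTAL parameter count (all experts).
echo "total weights (GB):  $(weights_gb 671 4.5)"

# Only the ACTIVE parameters are read per token, which drives speed, not load size.
echo "active weights (GB): $(weights_gb 37 4)"
```

The first figure is why the full MoE model does not fit on consumer hardware even though only a fraction of it is active per token.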

Steps

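  1. Pull a quantized DeepSeek variant. Quantization cuts memory requirements substantially. Note that deepseek-r1:14b is a dense model distilled from the MoE DeepSeek-R1, not an MoE model itself; the same steps apply to MoE tags such as deepseek-v2:16b.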

    ollama pull deepseek-r1:14b
    

    Expected: Download progress, then model registered in local store.

  2. Verify that the model was pulled correctly.

    ollama list
    

    Expected: deepseek-r1:14b appears with a file size of roughly 9 GB (default 4-bit quantization).

  3. Run with minimal context to reduce memory pressure.

    ollama run deepseek-r1:14b
    

    Inside the session, set a shorter context:

    /set parameter num_ctx 4096
    
  4. Measure active memory usage.

    ollama ps
    

    Expected: Shows memory consumed by the running model. MoE models can reduce active compute, but total load memory still depends on the packaged weights and runtime.
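The /set parameter command in step 3 lasts only for the current session. One way to make the context length persistent is to bake it into a derived model with a Modelfile. The sketch below assumes a hypothetical helper make_modelfile and an assumed derived model name, deepseek-r1-4k; neither comes from Ollama itself.

```shell
#!/bin/sh
# Sketch: generate a Modelfile that pins num_ctx for a base model.
# make_modelfile is a hypothetical helper, not an Ollama command.
make_modelfile() {
    # $1 = base model tag, $2 = context length in tokens
    printf 'FROM %s\nPARAMETER num_ctx %s\n' "$1" "$2"
}

# Print the Modelfile for the tag used in this guide.
make_modelfile deepseek-r1:14b 4096

# To apply it (requires Ollama; "deepseek-r1-4k" is an assumed name):
#   make_modelfile deepseek-r1:14b 4096 > Modelfile
#   ollama create deepseek-r1-4k -f Modelfile
#   ollama run deepseek-r1-4k
```

Running the derived model then starts every session with the reduced context, so the memory measured in step 4 is repeatable.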

Verification

ollama ps
# Expected output: deepseek-r1:14b listed as running, with SIZE of roughly 10-12 GB at num_ctx 4096.
# The ~37B active-parameter figure belongs to the full 671B DeepSeek-R1, not to this tag.
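If you want to script this check rather than read the table by eye, a minimal sketch follows. It assumes that ollama ps prints a header row followed by one row per loaded model with the model name in the first column; is_loaded is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed if a given tag appears in the NAME column of `ollama ps`.
# is_loaded is a hypothetical helper; it reads the ps output from stdin.
is_loaded() {
    # $1 = model tag to look for
    # NR > 1 skips the header row; exit status 0 means the tag was found.
    awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Usage (requires a running Ollama server):
#   ollama ps | is_loaded deepseek-r1:14b && echo "loaded" || echo "not loaded"
```

Reading from stdin keeps the helper testable against saved output, which also suits the record-keeping described in the operator checkpoint.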

Common failures

  • Out of memory during load: Reduce num_ctx to 2048 or use a smaller quantized variant (for example a q3 or q2 tag, where the model library publishes one).
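  • Model not found: Verify the exact tag name on the model's Ollama library page (for example ollama.com/library/deepseek-r1), then run ollama list to see what is installed locally.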
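  • Slow inference: MoE models benefit from GPU offloading. Ollama offloads layers automatically; to override it, set the num_gpu parameter (for example /set parameter num_gpu <layers> inside the session). The --n-gpu-layers flag belongs to llama.cpp, not to the Ollama CLI.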
  • Disk space exhausted: Delete unused models with ollama rm <model>.
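The disk-space failure can be caught before the pull starts. The sketch below assumes a hypothetical helper has_free_gb and checks the filesystem that holds your home directory; the actual model store location depends on how Ollama was installed, so adjust the path to match your setup.

```shell
#!/bin/sh
# Sketch: check free disk space before pulling a large model.
# has_free_gb is a hypothetical helper, not an Ollama command.
has_free_gb() {
    # $1 = directory on the target filesystem, $2 = required free space in GB
    # df -Pk prints one POSIX-format line per filesystem, sizes in 1 KB blocks.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example: require 10 GB free before pulling a model of about 9 GB.
if has_free_gb "$HOME" 10; then
    echo "enough space to pull"
else
    echo "free up space first, e.g. with: ollama rm <model>"
fi
```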

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
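One way to capture that record is a small script that prints it in one go. This is a sketch under stated assumptions: checkpoint is a hypothetical helper, the field names are arbitrary, and it falls back gracefully when Ollama is not on the PATH.

```shell
#!/bin/sh
# Sketch: print a checkpoint record for a model under test.
# checkpoint is a hypothetical helper; the field names are illustrative.
checkpoint() {
    # $1 = model tag under test
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "model: $1"
    echo "host: $(uname -srm)"
    if command -v ollama >/dev/null 2>&1; then
        echo "runtime: $(ollama --version 2>&1)"
        echo "--- ollama ps ---"
        # Do not abort the record if the server is not running.
        ollama ps 2>&1 || true
    else
        echo "runtime: ollama not found on PATH"
    fi
}

checkpoint deepseek-r1:14b
```

Redirecting the output to a dated file (for example checkpoint deepseek-r1:14b > run-notes.txt) keeps the verification output next to the hardware and version details.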
