What this does

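Compares throughput, memory usage, and response quality between a DeepSeek MoE model and a dense model. The full DeepSeek MoE activates ~37B parameters per token, and Llama-3-70B is used here as the dense counterpart, although at 70B it has roughly twice that active count. Note that the `deepseek-r1:14b` tag used in the steps is a distilled dense model, not an MoE model; swap in an MoE tag if your hardware can hold it and keep the rest of the procedure unchanged.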
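
As a rough sanity check before downloading anything, weight memory can be estimated from total parameter count and bits per weight. The sketch below is illustrative only: `estimate_weight_gb` is a hypothetical helper, the 4.5 bits-per-weight default is an assumed average for a 4-bit quantization, and KV cache and runtime overhead are ignored. It shows that the footprint follows total parameters, so an MoE model's small active count does not shrink the memory it needs.

```python
def estimate_weight_gb(total_params_billion, bits_per_weight=4.5):
    """Approximate weight memory in GB (10^9 bytes), weights only."""
    total_bytes = total_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# Parameter counts in billions; the bits-per-weight default is an assumption.
for name, params in [("dense 14B", 14), ("dense 70B", 70)]:
    print(f"{name}: ~{estimate_weight_gb(params):.1f} GB of weights")
```

Pass the total parameter count of an MoE model, not its active count, to get a comparable estimate.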

Steps

Run the DeepSeek model in one terminal.
```
ollama run deepseek-r1:14b
```
Run the dense model in a second terminal.
```
ollama run llama3:70b
```
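
Before timing anything, it can help to confirm that both tags are actually installed. The following is a minimal sketch, assuming the local Ollama server exposes its model list at `/api/tags` and that each entry carries a `name` field; `missing_models` is a hypothetical helper, and the sample response is hand-written for illustration.

```python
def missing_models(tags_response, wanted):
    """Return the wanted tags that are absent from an /api/tags response body."""
    installed = {m["name"] for m in tags_response.get("models", [])}
    return [tag for tag in wanted if tag not in installed]

# Hand-written sample; in practice use
# requests.get("http://localhost:11434/api/tags").json()
sample = {"models": [{"name": "llama3:70b"}]}
print(missing_models(sample, ["deepseek-r1:14b", "llama3:70b"]))
```

An empty list means both models are available locally.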

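Execute a standardized benchmark script. It assumes the Ollama server is listening on its default port, 11434.

```python
import statistics
import time

import requests

URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def generate(model, prompt):
    """Send one non-streaming request and raise on HTTP errors."""
    resp = requests.post(URL, json={"model": model, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()

def benchmark(model, prompt, runs=5):
    """Return (mean, stdev) of end-to-end latency in seconds; runs must be >= 2."""
    generate(model, prompt)  # untimed warm-up so model load is not measured
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(model, prompt)
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), statistics.stdev(latencies)

prompts = ["Write a Python quicksort", "Explain quantum entanglement", "Summarize the history of Rome"]
for model in ["deepseek-r1:14b", "llama3:70b"]:
    for p in prompts:
        mean, std = benchmark(model, p)
        print(f"{model} | {p[:30]}... | {mean:.2f}s ± {std:.2f}s")
```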
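
End-to-end latency depends on how many tokens each model chooses to generate, so throughput is usually a fairer number to compare. The sketch below assumes the non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds); `tokens_per_second` is a hypothetical helper and the sample values are made up for illustration.

```python
def tokens_per_second(response_json):
    """Generation throughput from an Ollama /api/generate response body."""
    count = response_json.get("eval_count", 0)
    duration_ns = response_json.get("eval_duration", 0)
    if not duration_ns:
        return None  # timing fields missing, so throughput is unknown
    return count / (duration_ns / 1e9)

# Made-up sample: 200 tokens generated in 4 seconds.
sample = {"eval_count": 200, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

To use it with the benchmark, keep the parsed JSON body of each response and pass it to `tokens_per_second`.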

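Measure memory for each model while it is loaded. Both commands report a snapshot, so run them during generation to approximate the peak.

```
ollama ps
nvidia-smi --query-gpu=memory.used --format=csv
```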
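
A single `nvidia-smi` reading can miss the true peak. One way to approximate it is to poll while the benchmark runs and keep the maximum. This is a sketch under assumptions: it adds the `noheader,nounits` format modifiers so each line is a bare MiB value, sums across GPUs, and the function names, polling interval, and sample text are illustrative.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"]

def parse_memory_used(csv_text):
    """Sum per-GPU memory.used values in MiB, one number per line."""
    return sum(int(line) for line in csv_text.splitlines() if line.strip())

def peak_memory_mib(duration_s=30.0, interval_s=0.5):
    """Poll nvidia-smi for duration_s seconds and return the highest total seen."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        peak = max(peak, parse_memory_used(out))
        time.sleep(interval_s)
    return peak

# Parsing a hand-written two-GPU sample (no GPU needed for this line).
print(parse_memory_used("30512\n1024\n"))
```

Run `peak_memory_mib()` in a separate process or thread while the benchmark script is generating.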

Verification

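Check the loaded model size and GPU memory while each model is running.

```
ollama ps
nvidia-smi
```

Expected: DeepSeek uses less memory (~30 GB vs ~45 GB for the dense model) with comparable latency. Treat these figures as rough: the footprint tracks total parameters, quantization, and context length, so a full-size MoE model can need more memory than the dense model despite its smaller active count.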

Common failures

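Unfair comparison: Ensure both models use the same quantization level (for example q4_k_m) and the same context length.
Cold-start and prompt-cache skew: Run a warm-up prompt before timing so that model load time is not counted in the first measurement. Repeated identical prompts may also reuse cached prompt processing, so apply the same repetition pattern to both models.
Memory swap on MoE: If VRAM is insufficient, layers spill to system RAM and both models slow down sharply. MoE models may degrade more gracefully than dense models because only the active experts are read for each token, but all experts must still fit in combined memory.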
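
One way to guard against the unfair-comparison failure is to build every request from the same options. The sketch below assumes the Ollama `options` keys `num_ctx`, `num_predict`, `temperature`, and `seed`; `build_payload` and its default values are hypothetical and should be tuned to your setup.

```python
def build_payload(model, prompt, num_ctx=4096, num_predict=256, seed=0):
    """Request body with identical generation settings for every model."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": num_ctx,          # same context length for both models
            "num_predict": num_predict,  # cap output so lengths are comparable
            "temperature": 0,
            "seed": seed,
        },
    }

a = build_payload("deepseek-r1:14b", "Explain quantum entanglement")
b = build_payload("llama3:70b", "Explain quantum entanglement")
print(a["options"] == b["options"])
```

Capping `num_predict` matters most for latency: without it, a model that writes longer answers looks slower even at equal tokens per second.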

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
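
The checkpoint can be captured as a small structured record so that later runs are comparable. This is only a sketch: `checkpoint_record` and its field names are hypothetical, and the placeholder strings should be replaced with your own observed values.

```python
import json
import platform
from datetime import datetime, timezone

def checkpoint_record(model, runtime_version, backend, verification_output):
    """Collect the details the checkpoint asks for into one JSON-ready dict."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "os": platform.platform(),
        "python": platform.python_version(),
        "model": model,
        "runtime_version": runtime_version,
        "backend": backend,
        "verification_output": verification_output,
    }

record = checkpoint_record(
    model="llama3:70b",
    runtime_version="<output of `ollama --version`>",
    backend="<GPU model and driver>",
    verification_output="<paste `ollama ps` output>",
)
print(json.dumps(record, indent=2))
```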
