RUNLOCALAIv38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
HOW-TO · INF

How to benchmark DeepSeek against dense models of similar size

advanced·30 min·By Fredoline Eruo
PREREQUISITES

DeepSeek MoE model and comparable dense model downloaded

What this does

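Compares throughput, memory usage, and response quality between a DeepSeek MoE model and a dense model with a comparable active-parameter count. The full DeepSeek-V3 and R1 checkpoints activate about 37B of their 671B parameters per token, so Llama-3-70B is the nearest widely available dense counterpart, although it still runs roughly twice as many parameters per token. Memory is a separate question: an MoE keeps every expert loaded, so its footprint follows total parameters, not active ones.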
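
To make the active-versus-total distinction concrete, here is a back-of-the-envelope sketch of weight memory. The function name is hypothetical, and the 4.5 bits-per-weight default is an assumed approximation for 4-bit K-quants; KV cache and runtime overhead are ignored, so treat the results as rough lower bounds.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight memory in GB (10^9 bytes) for a quantized model.

    bits_per_weight=4.5 is an assumption, not a measured value.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Dense 70B: all weights are both resident and read for every token.
dense_resident = estimate_weight_gb(70)

# MoE with 671B total / 37B active: all weights are resident,
# but only the active share is read per token.
moe_resident = estimate_weight_gb(671)
moe_read_per_token = estimate_weight_gb(37)

print(f"dense 70B resident:        {dense_resident:.1f} GB")
print(f"MoE 671B resident:         {moe_resident:.1f} GB")
print(f"MoE weights read per token: {moe_read_per_token:.1f} GB")
```

Under these assumptions the MoE needs far more memory to load than the dense model, while touching fewer weights per generated token.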

Steps

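  1. Run the DeepSeek model in one terminal. Note that the deepseek-r1:14b tag is a dense distillation built on Qwen2.5, not an MoE checkpoint; for a true MoE comparison, substitute an MoE tag such as deepseek-v2:16b or deepseek-r1:671b here and in the script in step 3.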

    ollama run deepseek-r1:14b
    
  2. Run the dense model in a second terminal.

    ollama run llama3:70b
    
  3. Execute a standardized benchmark script.

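    import statistics
    import time
    
    import requests
    
    OLLAMA_URL = "http://localhost:11434/api/generate"
    
    def benchmark(model, prompt, runs=5):
        """Return (mean, stdev) of wall-clock latency in seconds over `runs` timed requests."""
        latencies = []
        for i in range(runs + 1):  # iteration 0 is an untimed warm-up that loads the model
            start = time.perf_counter()
            resp = requests.post(OLLAMA_URL, timeout=600,
                json={"model": model, "prompt": prompt, "stream": False})
            resp.raise_for_status()  # surface a missing model or a server error
            if i > 0:
                latencies.append(time.perf_counter() - start)
        return statistics.mean(latencies), statistics.stdev(latencies)
    
    # Latency depends on how many tokens each model emits, so read it alongside tokens/s.
    prompts = ["Write a Python quicksort", "Explain quantum entanglement", "Summarize the history of Rome"]
    for model in ["deepseek-r1:14b", "llama3:70b"]:  # swap in the tags you actually pulled
        for p in prompts:
            mean, std = benchmark(model, p)
            print(f"{model} | {p[:30]}... | {mean:.2f}s ± {std:.2f}s")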
    
  4. Measure memory for each model while it is loaded, sampling during generation to catch the peak.

    ollama ps
    nvidia-smi --query-gpu=memory.used --format=csv
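
Wall-clock latency mixes model speed with output length, so a tokens-per-second figure is often easier to compare. The sketch below derives it from the timing fields that Ollama returns in a non-streaming /api/generate response (eval_count, eval_duration, prompt_eval_count, prompt_eval_duration, with durations in nanoseconds). The helper names and the sample values are hypothetical, made up for illustration.

```python
def generation_tokens_per_second(response: dict) -> float:
    """Decode throughput: generated tokens divided by generation time."""
    # eval_duration is reported in nanoseconds.
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def prompt_tokens_per_second(response: dict) -> float:
    """Prefill throughput: prompt tokens divided by prompt-processing time."""
    return response["prompt_eval_count"] / (response["prompt_eval_duration"] / 1e9)


# Made-up response body, shaped like the JSON returned with "stream": False.
sample = {
    "eval_count": 256,
    "eval_duration": 8_000_000_000,       # 8 s
    "prompt_eval_count": 32,
    "prompt_eval_duration": 400_000_000,  # 0.4 s
}

print(f"generation: {generation_tokens_per_second(sample):.1f} tok/s")
print(f"prompt:     {prompt_tokens_per_second(sample):.1f} tok/s")
```

In the benchmark script, the same helpers could be applied to the parsed JSON body of each response instead of the sample dictionary.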
    

Verification

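# Expected: resident memory follows each model's total parameter count and quantization
# (an MoE keeps every expert loaded), while per-token speed follows active parameters.
# Exact figures depend on the tags you pulled, so record what you observe.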
ollama ps
nvidia-smi
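
A single nvidia-smi reading is a snapshot, not a peak. One way to approximate the peak is to sample repeatedly while the benchmark runs and keep the maximum. The sketch below parses the CSV output of the query used above; the function names are hypothetical, and the parser assumes the default output of one header line followed by one "<number> MiB" line per GPU.

```python
import subprocess


def parse_memory_used_mib(csv_text: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv` output into MiB per GPU."""
    lines = [ln.strip() for ln in csv_text.strip().splitlines()]
    # lines[0] is the header ("memory.used [MiB]"); the rest look like "40210 MiB".
    return [int(ln.split()[0]) for ln in lines[1:] if ln]


def sample_gpu_memory_mib() -> list[int]:
    """Take one reading from nvidia-smi (requires an NVIDIA driver on the host)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_used_mib(out)


def peak_total_mib(samples: list[list[int]]) -> int:
    """Highest total across all GPUs seen in any single sample."""
    return max(sum(per_gpu) for per_gpu in samples)


# Made-up two-GPU reading, used here so the parser can be checked offline.
example_output = "memory.used [MiB]\n40210 MiB\n512 MiB\n"
print(parse_memory_used_mib(example_output))
```

Calling sample_gpu_memory_mib() in a loop from a second terminal or thread during generation, then passing the collected readings to peak_total_mib, gives a rough peak for each model.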

Common failures

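  • Unfair comparison: Ensure both models use the same quantization level (for example q4_k_m) and the same context length. ollama show reports the quantization of a pulled tag.
  • Cold-start and prompt-cache skew: The first request includes model load time, and repeating an identical prompt can reuse cached prompt state. Run a warm-up request before timing and exclude it from the statistics.
  • Partial offload: If VRAM is insufficient, layers spill to CPU and both models slow down. An MoE often loses less speed than a dense model of similar total size because each token reads only the routed experts' weights, but the comparison is no longer like-for-like. Check the PROCESSOR column of ollama ps before trusting the numbers.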
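
The offload check can also be scripted. The sketch below assumes the JSON shape returned by Ollama's GET /api/ps endpoint, a "models" list whose entries carry "name", "size", and "size_vram" in bytes; verify those field names against your Ollama version. The helper names and the sample values are hypothetical.

```python
def gpu_fraction(entry: dict) -> float:
    """Share of a loaded model that sits in VRAM (1.0 means fully on GPU)."""
    size = entry.get("size", 0)
    return entry.get("size_vram", 0) / size if size else 0.0


def offloaded_models(ps_response: dict, tolerance: float = 0.01) -> list[str]:
    """Names of loaded models that are not (almost) entirely in VRAM."""
    return [
        m["name"]
        for m in ps_response.get("models", [])
        if gpu_fraction(m) < 1.0 - tolerance
    ]


# Made-up response: one model split across CPU and GPU, one fully on GPU.
sample_ps = {
    "models": [
        {"name": "llama3:70b", "size": 42_000_000_000, "size_vram": 24_000_000_000},
        {"name": "deepseek-r1:14b", "size": 10_000_000_000, "size_vram": 10_000_000_000},
    ]
}

print(offloaded_models(sample_ps))
```

If either model shows up in that list, the timing numbers reflect CPU offload as much as the architecture, and the run is worth repeating with a smaller quantization or context.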

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
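
One lightweight way to keep that record is a small JSON document per run. The sketch below is only an illustration: the function name and field names are hypothetical, and the placeholder strings stand in for values you would paste from your own session.

```python
import json
import platform
from datetime import datetime, timezone


def make_record(runtime_version: str, model_tag: str,
                hardware: str, verification_output: str) -> dict:
    """Bundle the checkpoint details into a JSON-serializable dictionary."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "runtime": runtime_version,
        "model": model_tag,
        "hardware": hardware,
        "host_os": platform.platform(),
        "verification_output": verification_output,
    }


# Placeholder values; replace them with what you actually observed.
record = make_record(
    runtime_version="<output of ollama --version>",
    model_tag="llama3:70b",
    hardware="<GPU model and backend>",
    verification_output="<pasted ollama ps / nvidia-smi output>",
)
print(json.dumps(record, indent=2))
```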

Related guides

  • How to pull and run DeepSeek MoE models efficiently
  • How to configure DeepSeek models for reduced memory usage