What this does

Mixture-of-Experts (MoE) splits a model into multiple "expert" sub-networks with a router that selects which experts activate per token. This guide explains the architecture and shows how to inspect routing behavior at inference time.
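The router's per-token selection can be sketched in a few lines. This is a minimal illustration of one common variant (softmax over router logits, keep the top-k experts, renormalize the kept weights), not the exact scheme of any particular model. The function name `route_token` and the logit values are hypothetical.

```python
import math

def route_token(router_logits, top_k):
    """Return [(expert_index, weight), ...] for one token under top-k routing."""
    # Softmax over all experts (subtract the max for numerical stability)
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k most probable experts
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize so the kept weights sum to 1
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]

# Hypothetical router logits for one token over 8 experts
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
selection = route_token(logits, top_k=2)
print(selection)
```

Only the selected experts run for that token; their outputs are combined using the returned weights.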

Steps

Understand expert count and activation. DeepSeek-V3 uses 256 routed experts per MoE layer (plus one shared expert that is always active) and activates 8 routed experts per token. Mixtral 8x7B has 8 experts per layer and activates 2 per token.

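Inspect what the runtime exposes. Router logits are internal to the model, and the Ollama API does not return them: /api/generate returns the generated text plus timing and token-count fields. Note also that deepseek-r1:14b is a dense distilled model, not an MoE, so it has no router to inspect. The call below confirms that the model responds and shows which fields the API provides; to examine routing itself, load an MoE checkpoint in a framework that exposes router outputs.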

curl -s http://localhost:11434/api/generate \
  -d '{"model": "deepseek-r1:14b", "prompt": "What is MoE?", "options": {"temperature": 0}}' \
  | jq '.'
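The same request can be issued from Python using only the standard library. This is a sketch: the helper names `build_request`, `generate`, and `summarize` are hypothetical, and `"stream": false` is added so the server returns a single JSON object. The fields read in `summarize` are generated text and token counters, not router internals.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl call

def build_request(model, prompt):
    """Build a non-streaming /api/generate request (nothing is sent here)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON object instead of a stream of chunks
        "options": {"temperature": 0},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model, prompt, timeout=300):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def summarize(result):
    """Pick out the text and token counters; missing fields become None."""
    return {key: result.get(key) for key in ("response", "eval_count", "eval_duration")}

# Usage (requires a running Ollama server with the model pulled):
# print(summarize(generate("deepseek-r1:14b", "What is MoE?")))
```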

Measure expert load balance. A healthy MoE model distributes tokens roughly evenly across experts. Strongly skewed routing usually points to a training problem.

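import torch
# Stand-in router logits (assumed shape [num_tokens, num_experts]); real ones come from the model
router_logits = torch.randn(1024, 8)
# Count per-expert assignments under top-2 routing
counts = torch.bincount(router_logits.topk(2, dim=-1).indices.flatten(), minlength=8)
print(counts.tolist())  # balanced: each expert near num_tokens * top_k / num_experts = 256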
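Once per-expert counts are available, a couple of summary numbers make skew easier to judge. The sketch below assumes the counts were already collected; the function name `load_balance_report`, the chosen metrics, and the sample counts are illustrative, not a standard.

```python
import math

def load_balance_report(counts):
    """Summarize how evenly assignments are spread across experts."""
    total = sum(counts)
    n = len(counts)
    if total == 0 or n < 2:
        raise ValueError("need at least two experts and one assignment")
    ideal = total / n
    shares = [c / total for c in counts]
    # Entropy of the load distribution; equals log(n) when perfectly even
    entropy = sum(-p * math.log(p) for p in shares if p > 0)
    return {
        "ideal_per_expert": ideal,
        "max_over_ideal": max(counts) / ideal,         # 1.0 means perfectly even
        "normalized_entropy": entropy / math.log(n),   # 1.0 even, 0.0 collapsed
    }

# Hypothetical counts: one nearly balanced layer, one fully collapsed layer
balanced = load_balance_report([250, 260, 245, 255, 250, 248, 252, 240])
collapsed = load_balance_report([2000, 0, 0, 0, 0, 0, 0, 0])
print(balanced)
print(collapsed)
```

A `max_over_ideal` close to 1 and a `normalized_entropy` close to 1 indicate even routing; a collapsed router pushes the first toward the number of experts and the second toward 0.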

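Compare active vs. total parameters. DeepSeek-V3 has 671B total parameters but activates only ~37B per token. Ollama reports a model's total parameter count and does not report an active count. Because deepseek-r1:14b is a dense distilled model, all of its parameters are active for every token. Check what the local model reports: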
```
ollama show deepseek-r1:14b | grep -i parameter
```
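The gap between total and active size is easier to see as ratios. The arithmetic below uses only the figures quoted in the steps above; the variable names are illustrative.

```python
def fraction(active, total):
    """Share of the whole that is active for a single token."""
    return active / total

# Figures quoted in the steps above
deepseek_v3_param_fraction = fraction(37e9, 671e9)   # active vs. total parameters
deepseek_v3_expert_fraction = fraction(8, 256)       # routed experts used per token
mixtral_expert_fraction = fraction(2, 8)             # experts used per token

print(f"DeepSeek-V3 parameters active per token: {deepseek_v3_param_fraction:.1%}")
print(f"DeepSeek-V3 routed experts active per token: {deepseek_v3_expert_fraction:.1%}")
print(f"Mixtral 8x7B experts active per token: {mixtral_expert_fraction:.1%}")
```

The parameter fraction is larger than the expert fraction because components outside the routed experts, such as attention layers and embeddings, run for every token.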

Verification

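# Check the model's reported architecture and parameter count
ollama show deepseek-r1:14b
# Expected: a parameters field showing the total count only (about 14B for this dense model).
# Ollama does not report active parameters per token; 671B total / ~37B active describes DeepSeek-V3.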

Common failures

Confusing total vs. active parameter counts: Headline figures usually give the total parameter count. The active count determines per-token compute cost, while the total still determines how much memory the weights need.
Router collapse: If training goes wrong, most or all tokens route to the same few experts. Check load balance to detect this.
Overlooking expert capacity: Some implementations give each expert a fixed token budget per batch; tokens beyond that budget are dropped.
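Expert capacity and token dropping can be sketched as follows. This assumes the capacity rule used in some implementations, capacity = ceil(capacity_factor * num_tokens * top_k / num_experts), and a first-come, first-served drop policy. The function names, the capacity factor of 1.25, and the skewed assignment list are all hypothetical.

```python
import math

def expert_capacity(num_tokens, num_experts, top_k, capacity_factor):
    """Maximum number of assignments one expert accepts per batch."""
    return math.ceil(capacity_factor * num_tokens * top_k / num_experts)

def apply_capacity(assignments, num_experts, capacity):
    """Accept assignments in arrival order; count those that overflow.

    assignments: one expert index per (token, slot) pair.
    Returns (kept_counts_per_expert, dropped_count).
    """
    kept = [0] * num_experts
    dropped = 0
    for expert in assignments:
        if kept[expert] < capacity:
            kept[expert] += 1
        else:
            dropped += 1  # expert is full, so this assignment is dropped
    return kept, dropped

capacity = expert_capacity(num_tokens=16, num_experts=4, top_k=1, capacity_factor=1.25)
# Skewed routing: 10 of 16 tokens want expert 0
skewed = [0] * 10 + [1] * 3 + [2] * 2 + [3] * 1
kept, dropped = apply_capacity(skewed, num_experts=4, capacity=capacity)
print(capacity, kept, dropped)
```

With even routing nothing is dropped; the more skewed the router, the more assignments overflow the busiest expert.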

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
