DBRX Base
DBRX base (non-instruct). 132B total / 36B active fine-grained MoE.
Positioning
DBRX Base is a 132B total parameter Mixture-of-Experts (MoE) model from Databricks, with approximately 36B parameters activated per token. Released under the Databricks Open Model License, it is designed as a base model for fine-tuning, not for instruction following. Its fine-grained MoE architecture means per-token compute is closer to a dense ~36B model than a dense 132B model, although all 132B parameters still have to be held in memory. With a 32,768-token context window, it targets enterprise users who need a customizable foundation for domain-specific tasks.
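The 132B / 36B split can be sanity-checked with simple algebra. The sketch below assumes, as a simplification, that the 16 experts are equal in size and that every non-expert weight (attention, embeddings, router) runs for every token. Writing S for the shared parameters and E for one expert's parameters summed over all layers gives S + 16E = 132 and S + 4E = 36. The helper name `split_moe_params` is hypothetical.

```python
# Rough split of DBRX's parameters into shared (always-active) and per-expert parts.
# Assumes equal-sized experts and that all non-expert weights run for every token.

def split_moe_params(total_b: float, active_b: float,
                     num_experts: int, top_k: int) -> tuple[float, float]:
    """Solve S + num_experts*E = total and S + top_k*E = active for (S, E), in billions."""
    per_expert = (total_b - active_b) / (num_experts - top_k)
    shared = active_b - top_k * per_expert
    return shared, per_expert

shared, per_expert = split_moe_params(132, 36, num_experts=16, top_k=4)
print(f"shared ~{shared:.0f}B, each expert ~{per_expert:.0f}B")
```

Under those assumptions about 4B parameters are shared and each expert holds about 8B. Per-token compute therefore follows the ~36B active figure, while memory has to cover all 132B.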
Strengths
- Efficient MoE architecture: With 132B total parameters but only ~36B active per token, DBRX Base offers the representational capacity of a large model at a per-token compute cost comparable to a much smaller dense model.
- Commercial-use license: The Databricks Open Model License allows commercial use and fine-tuning, subject to its conditions. This makes it suitable for enterprise deployment once the terms have been reviewed.
- Large context window: At 32,768 tokens, it can handle substantial documents or codebases in a single pass.
- Fine-tuning base: As a base model, it is optimized for customization via fine-tuning, giving operators full control over behavior.
Limitations
- Datacenter-class hardware required: Running fully on GPU at Q4_K_M (74.3 GB, plus an estimated 30-50% for KV cache and framework overhead) demands the multi-GPU setups typical of datacenters. A single consumer or workstation GPU can only run it with expert offload to system RAM, at lower speed.
- No instruction tuning: DBRX Base is not designed for chat or instruction following out of the box; operators must fine-tune for specific tasks.
- Limited community benchmarks: Independent benchmark results for the base (non-instruct) model are sparse. Published vendor metrics should be treated as best-case.
- High memory overhead: Although only ~36B parameters are active per token, all 132B must be held in VRAM or system RAM, because the router picks a different set of experts for each token. Memory footprint therefore tracks the total parameter count, and the KV cache adds to it at longer contexts.
What it takes to run this locally
DBRX Base requires datacenter-class hardware to run fully on GPU. Weight files range from ~264 GB at FP16 (unquantized) down to ~42.9 GB at Q2_K, but even the smallest quant needs significant GPU memory once KV cache and framework overhead are included (add ~30-50%). For example, Q4_K_M at ~74.3 GB plus ~22-37 GB of overhead means ~96-111 GB of VRAM, which calls for multiple high-end GPUs: 2× 80GB A100s, or 4× 24GB GPUs (96 GB combined), which sits right at the low end of that range. No single consumer GPU can hold this model entirely in VRAM.
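The overhead arithmetic above can be reproduced with a small helper. This is a rough sketch that applies the page's 30-50% rule of thumb to a file size and rounds up to whole GPUs. The function names are illustrative, and real headroom depends on context length, batch size, and runtime.

```python
# Estimate VRAM needed to hold a quantized model fully on GPU, using the
# rule of thumb of +30-50% for KV cache and framework overhead.
import math

def vram_range_gb(file_size_gb: float, low: float = 0.30,
                  high: float = 0.50) -> tuple[float, float]:
    """Return (min, max) VRAM in GB for a weight file plus overhead."""
    return file_size_gb * (1 + low), file_size_gb * (1 + high)

def gpus_needed(vram_gb: float, gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined memory covers vram_gb."""
    return math.ceil(vram_gb / gpu_gb)

lo, hi = vram_range_gb(74.3)  # Q4_K_M file size quoted on this page
print(f"Q4_K_M: {lo:.1f}-{hi:.1f} GB")
for gpu_gb in (80, 48, 24):
    print(f"{gpu_gb} GB cards: {gpus_needed(lo, gpu_gb)}-{gpus_needed(hi, gpu_gb)}")
```

With these inputs, four 24 GB cards (96 GB) come out just below the low estimate, which is why that configuration is marginal.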
Should you run this locally?
Yes if you have access to datacenter-grade multi-GPU hardware, need a permissively licensed base model for fine-tuning on proprietary data, and can benefit from the MoE efficiency of 36B active parameters.
No if you lack multi-GPU infrastructure, require an out-of-the-box instruction-tuned model, or need to run on consumer or workstation hardware.
Catalog cross-links
- DBRX Instruct – instruction-tuned variant of DBRX
- Mixtral 8x22B – another large MoE model with similar active parameter count
- Databricks Open Model License – details on usage rights
How to run it
DBRX is Databricks' 132B MoE model (~36B active per token with 4-of-16 expert routing). Run it at Q4_K_M via llama.cpp with -ngl 999 -fa -c 8192; the Q4_K_M file is ~75 GB on disk.

Minimum VRAM: 48 GB, either an RTX A6000 (48GB) at Q4_K_M with expert offload, or dual RTX 3090s with row split (48 GB total), which also needs expert offload because the ~75 GB file does not fit in 48 GB. Recommended: A100 80GB at AWQ-INT4. Throughput: ~15-25 tok/s on an A6000 at Q4_K_M (8K context).

DBRX uses a fine-grained MoE with 16 experts (4 active per token), so each token is routed to more experts, chosen from many more possible combinations, than in Mixtral-style models (8 experts, 2 active). This means higher routing overhead but potentially better expert specialization.

DBRX is a base model, not instruction-tuned: use it for fine-tuning, not direct chat. For instruction-tuned use, look at DBRX-Instruct or fine-tune it yourself. Ollama may not have DBRX base, so verify the tag before pulling. Architecture: standard transformer with MoE FFN layers, supported in llama.cpp and potentially vLLM.
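To make the 4-of-16 routing concrete, here is a toy router in pure Python. It uses one common top-k gating formulation (softmax over router logits, keep the k largest, renormalize). DBRX's exact gating details may differ, and the random logits stand in for a real router's output. The last lines count how many distinct expert subsets each scheme can pick from.

```python
# Toy top-k MoE router: softmax over expert logits, keep the k largest,
# renormalize their weights. One common formulation; DBRX's exact gating may differ.
import math
import random

def route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Return (expert_index, weight) for the top-k experts, weights summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]

random.seed(0)
token_logits = [random.gauss(0, 1) for _ in range(16)]  # hypothetical router output
print(route(token_logits, k=4))

# Number of distinct expert subsets a single token can be routed to.
dbrx, mixtral = math.comb(16, 4), math.comb(8, 2)
print(dbrx, mixtral, dbrx // mixtral)
```

The subset counts come out to 1820 for top-4-of-16 versus 28 for top-2-of-8, a 65x difference, which is the sense in which fine-grained routing leaves more room for expert specialization.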
Hardware guidance
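- Minimum: dual RTX 3090 (48 GB total) at Q4_K_M with expert offload, tight at 4K context; the ~75 GB file cannot sit entirely in 48 GB.
- Recommended: A100 80GB at AWQ-INT4 for serving.
- Budget: RTX A6000 48GB at Q3_K_M with expert offload.
- VRAM math: 132B total, ~36B active (4 experts selected per token). Q4_K_M for the full 132B: ~70-80 GB. Expert offload reduces VRAM to ~30-40 GB by keeping the non-expert weights and some expert layers in VRAM and the remaining expert weights in system RAM.
- KV cache at 8K: roughly 1.3 GB at FP16, estimated from the published config (40 layers, 8 KV heads, head dimension 128); runtime compute buffers come on top.
- 48 GB with expert offload: borderline. 80 GB A100: comfortable with all experts in VRAM.
- Mac Studio M4 Max 64GB: Q4_K_M with expert offload, 3-6 tok/s. The ~75 GB file is larger than 64 GB of unified memory, so treat this as borderline.
- RTX 4090 24GB: Q3_K_M with aggressive expert offload.
- Cloud: single A100 at $5-10/hr for AWQ.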
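The KV-cache figure can be estimated directly from the model's shape. A minimal sketch, assuming the values from DBRX's Hugging Face config.json (40 layers, 8 KV heads, head dimension 6144 / 48 = 128) and an unquantized FP16 cache. llama.cpp compute buffers and quantized cache types are not modeled.

```python
# Rough KV-cache size for DBRX, assuming the config.json values
# (40 layers, 8 KV heads, head_dim 128) and an FP16 cache (2 bytes per element).

def kv_cache_bytes(context: int, n_layers: int = 40, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    # Factor 2 covers keys and values, stored per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context

for ctx in (4096, 8192, 32768):
    print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx) / 1e9:.2f} GB")
```

Under those assumptions the cache is about 0.67 GB at 4K, 1.34 GB at 8K, and 5.37 GB at the full 32K window.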
What breaks first
1. Base model, not instruct. DBRX-base has no chat or instruction tuning. Raw completions continue the prompt's style rather than answering questions, so fine-tuning or few-shot prompting is necessary.
2. Fine-grained MoE routing overhead. Top-4-of-16 routing selects more experts per token, and in multi-GPU setups that split experts across cards this means more all-to-all communication. On PCIe cards this routing pattern can cause more stalls than Mixtral-style top-2-of-8 routing.
3. AWQ calibration gap. DBRX AWQ quants calibrated on generic data may not preserve quality on domain-specific tasks. Test quant quality on your own data before deploying.
4. Databricks' license. DBRX ships under the Databricks Open Model License rather than a standard open-weight license such as Apache 2.0. Check the terms at huggingface.co/databricks/dbrx-base before commercial use.
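One way to act on the AWQ calibration warning is to compare perplexity on a sample of your own text between a higher-precision reference and the AWQ quant. A minimal sketch, assuming you can already obtain per-token natural-log probabilities from each runtime (how you get them is runtime-specific and not shown). The numbers below are placeholders, not measured DBRX results, and the 5% tolerance is an arbitrary illustration.

```python
# Compare perplexity of a reference model and a quantized model on the same
# domain text, given per-token natural-log probabilities from each.
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def quant_degradation(ref_logprobs: list[float], quant_logprobs: list[float]) -> float:
    """Relative perplexity increase of the quant over the reference."""
    ref, quant = perplexity(ref_logprobs), perplexity(quant_logprobs)
    return (quant - ref) / ref

# Placeholder values for illustration only.
ref = [-1.2, -0.8, -2.1, -0.5]
awq = [-1.3, -0.9, -2.4, -0.6]
delta = quant_degradation(ref, awq)
print(f"perplexity increase: {delta:.1%}")
if delta > 0.05:  # hypothetical tolerance; choose one that matches your task
    print("quant may be losing too much quality on this data")
```

Perplexity is only a proxy; for task-specific deployments, scoring the quant on a held-out set of your real prompts is the more direct check.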
Common beginner mistakes
- Mistake: Expecting DBRX-base to chat. Fix: Base models generate completions, not conversations. Use DBRX-Instruct or fine-tune, or use few-shot prompting with careful formatting for base model use.
- Mistake: Assuming 132B total means it needs 132 GB of VRAM. Fix: At Q4_K_M the full weights are ~75 GB on disk, and only ~36B parameters (~21 GB at Q4) are used per token. All experts still have to live in VRAM or system RAM, but expert offload lets it run on 48 GB of VRAM.
- Mistake: Using the standard Llama GGUF conversion. Fix: DBRX has its own architecture. Use a conversion script with DBRX support (llama.cpp's convert_hf_to_gguf.py handles DbrxForCausalLM) or pre-converted GGUFs from TheBloke or bartowski, after confirming the repo actually contains DBRX files.
- Mistake: Ignoring the 16-expert routing overhead. Fix: DBRX's top-4-of-16 routing is more complex than Mixtral's top-2-of-8. Expect higher per-token latency variance due to more frequent expert switches.
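For base-model use, few-shot prompting means showing the model the pattern you want it to continue. A minimal sketch of building such a prompt; the "Q:"/"A:" labels and blank-line separator are arbitrary choices, not a DBRX-specific format, and `few_shot_prompt` is a hypothetical helper. Ending on an open "A:" invites the completion, and you would typically stop generation at the next "Q:" using your runtime's stop-string option.

```python
# Build a few-shot completion prompt for a base (non-chat) model.
# The label format is an illustrative assumption; keep it identical
# across examples so the model can infer the pattern.

def few_shot_prompt(examples: list[tuple[str, str]], query: str,
                    q_label: str = "Q:", a_label: str = "A:") -> str:
    blocks = [f"{q_label} {q}\n{a_label} {a}" for q, a in examples]
    blocks.append(f"{q_label} {query}\n{a_label}")  # left open for the model to complete
    return "\n\n".join(blocks)

prompt = few_shot_prompt(
    [("What is the capital of France?", "Paris"),
     ("What is 2 + 2?", "4")],
    "What is the boiling point of water at sea level in Celsius?",
)
print(prompt)
```

Consistent formatting matters more than the specific labels: a base model continues whatever structure the examples establish.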
Family & lineage
Strengths
- Fine-grained MoE
- Databricks Mosaic recipe
Weaknesses
- Not instruction-tuned; use dbrx-instruct for chat
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required (fully on GPU, est.) |
|---|---|---|
| Q4_K_M | 75.0 GB | 96 GB |
Get the model
Hugging Face
Original weights
Source repository with the original, unquantized weights; you need to quantize them yourself.
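Frequently asked
What's the minimum VRAM to run DBRX Base?
About 48 GB with expert offload to system RAM (for example one RTX A6000 or two RTX 3090s). Running fully on GPU at Q4_K_M needs roughly 96-111 GB.
Can I use DBRX Base commercially?
Yes, under the Databricks Open Model License and subject to its terms; check huggingface.co/databricks/dbrx-base before deploying.
What's the context length of DBRX Base?
32,768 tokens.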
Source: huggingface.co/databricks/dbrx-base
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.