dbrx
132B parameters
Commercial OK
Reviewed May 2026

DBRX Base

DBRX base (non-instruct). 132B total / 36B active fine-grained MoE.

License: Databricks Open Model License · Released Mar 27, 2024 · Context: 32,768 tokens


How to run it

DBRX is Databricks' 132B-parameter MoE model (~36B active per token with 4-of-16 expert routing). Run it at Q4_K_M via llama.cpp with -ngl 999 -fa -c 8192; the Q4_K_M file is ~75 GB on disk. Minimum VRAM: 48 GB — an RTX A6000 (48GB) at Q4_K_M with expert offload, or dual RTX 3090s row-split (48 GB total). Recommended: an A100 80GB at AWQ-INT4. Throughput: ~15-25 tok/s on an A6000 at Q4_K_M (8K context).

DBRX uses a fine-grained MoE with 16 experts (4 active) — more routing decisions per token than Mixtral-style MoE (8 experts, 2 active). That means higher routing overhead but potentially better expert specialization.

DBRX is a base model, not instruction-tuned: use it for fine-tuning or few-shot prompting, not direct chat. For instruction-following, use DBRX-Instruct or fine-tune yourself. Ollama may not carry DBRX base — verify the tag. The architecture is a standard transformer with MoE FFN layers, supported in llama.cpp and vLLM.
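A minimal invocation along those lines — a sketch, assuming a downloaded Q4_K_M GGUF (the filename is illustrative) and a recent llama.cpp build:

  # Fully resident run (needs ~80+ GB of VRAM):
  ./llama-cli -m dbrx-base-Q4_K_M.gguf -ngl 999 -fa -c 8192 \
    -p 'def quicksort(arr):' -n 128

  # 48 GB cards: keep attention and shared weights on GPU, stream expert
  # FFN tensors from system RAM. --override-tensor exists in recent
  # llama.cpp builds; check your version's flag list before relying on it.
  ./llama-cli -m dbrx-base-Q4_K_M.gguf -ngl 999 -fa -c 8192 \
    --override-tensor 'ffn_.*_exps.*=CPU' \
    -p 'def quicksort(arr):' -n 128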

Hardware guidance

Minimum: dual RTX 3090 (48 GB total) at Q4_K_M, tight at 4K context. Recommended: A100 80GB at AWQ-INT4 for serving. Budget: RTX A6000 48GB at Q3_K_M with expert offload.

VRAM math: 132B total, ~36B active (4 experts selected per token). Q4_K_M for the full 132B: ~70-80 GB. Expert offload reduces VRAM to ~30-40 GB (active experts in VRAM, the rest in system RAM). KV cache at 8K: ~10-15 GB. That makes 48 GB with expert offload borderline, while an 80 GB A100 holds all experts in VRAM comfortably.

Other setups: Mac Studio M4 Max 64GB runs Q4_K_M with expert offload at 3-6 tok/s; an RTX 4090 24GB needs Q3_K_M with aggressive expert offload. Cloud: a single A100 at $5-10/hr for AWQ.
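A quick sanity check on those numbers — assuming Q4_K_M averages roughly 4.8 bits per weight (the exact figure varies with the tensor mix):

  awk 'BEGIN {
    bits = 4.8                              # assumed Q4_K_M average bits/weight
    printf "full model:  ~%.0f GB\n", 132e9 * bits / 8 / 1e9  # all 16 experts resident
    printf "active path: ~%.0f GB\n",  36e9 * bits / 8 / 1e9  # 4-of-16 active subset
  }'

The ~79 GB full-model figure is why the 80 GB A100 is comfortable; the ~22 GB active path plus 10-15 GB of KV cache is why 48 GB with expert offload is borderline rather than roomy.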

What breaks first

  1. Base model, not instruct. DBRX-base has no chat or instruction tuning; raw completions continue the prompt style rather than answering questions. Fine-tuning or few-shot prompting is necessary (see the sketch after this list).
  2. Fine-grained MoE routing overhead. 16 experts with top-4 routing per token means more routing decisions and heavier all-to-all communication. On PCIe cards, this pattern causes more stalls than Mixtral-style MoE.
  3. AWQ calibration gap. DBRX AWQ quants calibrated on generic data may not preserve quality on domain-specific tasks. Test quant quality on your own data before deploying.
  4. The Databricks Open Model License. It permits commercial use but is not a standard open-weight license; check huggingface.co/databricks/dbrx-base for the exact terms.
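A few-shot completion sketch for item 1 — same illustrative filename as above; a base model should continue the pattern rather than chat:

  # $'...' quoting turns \n into real newlines in the prompt.
  ./llama-cli -m dbrx-base-Q4_K_M.gguf -ngl 999 -fa -n 16 \
    -p $'Q: What is the capital of France?\nA: Paris\n\nQ: What is the capital of Japan?\nA:'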

Runtime recommendation

llama.cpp with -ngl 999 for local use. vLLM for multi-user serving on A100. DBRX's fine-grained MoE benefits from vLLM's expert-parallel scheduling. Avoid Ollama for base models — it's designed for instruct/chat. For fine-tuning: Axolotl or Unsloth with QLoRA.
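A serving sketch for the vLLM path — assuming an AWQ-quantized DBRX checkpoint is available locally (the path below is a placeholder, not a confirmed upload):

  # Single A100 80GB; serves an OpenAI-compatible API on port 8000 by default.
  vllm serve /path/to/dbrx-base-awq --quantization awq --max-model-len 8192

Point any OpenAI-compatible client at http://localhost:8000/v1 once it is up.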

Common beginner mistakes

Mistake: Expecting DBRX-base to chat.
Fix: Base models generate completions, not conversations. Use DBRX-Instruct, fine-tune it yourself, or rely on few-shot prompting with careful formatting.

Mistake: Assuming 132B total parameters means 132 GB of VRAM.
Fix: MoE at Q4_K_M is ~75 GB on disk, and the active subset per token is only ~36B (~21 GB at Q4). Expert offload lets it run on 48 GB.

Mistake: Using the standard Llama GGUF conversion path.
Fix: DBRX has its own architecture. Use the DBRX-aware conversion script (see the sketch below) or pre-converted GGUFs from community quantizers such as bartowski.

Mistake: Ignoring the 16-expert routing overhead.
Fix: DBRX's top-4-of-16 routing is more complex than Mixtral's top-2-of-8. Expect higher per-token latency variance due to more frequent expert switches.
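A conversion sketch for that GGUF mistake — assuming a recent llama.cpp checkout with DBRX support (script and binary names follow the current llama.cpp tree; older checkouts shipped convert-hf-to-gguf.py and quantize instead):

  # HF checkpoint -> f16 GGUF -> Q4_K_M
  python convert_hf_to_gguf.py /path/to/dbrx-base --outtype f16 --outfile dbrx-base-f16.gguf
  ./llama-quantize dbrx-base-f16.gguf dbrx-base-Q4_K_M.gguf Q4_K_M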

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (dbrx)
  • DBRX Base, 132B (you are here)
  • DBRX Instruct, 132B (datacenter tier)

Strengths

  • Fine-grained MoE
  • Databricks Mosaic recipe

Weaknesses

  • Base model only; use dbrx-instruct for chat

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          75.0 GB      96 GB

Get the model

HuggingFace

Original weights

huggingface.co/databricks/dbrx-base

Source repository — no prequantized files; you will need to quantize the weights yourself.



Frequently asked

What's the minimum VRAM to run DBRX Base?

96 GB of VRAM runs DBRX Base fully on-GPU at the Q4_K_M quantization (file size 75.0 GB); with expert offload to system RAM, 48 GB cards are workable but tight. Higher-quality quantizations need more.

Can I use DBRX Base commercially?

Yes — DBRX Base ships under the Databricks Open Model License, which permits commercial use. Always read the license text before deployment.

What's the context length of DBRX Base?

DBRX Base supports a context window of 32,768 tokens (32K).

Source: huggingface.co/databricks/dbrx-base

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Before you buy

Verify DBRX Base runs on your specific hardware before committing money.