dbrx
132B parameters
Commercial OK
Reviewed June 2026

DBRX Base

DBRX base (non-instruct). 132B total / 36B active fine-grained MoE.

License: Databricks Open Model License · Released Mar 27, 2024 · Context: 32,768 tokens

Our verdict

Fredoline Eruo · Verified Jun 12, 2026
unrated

Positioning

DBRX Base is a Mixture-of-Experts (MoE) model from Databricks with 132B total parameters, of which approximately 36B are active per token. Released under the Databricks Open Model License, it is a base model intended for fine-tuning, not for instruction following. Because each token passes through only 4 of its 16 experts, per-token compute is closer to a dense ~36B model than a dense 132B model. Memory is a different story: all 132B parameters must still be loaded. With a 32,768-token context window, it targets enterprise users who need a customizable foundation for domain-specific tasks.
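As a rough illustration of that compute-versus-memory split, the sketch below uses the common approximation of about 2 FLOPs per active parameter per token for a forward pass (an assumption that ignores attention and router costs) and 2 bytes per parameter for BF16 weights.

```python
# Rough per-token compute vs. weight-memory comparison for an MoE model.
# Assumption: forward-pass FLOPs per token ~= 2 x active parameters
# (one multiply and one add per weight); attention and router costs are ignored.

TOTAL_PARAMS = 132_000_000_000   # DBRX total parameters
ACTIVE_PARAMS = 36_000_000_000   # parameters used per token (4 of 16 experts)


def forward_flops_per_token(active_params: int) -> int:
    """Approximate forward-pass FLOPs for one token."""
    return 2 * active_params


def weight_bytes(total_params: int, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, regardless of how many are active."""
    return total_params * bytes_per_param


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)
print(f"MoE compute per token:   {moe_flops / 1e9:.0f} GFLOPs")
print(f"Dense-132B equivalent:   {dense_flops / 1e9:.0f} GFLOPs")
print(f"Compute ratio:           {moe_flops / dense_flops:.2f}")
print(f"BF16 weights (all 132B): {weight_bytes(TOTAL_PARAMS, 2) / 1e9:.0f} GB")
```

The compute ratio (about 0.27) is where the efficiency comes from; the weight-memory line is why the model still needs datacenter-class hardware.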

Strengths

  • Efficient MoE architecture: With 132B total parameters but only ~36B active per token, DBRX Base offers the representational capacity of a large model at per-token compute comparable to a much smaller dense model.
  • Commercial-use license: The Databricks Open Model License allows commercial use and fine-tuning, subject to its terms, making it suitable for enterprise deployment.
  • Large context window: At 32,768 tokens, it can handle substantial documents or codebases in a single pass.
  • Fine-tuning base: As a base model, it is optimized for customization via fine-tuning, giving operators full control over behavior.

Limitations

  • Datacenter-class hardware required: Even at Q4_K_M (74.3 GB), plus roughly 30-50% for KV cache and framework overhead, the model needs the multi-GPU setups typical of datacenters. A single consumer or workstation GPU cannot hold it entirely in VRAM.
  • No instruction tuning: DBRX Base is not designed for chat or instruction following out of the box; operators must fine-tune for specific tasks.
  • Limited community benchmarks: Independent benchmark results for the base model are sparse, so treat published vendor metrics as best-case.
  • High memory footprint: All 132B parameters must be resident (in VRAM, or partly offloaded to system RAM) even though only ~36B are active per token, and the KV cache grows with context length on top of that.

What it takes to run this locally

DBRX Base requires datacenter-class hardware to run fully on GPU. File sizes range from ~264 GB (FP16, unquantized) down to ~42.9 GB (Q2_K), and even the smallest quant needs significant GPU memory once KV cache and framework overhead are included (add ~30-50%). For example, Q4_K_M at ~74.3 GB plus ~22-37 GB of overhead means roughly 96-111 GB of VRAM, which calls for multiple high-end GPUs: 2× 80 GB A100s cover it, while 4× 24 GB GPUs (96 GB total) fall at or just below the low end of that range. No single consumer GPU can hold this model entirely in VRAM.
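The overhead rule of thumb and the GPU count can be turned into a small calculator. This is a sketch using the file sizes and the 30-50% overhead quoted in this review; it assumes VRAM pools perfectly across cards, which real multi-GPU splits do not quite achieve.

```python
import math

# Approximate file sizes quoted in this review (GB).
QUANT_FILE_GB = {
    "FP16": 264.0,
    "Q4_K_M": 74.3,
    "Q2_K": 42.9,
}

# Rule of thumb from the text: add ~30-50% for KV cache and framework overhead.
OVERHEAD_LOW = 0.30
OVERHEAD_HIGH = 0.50


def vram_range_gb(file_gb: float) -> tuple[float, float]:
    """Return the (low, high) VRAM estimate for a given file size."""
    return file_gb * (1 + OVERHEAD_LOW), file_gb * (1 + OVERHEAD_HIGH)


def gpus_needed(required_gb: float, per_gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined VRAM covers the requirement."""
    return math.ceil(required_gb / per_gpu_gb)


for name, size in QUANT_FILE_GB.items():
    low, high = vram_range_gb(size)
    print(
        f"{name:7s} file {size:6.1f} GB -> VRAM {low:6.1f}-{high:6.1f} GB | "
        f"80 GB cards: {gpus_needed(high, 80)} | 24 GB cards: {gpus_needed(high, 24)}"
    )
```

Because tensor- or row-split setups lose some memory to per-GPU buffers, round the card count up rather than down.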

Should you run this locally?

Yes if you have access to datacenter-grade multi-GPU hardware, need a commercially usable base model for fine-tuning on proprietary data, and can benefit from the MoE efficiency of ~36B active parameters.

No if you lack multi-GPU infrastructure, require an out-of-the-box instruction-tuned model, or need to run on consumer or workstation hardware.

How to run it

DBRX is Databricks' 132B MoE model (~36B active per token with 4-of-16 expert routing). For local use, run it at Q4_K_M via llama.cpp with -ngl 999 -fa -c 8192. The Q4_K_M file is ~75 GB on disk.

Minimum VRAM: 48 GB, which is workable only with the expert weights offloaded to system RAM: an RTX A6000 (48 GB) at Q4_K_M, or dual RTX 3090s with row split (48 GB total). Recommended: A100 80GB at AWQ-INT4. Throughput: ~15-25 tok/s on an A6000 at Q4_K_M (8K context).

DBRX uses a fine-grained MoE with 16 experts (4 active per token), which means more routing decisions per token than Mixtral-style models (8 experts, 2 active). The trade-off is higher routing overhead in exchange for potentially better expert specialization.

DBRX Base is not instruction-tuned. Use it for fine-tuning, not direct chat; for instruction-tuned use, look at DBRX Instruct or fine-tune it yourself. Ollama may not carry DBRX Base, so verify the tag before pulling. Architecturally it is a standard transformer with MoE FFN layers, supported in llama.cpp and in vLLM (check the supported-models list for your version).
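To make the 4-of-16 versus 2-of-8 comparison concrete, the sketch below counts the distinct expert subsets each scheme can pick per token and shows a generic softmax top-k router. The router is a textbook illustration with made-up logits, not DBRX's exact routing code.

```python
import math


def expert_combinations(num_experts: int, top_k: int) -> int:
    """Number of distinct expert subsets a top-k router can choose per token."""
    return math.comb(num_experts, top_k)


def top_k_route(logits: list[float], k: int) -> dict[int, float]:
    """Generic softmax top-k gating: pick k experts and renormalize their weights."""
    m = max(logits)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    chosen_mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / chosen_mass for i in chosen}  # gate weights sum to 1


dbrx = expert_combinations(16, 4)      # 4 of 16 experts
mixtral = expert_combinations(8, 2)    # 2 of 8 experts
print(dbrx, mixtral, dbrx // mixtral)  # 1820 28 65

# Illustrative router logits for one token (made-up values, 16 experts).
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9,
          -2.0, 1.1, 0.2, -0.5, 0.7, 1.8, -0.1, 0.3]
print(top_k_route(logits, 4))
```

The larger number of possible subsets is what "fine-grained" buys; the cost is that each token touches four expert FFNs instead of two, which is where the extra routing work comes from.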

Hardware guidance

  • Minimum: dual RTX 3090 (48 GB total) at Q4_K_M with expert offload (tight at 4K context).
  • Recommended: A100 80GB at AWQ-INT4 for serving.
  • Budget: RTX A6000 48GB at Q3_K_M with expert offload.
  • VRAM math: 132B total, ~36B active (4 experts selected per token). Q4_K_M for the full 132B is ~70-80 GB. Expert offload reduces VRAM to ~30-40 GB by keeping the shared (non-expert) weights and as many expert layers as fit on the GPU, with the remaining expert weights in system RAM. KV cache at 8K: ~10-15 GB.
  • 48 GB with expert offload: borderline.
  • 80 GB A100: fits all experts in VRAM at 4-bit, with limited headroom for KV cache.
  • Mac Studio M4 Max 64GB: Q4_K_M with expert offload, 3-6 tok/s.
  • RTX 4090 24GB: Q3_K_M with aggressive expert offload.
  • Cloud: single A100 at $5-10/hr for AWQ.
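KV cache size can be estimated from the attention configuration. The sketch below assumes values from DBRX's published config.json as we understand it (40 layers, d_model 6144 with 48 query heads, so a head dimension of 128, and 8 key/value heads via grouped-query attention); verify them against the repository before relying on the result. Under these assumptions the raw BF16 cache at 8K context comes out well under the 10-15 GB budget in the VRAM math, which suggests that budget is conservative; check the allocation your runtime actually reports.

```python
# KV-cache size estimate for grouped-query attention.
# ASSUMED config values (check databricks/dbrx-base config.json before relying on them):
N_LAYERS = 40
D_MODEL = 6144
N_HEADS = 48
N_KV_HEADS = 8
HEAD_DIM = D_MODEL // N_HEADS  # 128


def kv_cache_bytes(context_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes for keys + values across all layers (FP16/BF16 = 2 bytes per value)."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value  # K and V
    return per_token * context_tokens


for ctx in (4096, 8192, 32768):
    print(f"{ctx:>6} tokens: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

Runtime compute buffers come on top of these figures.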

What breaks first

  1. Base model, not instruct. DBRX Base has no chat or instruction tuning. Raw completions continue the style of the prompt rather than answering questions, so fine-tuning or few-shot prompting is necessary.
  2. Fine-grained MoE routing overhead. Top-4 routing over 16 experts means more routing decisions per token and, in multi-GPU expert-parallel setups, more all-to-all communication. On PCIe-connected cards, this routing pattern can cause more stalls than Mixtral-style routing.
  3. AWQ calibration gap. DBRX AWQ quants calibrated on generic data may not preserve quality on domain-specific tasks. Test quant quality on your own data before deploying.
  4. Databricks' license. Verify that the DBRX license covers your commercial use case; it is a custom license and may differ from standard open-weight licenses. Check huggingface.co/databricks/dbrx-base for terms.
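Because a base model continues text rather than following instructions, a few-shot prompt has to show the pattern you want completed. The helper below is a hypothetical sketch (the `Q:`/`A:` layout and the function name are illustrative choices, not a DBRX convention): it ends the prompt mid-pattern so the most natural continuation is the answer, and returns a stop string so generation can halt before the model invents the next question.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> tuple[str, str]:
    """Format (question, answer) pairs so a base model continues with an answer.

    Returns the prompt and a stop sequence; the Q:/A: layout is an illustrative choice.
    """
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")   # leave the final answer open for the model
    prompt = "\n\n".join(blocks)
    stop = "\n\nQ:"                    # stop before the model writes another question
    return prompt, stop


examples = [
    ("What is the capital of France?", "Paris"),
    ("What is 12 * 3?", "36"),
]
prompt, stop = build_few_shot_prompt(
    examples, "What is the boiling point of water at sea level in Celsius?"
)
print(prompt)
```

Most runtimes accept stop sequences (for example, llama.cpp's server takes a `stop` list in completion requests); check your runtime's documentation for the exact field.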

Runtime recommendation

llama.cpp with -ngl 999 for local use. vLLM for multi-user serving on A100s: at BF16 the ~264 GB of weights must be split across several 80 GB cards, while a 4-bit quant can fit on one. DBRX's fine-grained MoE benefits from vLLM's expert-parallel scheduling. Avoid Ollama for base models, since it is geared toward instruct/chat use. For fine-tuning: Axolotl or Unsloth with QLoRA.

Common beginner mistakes

Mistake: Expecting DBRX Base to chat. Fix: Base models generate completions, not conversations. Use DBRX Instruct or fine-tune, or use few-shot prompting with careful formatting.

Mistake: Assuming 132B total means it needs 132 GB of VRAM. Fix: At Q4_K_M the MoE is ~75 GB on disk, and the active subset per token is only ~36B (~21 GB at Q4). Expert offload makes it runnable on 48 GB.

Mistake: Using a standard Llama GGUF conversion. Fix: DBRX has its own architecture. Use llama.cpp's Hugging Face conversion script, which handles DBRX, or pre-converted GGUFs from an established quantizer such as bartowski (verify that the repo targets the base model, not Instruct).

Mistake: Ignoring the 16-expert routing overhead. Fix: DBRX's top-4-of-16 routing is more complex than Mixtral's top-2-of-8. Expect higher per-token latency variance from more frequent expert switches.
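The "~21 GB at Q4" figure can be sanity-checked from the file size: dividing the Q4_K_M file by the total parameter count gives an effective bits per weight, which can then be applied to the active subset. This sketch treats every weight as quantized uniformly, which is an approximation (K-quants mix bit widths across tensors).

```python
TOTAL_PARAMS = 132e9
ACTIVE_PARAMS = 36e9
Q4_K_M_FILE_GB = 75.0  # file size quoted in this review


def effective_bits_per_weight(file_gb: float, params: float) -> float:
    """Average bits per parameter implied by a file size (1 GB = 1e9 bytes)."""
    return file_gb * 1e9 * 8 / params


def subset_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size in GB of `params` weights at the given average bit width."""
    return params * bits_per_weight / 8 / 1e9


bpw = effective_bits_per_weight(Q4_K_M_FILE_GB, TOTAL_PARAMS)
print(f"Effective bits per weight: {bpw:.2f}")
print(f"Active ~36B subset:        {subset_gb(ACTIVE_PARAMS, bpw):.1f} GB")
```

That lands at roughly 20-21 GB, but it is the size of one token's working set, not what must be resident: the selected experts change from token to token, so the full file still has to live in VRAM or system RAM.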

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (dbrx)
DBRX Base · 132B · You are here
DBRX Instruct · 132B · Datacenter

Strengths

  • Fine-grained MoE
  • Databricks Mosaic recipe

Weaknesses

  • Use dbrx-instruct for chat

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M | 75.0 GB | 96 GB

Get the model

Hugging Face

Original weights

huggingface.co/databricks/dbrx-base

Source repository; you will need to quantize the weights yourself.

Frequently asked

What's the minimum VRAM to run DBRX Base?

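For fully GPU-resident inference, 96 GB of VRAM is enough to run DBRX Base at the Q4_K_M quantization (file size 75.0 GB); expert offload to system RAM can lower this at the cost of speed. Higher-quality quantizations need more.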

Can I use DBRX Base commercially?

Yes — DBRX Base ships under the Databricks Open Model License, which permits commercial use. Always read the license text before deployment.

What's the context length of DBRX Base?

DBRX Base supports a context window of 32,768 tokens (32K).

Source: huggingface.co/databricks/dbrx-base

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
