dbrx
132B parameters
Commercial OK
Reviewed June 2026

DBRX Instruct

Databricks' MoE. 132B total / 36B active. Designed for MosaicML pipelines; strong tool-calling discipline. Needs multi-GPU or 192 GB-class hardware.

License: Databricks Open Model License·Released Mar 27, 2024·Context: 32,768 tokens

Our verdict

Fredoline Eruo | Verified Jun 12, 2026 | Unrated

Positioning

Databricks DBRX Instruct is the instruction-tuned variant of DBRX Base, a 132-billion-parameter Mixture-of-Experts model with 36B parameters active per token. Databricks released it in March 2024 under the Databricks Open Model License, an open-weight license that is broadly commercial-friendly but carries a large-user threshold and an acceptable use policy. The model was Databricks' demonstration that fine-grained MoE (16 experts, 4 active per token) could deliver capability competitive with dense models at a lower active inference cost. By 2026, DBRX has been surpassed by DeepSeek V3 and Qwen 3 235B on most benchmarks, but it remains relevant for Databricks customers and as a reference point.
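The routing idea above can be sketched as a toy top-k gate. A router scores all 16 experts for a token, keeps the 4 highest, and mixes only those experts' outputs using renormalized softmax weights. This is a minimal illustration under our own assumptions (random logits, a hypothetical `route_top_k` helper), not Databricks' code. Production routers operate on batched tensors, and training usually adds a load-balancing loss.

```python
import math
import random


def route_top_k(gate_logits: list[float], k: int = 4) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their softmax weights.

    Toy sketch of top-k MoE routing (16 experts, 4 active, as in DBRX).
    Hypothetical helper, not Databricks' implementation.
    """
    if not 0 < k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits, best first.
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts (subtract the max for numerical stability).
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(16)]  # one token, 16 experts
    for expert, weight in route_top_k(logits, k=4):
        print(f"expert {expert:2d}  weight {weight:.3f}")
```

Only the 4 selected experts run their feed-forward computation for that token. This is where the active-parameter savings come from.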

Strengths

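  • Permissive license for most commercial uses. The Databricks Open Model License allows commercial deployment. Its main carve-outs are a separate-license requirement for services above 700 million monthly active users and a bar on using DBRX outputs to improve other LLMs.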
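  • MoE active-parameter efficiency. With 36B active vs 132B total parameters, per-token compute is closer to that of a 36B dense model. Memory gets no such discount, because all 132B parameters must stay loaded.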
  • Strong code generation. DBRX was pretrained on 12T tokens of text and code. At release, Databricks reported that it outperformed Llama 2 70B and Mixtral on code benchmarks.
  • Databricks ecosystem integration. Deeply tied into Databricks Mosaic / Unity Catalog — first-class MLflow + databricks-sdk support.
  • Tool-use capability for agentic workflows.
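The active-parameter claim can be made concrete with a common rule of thumb: a forward pass costs about 2 FLOPs per active parameter per token. The sketch below assumes that approximation, which ignores attention-score FLOPs that grow with context length. The helper name is hypothetical.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule of thumb: a forward pass costs about 2 FLOPs per active parameter per token.

    Ignores attention-score FLOPs, which grow with context length.
    """
    return 2.0 * active_params


DBRX_TOTAL = 132e9   # all experts; every one must be resident in memory
DBRX_ACTIVE = 36e9   # parameters used per token (4 of 16 experts plus shared layers)

moe = forward_flops_per_token(DBRX_ACTIVE)
dense_same_size = forward_flops_per_token(DBRX_TOTAL)
print(f"DBRX forward pass: ~{moe / 1e9:.0f} GFLOPs/token")
print(f"Dense 132B model:  ~{dense_same_size / 1e9:.0f} GFLOPs/token")
print(f"Ratio: {moe / dense_same_size:.2f}")
```

By this estimate, DBRX does a little over a quarter of the per-token arithmetic of a dense 132B model, while still needing the full 132B-parameter memory footprint.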

Limitations

  • Surpassed by 2026 frontier MoE models. DeepSeek V3 / Qwen 3 235B both deliver better quality at similar serving costs.
  • Compute requirements are still substantial. 132B at FP16 needs ~270 GB; Q4 needs ~70 GB. Multi-GPU servers or 192 GB-class single devices are required.
  • MoE serving complexity. Production-grade MoE inference requires vLLM / SGLang / TensorRT-LLM with MoE routing.
  • English-focused. Multilingual coverage is weak compared to Aya / Qwen / Command R.
  • Long-context degrades quickly. 32K context with notable quality drop past 16K.
  • Ecosystem maturity outside Databricks platform is limited. Self-hosting DBRX outside Databricks requires more configuration than Llama / Qwen.
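The memory figures above follow from parameters × bits per weight. Below is a back-of-the-envelope sketch using a hypothetical `weight_memory_gb` helper (decimal GB, weights only). Raw weights come to slightly less than the figures quoted above, which leave some margin for quantization scales and runtime overhead.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Weights only, in decimal GB. Excludes KV cache, activations, and runtime overhead."""
    return params * bits_per_weight / 8 / 1e9


DBRX_PARAMS = 132e9
# Nominal bits per weight. Real quantized files also store scales/zero-points,
# so treat the 4-bit figure as a lower bound.
for label, bits in [("FP16/BF16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{label:9s} ~{weight_memory_gb(DBRX_PARAMS, bits):.0f} GB")
```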

Real-world performance

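  • vs Llama 3.1 70B: Llama 3.1 70B wins on most benchmarks despite having fewer total parameters. This reflects Meta's gains in training data and post-training; Llama 3.1 shipped about four months after DBRX. DBRX wins on per-token compute (36B active vs 70B dense), though it needs more memory to hold its weights.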
  • vs DeepSeek V3 (671B MoE): V3 dramatically more capable. Pick V3 for new builds.
  • vs Qwen 3 235B-A22B: Qwen 3 stronger on most benchmarks at similar serving cost.
  • vs DBRX Base: Instruct is the chat-tuned variant. Pick Instruct for chat/agentic; Base for fine-tuning starting point.

Should you run this locally?

Yes if you're a Databricks customer with Mosaic / Unity Catalog deployment, you specifically need DBRX's permissive license terms, or you have an existing DBRX-tuned application. The Databricks platform integration is genuinely good.

No if you're standing up a new self-hosted MoE deployment in 2026. Pick DeepSeek V3 or Qwen 3 235B for better quality at similar cost. DBRX is now mainly a historical reference point.

How it compares

  • vs DBRX Base: Same architecture, base vs instruct.
  • vs DeepSeek V3 (671B MoE): V3 (December 2024) is the newer and much larger MoE design, with 671B total and 37B active parameters. DBRX represents the previous generation.
  • vs Qwen 3 235B-A22B: Qwen 3 strictly better quality at similar serving cost.
  • vs Mixtral 8x22B: Different MoE expert count (8 large vs 16 small). Comparable era; different architecture choices.
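The expert-count difference affects routing flexibility. The number of distinct expert subsets a token can be routed to is a binomial coefficient, which Python's `math.comb` checks quickly. Databricks' launch materials described this as roughly 65x more expert combinations than Mixtral-style 8-choose-2 routing.

```python
from math import comb

dbrx = comb(16, 4)      # DBRX: choose 4 of 16 experts per token
mixtral = comb(8, 2)    # Mixtral: choose 2 of 8 experts per token
print(dbrx, mixtral, dbrx / mixtral)  # 1820 28 65.0
```

More combinations do not automatically mean higher quality. They are one argument for fine-grained experts, not a benchmark result.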

Run this yourself

  • Databricks platform: Native deployment via Databricks Mosaic — the canonical path.
  • Self-hosted single device: MI300X (192 GB) at FP8; Mac Studio M2 Ultra (192 GB) or M3 Ultra (256 GB+) at Q5 with MLX.
  • Self-hosted datacenter: 4× H100 PCIe at FP8 with vLLM MoE routing.
  • Cloud rental: Runpod / Lambda H100 SXM cluster ~$25-40/hr per node.
  • Vendor: databricks/dbrx-instruct on Hugging Face.
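To sanity-check the hardware list, a rough fit test can split the weights evenly across GPUs and reserve a slice of each device for KV cache and activations. Tensor parallelism roughly splits weights this way. The `fits` helper and its 20% headroom default are our own assumptions, not a vLLM parameter. Treat the result as a first filter, not a guarantee.

```python
def fits(params: float, bits: float, gpu_gb: float, num_gpus: int = 1,
         headroom: float = 0.2) -> bool:
    """Rough check: do the weights fit when split evenly across num_gpus,
    leaving a `headroom` fraction of each GPU for KV cache and activations?

    Hypothetical helper; the 20% default headroom is an assumption.
    """
    per_gpu_weights_gb = params * bits / 8 / 1e9 / num_gpus
    return per_gpu_weights_gb <= gpu_gb * (1 - headroom)


P = 132e9
print("MI300X 192 GB, FP16:", fits(P, 16, 192))            # 264 GB of weights vs 192 GB
print("MI300X 192 GB, FP8: ", fits(P, 8, 192))             # 132 GB of weights
print("4x H100 80 GB, FP8: ", fits(P, 8, 80, num_gpus=4))  # 33 GB of weights per GPU
```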

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (dbrx)
  • DBRX Base · 132B · Datacenter
  • DBRX Instruct · 132B · You are here

Strengths

  • Databricks ecosystem alignment
  • Strong tool-calling
  • MoE efficiency

Weaknesses

  • Needs multi-GPU or 192 GB-class hardware
  • Older release — Llama 4 / DeepSeek V4 are sharper

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M (GGUF) is the usual starting point for llama.cpp users; the variant we track here is AWQ-INT4.

Quantization | File size | VRAM required
AWQ-INT4 | 75.0 GB | 96 GB
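The gap between the 75.0 GB file and the 96 GB VRAM figure goes to the KV cache, activations, and runtime overhead. The minimal KV-cache estimate below assumes DBRX's attention config is 40 layers with 8 grouped-query KV heads of dimension 128 and a BF16 cache. Verify these values against the model's config.json before relying on them.

```python
def kv_cache_bytes(tokens: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """KV cache size for one sequence: a K and a V vector per KV head, per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens


# Assumed DBRX attention config (check config.json): 40 layers, 8 KV heads (GQA),
# head_dim 128 (= d_model 6144 / 48 query heads), BF16 cache (2 bytes per value).
per_seq = kv_cache_bytes(32_768, n_layers=40, n_kv_heads=8, head_dim=128)
print(f"KV cache for one full 32K sequence: {per_seq / 2**30:.1f} GiB")  # 5.0 GiB
```

At roughly 5 GiB per full-length sequence, a few concurrent long requests would use up most of the remaining headroom.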

Get the model

HuggingFace

Original weights

huggingface.co/databricks/dbrx-instruct

Source repository with the original weights. You must quantize them yourself to get smaller variants.


Frequently asked

What's the minimum VRAM to run DBRX Instruct?

96 GB of VRAM is enough to run DBRX Instruct at AWQ-INT4 quantization (file size 75.0 GB). Higher-quality quantizations need more.

Can I use DBRX Instruct commercially?

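Yes. DBRX Instruct ships under the Databricks Open Model License, which permits commercial use subject to conditions, including a separate license for services above 700 million monthly active users. Always read the license text before deployment.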

What's the context length of DBRX Instruct?

DBRX Instruct supports a context window of 32,768 tokens (32K).
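A serving layer can enforce the context limit before a request reaches the model. Below is a hypothetical helper. It assumes token counts come from the model's tokenizer and treats the ~16K quality threshold mentioned in the limitations as 16,384 tokens.

```python
MAX_CONTEXT = 32_768   # DBRX Instruct context window, in tokens
SOFT_LIMIT = 16_384    # assumed threshold for the quality drop noted in this review


def check_budget(prompt_tokens: int, max_new_tokens: int) -> str:
    """Classify a request against DBRX's context window (hypothetical helper)."""
    total = prompt_tokens + max_new_tokens
    if total > MAX_CONTEXT:
        return "reject"  # prompt plus generation would overflow the window
    if total > SOFT_LIMIT:
        return "warn"    # fits, but past the range where quality holds up
    return "ok"


print(check_budget(4_000, 1_000))    # ok
print(check_budget(20_000, 2_000))   # warn
print(check_budget(31_000, 4_000))   # reject
```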

Source: huggingface.co/databricks/dbrx-instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
