DBRX Instruct
Databricks' MoE. 132B total / 36B active. Designed for Mosaic ML pipelines; strong tool-calling discipline. Multi-GPU on typical 80 GB cards.
Positioning
Databricks DBRX Instruct is the instruction-tuned variant of DBRX Base, a 132-billion-parameter Mixture-of-Experts (MoE) model with 36B parameters active per token. Databricks released it in March 2024 under the Databricks Open Model License, an open-weight license that is broadly commercial-friendly but carries restrictions (a usage cap for very large services and limits on using DBRX to improve competing models). The model was Databricks' demonstration that fine-grained MoE (16 experts, 4 active per token) could deliver capability comparable to a strong dense model at lower per-token inference cost. By 2026, DBRX has been surpassed by DeepSeek V3 and Qwen 3 235B on most benchmarks, but it remains relevant for Databricks customers and as a reference point.
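To make "16 experts, 4 active per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general technique, not DBRX's actual router: the toy experts, the made-up router scores, and the choice to apply softmax after the top-k selection are all assumptions for this example.

```python
import math

def route_top_k(router_logits, k=4):
    """Pick the k highest-scoring experts and renormalize their weights."""
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}

def moe_layer(x, experts, router_logits, k=4):
    """Weighted sum of the selected experts' outputs for one token."""
    weights = route_top_k(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Toy setup (hypothetical): 16 "experts" that each scale a scalar input.
experts = [lambda x, s=i: x * s for i in range(16)]
logits = [0.1 * i for i in range(16)]  # made-up router scores for one token

weights = route_top_k(logits)
output = moe_layer(1.0, experts, logits)
print(sorted(weights), round(sum(weights.values()), 6), output)
```

Only the 4 selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.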
Strengths
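- Permissive license for most commercial uses. The Databricks Open Model License allows commercial deployment, with restrictions aimed at very large services and at using DBRX to improve competing models; read the license text before shipping.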
- MoE active-parameter efficiency. 36B active vs 132B total: per-token compute is closer to that of a 36B dense model, although all 132B parameters must still be held in memory.
- Strong code generation. DBRX was specifically trained on high-quality code data and outperformed Llama 2 / earlier Mistral on code benchmarks.
- Databricks ecosystem integration. Deeply tied into Databricks Mosaic / Unity Catalog — first-class MLflow + databricks-sdk support.
- Tool-use capability for agentic workflows.
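The active-parameter point can be put in rough numbers. The sketch below uses the common rule of thumb of about 2 FLOPs per parameter per token for a forward pass; that approximation is an assumption of this example (it ignores attention over the context and routing overhead), and the parameter counts are the ones quoted above.

```python
TOTAL_PARAMS = 132e9   # all experts, from the figures above
ACTIVE_PARAMS = 36e9   # parameters used per token, from the figures above

def forward_gflops_per_token(params):
    # Rule-of-thumb assumption: ~2 FLOPs per parameter per generated token.
    return 2 * params / 1e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_gflops = forward_gflops_per_token(ACTIVE_PARAMS)
dense_gflops = forward_gflops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction:.1%}")
print(f"per-token compute: ~{moe_gflops:.0f} GFLOPs (MoE) "
      f"vs ~{dense_gflops:.0f} GFLOPs (hypothetical 132B dense)")
```

Under these assumptions DBRX does roughly 27% of the per-token work a dense 132B model would, while its memory footprint is still set by the full 132B.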
Limitations
- Surpassed by 2026 frontier MoE models. DeepSeek V3 / Qwen 3 235B both deliver better quality at similar serving costs.
- Compute requirements are still substantial. 132B parameters at FP16 need ~264 GB for weights alone; Q4 needs ~70 GB. Frontier hardware required.
- MoE serving complexity. Production-grade MoE inference requires vLLM / SGLang / TensorRT-LLM with MoE routing.
- English-focused. Multilingual coverage is weak compared to Aya / Qwen / Command R.
- Long-context quality degrades quickly. The context window is 32K tokens, with a notable quality drop past 16K.
- Ecosystem maturity outside Databricks platform is limited. Self-hosting DBRX outside Databricks requires more configuration than Llama / Qwen.
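The memory figures in the list above follow from simple arithmetic: parameter count times bits per weight. The sketch below covers weights only; KV cache, activations, and the per-group scales that real 4-bit formats store are not included, so practical numbers land somewhat higher than these.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 132e9  # from the figures above

estimates = {
    label: weight_memory_gb(TOTAL_PARAMS, bits)
    for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]
}
for label, gb in estimates.items():
    print(f"{label}: ~{gb:.0f} GB of weights")
```

This gives about 264 GB at FP16, 132 GB at FP8, and 66 GB at a pure 4 bits per weight.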
Real-world performance
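- vs Llama 3.1 70B: Llama 3.1 70B wins on most benchmarks despite its smaller total parameter count, which reflects newer training data and post-training (it shipped in July 2024, about four months after DBRX). DBRX wins on raw active-parameter inference cost (36B active vs 70B dense).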
- vs DeepSeek V3 (671B MoE): V3 dramatically more capable. Pick V3 for new builds.
- vs Qwen 3 235B-A22B: Qwen 3 stronger on most benchmarks at similar serving cost.
- vs DBRX Base: Instruct is the chat-tuned variant. Pick Instruct for chat and agentic use; pick Base as a fine-tuning starting point.
Should you run this locally?
Yes if you're a Databricks customer with Mosaic / Unity Catalog deployment, you specifically need DBRX's permissive license terms, or you have an existing DBRX-tuned application. The Databricks platform integration is genuinely good.
No if you're standing up a new self-hosted MoE deployment in 2026: pick DeepSeek V3 or Qwen 3 235B for better quality at similar cost. DBRX is now a historical reference.
How it compares
- vs DBRX Base: Same architecture, base vs instruct.
- vs DeepSeek V3 (671B MoE): V3 is the architecturally current frontier design; DBRX is an older-generation model by comparison.
- vs Qwen 3 235B-A22B: Qwen 3 strictly better quality at similar serving cost.
- vs Mixtral 8x22B: Different MoE expert count (8 large vs 16 small). Comparable era; different architecture choices.
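One way to see the "8 large vs 16 small" difference is to count how many distinct expert subsets a token can be routed to. The sketch below assumes top-2 routing over 8 experts for Mixtral and uses the 16-expert, 4-active figures quoted earlier for DBRX.

```python
import math

# Number of distinct expert subsets the router can choose per token.
dbrx_combos = math.comb(16, 4)     # 16 experts, 4 active
mixtral_combos = math.comb(8, 2)   # assumed: 8 experts, 2 active

ratio = dbrx_combos // mixtral_combos
print(dbrx_combos, mixtral_combos, ratio)  # 1820 28 65
```

More possible combinations does not by itself mean better quality, but it is the usual argument for fine-grained MoE designs.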
Run this yourself
- Databricks platform: Native deployment via Databricks Mosaic — the canonical path.
- Self-hosted single-card: MI300X (192 GB) at FP8 (FP16 weights, at ~264 GB, do not fit on one card); Mac Studio M2 Ultra (192 GB) at Q5 with MLX.
- Self-hosted datacenter: 4× H100 PCIe at FP8 with vLLM MoE routing.
- Cloud rental: Runpod / Lambda H100 SXM cluster ~$25-40/hr per node.
- Vendor: databricks/dbrx-instruct on Hugging Face.
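As a rough planning aid for the options above, the sketch below estimates how many cards are needed just to hold the weights. The `usable_fraction` of 0.9 is a hypothetical headroom figure chosen for illustration, and the result is a lower bound: KV cache and activations need additional memory, and tensor-parallel serving often rounds the count up further.

```python
import math

def min_gpus(weights_gb, gpu_gb, usable_fraction=0.9):
    """Lower bound on card count needed to hold the weights alone."""
    usable = gpu_gb * usable_fraction  # assumed headroom for runtime overhead
    return math.ceil(weights_gb / usable)

# Weights-only sizes for 132B parameters at 16 and 8 bits per weight.
FP16_GB, FP8_GB = 264, 132

plan = {
    "FP16 on 80 GB cards": min_gpus(FP16_GB, 80),
    "FP8 on 80 GB cards": min_gpus(FP8_GB, 80),
    "FP16 on 192 GB cards": min_gpus(FP16_GB, 192),
    "FP8 on 192 GB cards": min_gpus(FP8_GB, 192),
}
for setup, count in plan.items():
    print(f"{setup}: at least {count}")
```

Treat the output as a first filter before checking a specific serving stack's own memory requirements.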
Strengths
- Databricks ecosystem alignment
- Strong tool-calling
- MoE efficiency
Weaknesses
- Multi-GPU only
- Older release — Llama 4 / DeepSeek V4 are sharper
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is usually the most popular starting point, but the only variant listed for this model is AWQ-INT4.
| Quantization | File size | VRAM required |
|---|---|---|
| AWQ-INT4 | 75.0 GB | 96 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Frequently asked
What's the minimum VRAM to run DBRX Instruct?
Can I use DBRX Instruct commercially?
What's the context length of DBRX Instruct?
Source: huggingface.co/databricks/dbrx-instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.