DBRX Instruct
Databricks' MoE. 132B total / 36B active. Designed for Mosaic ML pipelines; strong tool-calling discipline. Multi-GPU only.
Positioning
Databricks DBRX Instruct is the instruction-tuned variant of DBRX Base, a 132 billion parameter Mixture-of-Experts model with 36B active parameters per token. It was released in March 2024 under the Databricks Open Model License, an open-weight license that is broadly commercial-friendly but requires a separate license for very large providers (roughly 700M+ monthly active users) and restricts using DBRX or its outputs to improve other LLMs. The model was Databricks' demonstration that fine-grained MoE (16 experts, 4 active per token) could deliver strong dense-equivalent capability at lower active inference cost. By 2026, DBRX has been surpassed by DeepSeek V3 and Qwen 3 235B on most benchmarks but remains relevant for Databricks customers and as a reference point.
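To make the active-parameter claim concrete, here is a rough back-of-the-envelope sketch in Python. The figures come from the paragraph above; exact layer shapes live in the published model config, so treat this as illustrative only.

```python
# Illustrative arithmetic for DBRX's fine-grained MoE (approximate figures).
total_params = 132e9   # all 16 experts plus shared weights
active_params = 36e9   # weights actually used per token (4 of 16 experts)

bytes_per_param_fp16 = 2
print(f"Weights resident in memory (FP16): {total_params * bytes_per_param_fp16 / 1e9:.0f} GB")
print(f"Weights touched per token (FP16):  {active_params * bytes_per_param_fp16 / 1e9:.0f} GB")
# All 132B parameters must fit in memory, but per-token compute scales with
# the 36B active parameters -- roughly the FLOPs of a 36B dense model.
```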
Strengths
- Permissive license for most commercial uses. The Databricks Open Model License allows commercial deployment; the main restrictions are a separate license requirement for very large providers and a ban on using DBRX or its outputs to improve other LLMs.
- MoE active-parameter efficiency. 36B active vs 132B total — inference cost is closer to a 36B dense model.
- Strong code generation. DBRX was specifically trained on high-quality code data and outperformed Llama 2 / earlier Mistral on code benchmarks.
- Databricks ecosystem integration. Deeply tied into Databricks Mosaic / Unity Catalog — first-class MLflow + databricks-sdk support.
- Tool-use capability for agentic workflows.
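As an illustration of the tool-use point above, the following hedged sketch sends a function-calling request through an OpenAI-compatible chat endpoint, such as one exposed by vLLM or Databricks model serving. The base_url, served model name, and get_weather tool are assumptions for the example, and whether the model returns structured tool_calls depends on the serving layer's tool parser.

```python
# Hypothetical sketch: exercising tool calling via an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for the example
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="databricks/dbrx-instruct",  # name under which the server exposes the model
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
# If the model decides to call the tool, the arguments arrive as JSON text.
print(resp.choices[0].message.tool_calls)
```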
Limitations
- Surpassed by 2026 frontier MoE models. DeepSeek V3 / Qwen 3 235B both deliver better quality at similar serving costs.
- Compute requirements are still substantial. The 132B weights need ~270 GB at FP16; Q4 needs ~70 GB (see the estimate sketch after this list). Frontier hardware is required.
- MoE serving complexity. Production-grade MoE inference requires vLLM / SGLang / TensorRT-LLM with MoE routing.
- English-focused. Multilingual coverage is weak compared to Aya / Qwen / Command R.
- Long-context degrades quickly. 32K context with notable quality drop past 16K.
- Ecosystem maturity outside Databricks platform is limited. Self-hosting DBRX outside Databricks requires more configuration than Llama / Qwen.
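The VRAM figures above follow from simple weight-size arithmetic. A minimal sketch, assuming weight-only storage; KV cache, activations, and runtime overhead add more, so treat these as lower bounds.

```python
# Rough weight-only memory estimates for DBRX Instruct at common precisions.
TOTAL_PARAMS = 132e9

for name, bits in [("FP16", 16), ("FP8", 8), ("Q5", 5), ("Q4", 4)]:
    gb = TOTAL_PARAMS * bits / 8 / 1e9
    print(f"{name:>4}: ~{gb:.0f} GB of weights")
# FP16 ~264 GB, FP8 ~132 GB, Q5 ~83 GB, Q4 ~66 GB -- consistent with the
# ~270 GB / ~70 GB figures above once overhead is included.
```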
Real-world performance
- vs Llama 3.1 70B: Llama 3.1 70B wins on most benchmarks despite fewer total parameters, reflecting newer training data and heavier post-training on Meta's side. DBRX wins on raw active-parameter inference cost.
- vs DeepSeek V3 (671B MoE): V3 dramatically more capable. Pick V3 for new builds.
- vs Qwen 3 235B-A22B: Qwen 3 stronger on most benchmarks at similar serving cost.
- vs DBRX Base: Instruct is the chat-tuned variant. Pick Instruct for chat and agentic work; pick Base as a starting point for fine-tuning.
Should you run this locally?
Yes if you're a Databricks customer with Mosaic / Unity Catalog deployment, you specifically need DBRX's permissive license terms, or you have an existing DBRX-tuned application. The Databricks platform integration is genuinely good.
No if you're standing up a new self-hosted MoE deployment in 2026 — pick DeepSeek V3 or Qwen 3 235B for better quality at similar cost. DBRX is now mostly a historical reference.
How it compares
- vs DBRX Base: Same architecture, base vs instruct.
- vs DeepSeek V3 (671B MoE): V3 is the architecturally current frontier; by 2026 DBRX is a two-year-old design in a similar serving-cost class.
- vs Qwen 3 235B-A22B: Qwen 3 delivers strictly better quality at similar serving cost.
- vs Mixtral 8x22B: Different MoE expert count (8 large vs 16 small). Comparable era; different architecture choices.
Run this yourself
- Databricks platform: Native deployment via Databricks Mosaic — the canonical path.
- Self-hosted single-card: MI300X (192 GB) at FP8, Mac Studio M3 Ultra (192 GB) at Q5 with MLX.
- Self-hosted datacenter: 4× H100 PCIe at FP8 with vLLM MoE routing (see the serving sketch after this list).
- Cloud rental: Runpod / Lambda H100 SXM cluster ~$25-40/hr per node.
- Vendor: databricks/dbrx-instruct on Hugging Face.
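For the self-hosted datacenter path above, a minimal vLLM sketch might look like the following. It assumes a 4-GPU node and uses vLLM's offline API; argument names reflect recent vLLM releases, so check the docs for your version.

```python
# Minimal multi-GPU serving sketch with vLLM's offline API (assumptions noted).
from vllm import LLM, SamplingParams

llm = LLM(
    model="databricks/dbrx-instruct",
    tensor_parallel_size=4,    # shard weights and experts across 4 GPUs
    dtype="auto",              # bf16 weights are tight on 4x80 GB; quantization="fp8"
                               # roughly halves weight memory on supported GPUs (verify for your setup)
    max_model_len=32768,       # DBRX's full 32K context
)

params = SamplingParams(temperature=0.2, max_tokens=256)
out = llm.generate(["Summarize what a Mixture-of-Experts model is."], params)
print(out[0].outputs[0].text)
```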
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Databricks ecosystem alignment
- Strong tool-calling
- MoE efficiency
Weaknesses
- Multi-GPU only
- Older release; newer MoE models such as DeepSeek V3 and Qwen 3 235B are sharper
Quantization variants
Each quantization trades model quality for file size and VRAM. The AWQ-INT4 build below is the practical starting point for this model.
| Quantization | File size | VRAM required |
|---|---|---|
| AWQ-INT4 | 75.0 GB | 96 GB |
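A hedged loading sketch for an AWQ-INT4 build like the one above, using transformers with autoawq installed. The repository id is a placeholder for whichever community quantization you choose, not an official artifact.

```python
# Assumed setup: pip install transformers autoawq; a card (or cards) with ~96 GB free.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "someone/dbrx-instruct-awq"   # placeholder for a community AWQ quantization
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.float16,
    device_map="auto",               # spread the ~75 GB of weights across available devices
    # older transformers versions may additionally need trust_remote_code=True
)

inputs = tok("Write a one-line docstring for a CSV parser.", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```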
Get the model
HuggingFace
Original weights
Source repository; you will need to quantize the weights yourself or use a community quantization.
Hardware that runs this
Cards with enough VRAM for at least one quantization of DBRX Instruct.
Frequently asked
What's the minimum VRAM to run DBRX Instruct?
About 96 GB for the AWQ-INT4 build (roughly 70 GB of 4-bit weights plus cache and overhead); FP16 needs ~270 GB spread across multiple GPUs.
Can I use DBRX Instruct commercially?
Yes. The Databricks Open Model License permits most commercial deployment; the main restrictions are a separate license for very large providers and a ban on using DBRX or its outputs to improve other LLMs.
What's the context length of DBRX Instruct?
32K tokens, with a noticeable quality drop past roughly 16K.
Source: huggingface.co/databricks/dbrx-instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Verify DBRX Instruct runs on your specific hardware before committing money.