Neural network architectures
Dense Model
A dense model activates every parameter on every forward pass. It is the default transformer architecture, used by models such as Llama, Qwen, and Mistral 7B, and is distinct from sparse Mixture-of-Experts (MoE) models, which route each token through only a subset of expert sub-networks.
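The contrast can be sketched with toy parameter counts. The MoE shape below (8 experts, top-2 routing, plus shared attention weights) is a hypothetical layout chosen for illustration, not the layout of any specific model:

```python
# Toy comparison of total vs. per-token *active* parameter counts.
# The MoE layout (8 experts, top-2 routing) is hypothetical, for illustration.

def dense_active_params(total_params: int) -> int:
    """A dense model touches every parameter on each forward pass."""
    return total_params

def moe_params(shared: int, per_expert: int, n_experts: int, top_k: int):
    """Return (total, active-per-token) parameter counts for a simple MoE."""
    total = shared + n_experts * per_expert   # all experts must be stored
    active = shared + top_k * per_expert      # only the routed experts run
    return total, active

total, active = moe_params(shared=2_000_000_000,
                           per_expert=5_000_000_000,
                           n_experts=8, top_k=2)
print(f"MoE: {total / 1e9:.0f}B total, {active / 1e9:.0f}B active per token")
print(f"Dense 42B: {dense_active_params(42_000_000_000) / 1e9:.0f}B active")
```

With these illustrative numbers, the MoE stores 42B parameters but runs only 12B per token, while a dense 42B model runs all 42B every time.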
Dense models are simpler to serve (no routing overhead, simpler quantization paths, no all-to-all communication) but pay the full compute cost at inference: a 70B dense model performs roughly 70B parameters × 2 FLOPs × number of tokens of work per forward pass.
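That per-token cost (about 2 FLOPs per parameter, one multiply and one add) can be written as a quick back-of-envelope helper. This is a sketch of the rule of thumb only; it ignores the attention term, which grows with sequence length:

```python
def forward_flops(n_params: int, n_tokens: int) -> int:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs (one multiply + one add)
    per parameter per token. Ignores attention's sequence-length term."""
    return 2 * n_params * n_tokens

# A 70B dense model processing one token does ~1.4e11 FLOPs.
print(forward_flops(70_000_000_000, 1))  # 140000000000
```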
For local AI, dense remains the dominant deployment shape: tooling support is universal, and a single dense model fits on a single GPU more cleanly than an MoE of equivalent quality.
Reviewed by Fredoline Eruo.