Decoder-Only Transformer — AI glossary

Decoder-only is the architecture of the GPT family, Llama, Qwen, Mistral, DeepSeek, and almost every modern open-weight LLM. The model is a stack of transformer decoder blocks that use causal self-attention only, with no cross-attention to a separate encoder. It is trained autoregressively on next-token prediction: given the tokens so far, predict the one that comes next.
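To make "trained autoregressively on next-token prediction" concrete, here is a minimal sketch in plain Python. The names next_token_pairs, generate, and toy_step are hypothetical and exist only for this illustration; toy_step stands in for a real model's forward pass and sampling step, which would return a token chosen from a predicted distribution.

```python
from typing import Callable, List, Sequence, Tuple


def next_token_pairs(tokens: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Turn one token sequence into (context, target) training pairs.

    Each target is the token that follows its context, which is the
    next-token prediction objective.
    """
    return [(list(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


def generate(step_fn: Callable[[List[int]], int],
             prompt: Sequence[int],
             max_new_tokens: int) -> List[int]:
    """Autoregressive decoding: append one predicted token at a time.

    Prompt tokens and generated tokens live in the same sequence and are
    fed back through the same step function.
    """
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(step_fn(seq))
    return seq


def toy_step(seq: List[int]) -> int:
    """Hypothetical stand-in for a model: predicts (last token + 1) mod 10."""
    return (seq[-1] + 1) % 10


if __name__ == "__main__":
    print(next_token_pairs([5, 6, 7]))
    print(generate(toy_step, [7, 8], max_new_tokens=3))
```

In a real model the step function is the expensive part; the surrounding loop is essentially this simple, which is one reason the same model can serve both the prompt and the continuation.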

There is no separate encoder: prompt tokens and generated tokens pass through the same blocks. The causal mask restricts each position to attend only to itself and earlier positions, so no token can see the future.
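The causal mask can be sketched as a lower-triangular boolean matrix applied to the attention scores before the softmax. The example below is a single-head illustration using NumPy, not any particular model's implementation; the function names and the weight matrices w_q, w_k, w_v are assumptions made for this sketch, and real models add multiple heads, positional information, and an output projection.

```python
import numpy as np


def causal_mask(n: int) -> np.ndarray:
    """Boolean (n, n) matrix: entry [i, j] is True when position i may attend to j."""
    return np.tril(np.ones((n, n), dtype=bool))


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a sequence x of shape (n, d).

    Returns the attended values and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Future positions get -inf, so they receive zero weight after softmax.
    scores = np.where(causal_mask(x.shape[0]), scores, -np.inf)

    # Numerically stable softmax over each row. The diagonal is always
    # allowed, so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    out, weights = causal_self_attention(x, w_q, w_k, w_v)
    print(np.round(weights, 3))  # upper triangle is all zeros
```

One consequence worth checking: changing a later token leaves the outputs at earlier positions unchanged, which is what lets generation proceed one token at a time while reusing earlier computation.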

Decoder-only won the architecture war for generative LLMs because it scales cleanly with parameters and data, and because a single model can handle a wide range of tasks (chat, code, reasoning, summarization) through prompting alone.

Related terms