Encoder-Decoder Transformer
Encoder-decoder transformers (T5, BART, the original "Attention is All You Need" architecture) have two halves: an encoder that reads the input bidirectionally, and a decoder that generates the output autoregressively while cross-attending to the encoder's outputs.
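A minimal PyTorch sketch of the two halves, with toy dimensions and layer norms omitted for brevity; EncoderBlock, DecoderBlock, and d_model are illustrative names, not any library's API:

```python
import torch
import torch.nn as nn

d_model = 64  # toy hidden size for illustration

class EncoderBlock(nn.Module):
    """One encoder layer: bidirectional self-attention over the full input (no mask)."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))

    def forward(self, src):
        x, _ = self.self_attn(src, src, src)  # every input token can attend to every other
        src = src + x
        return src + self.ff(src)

class DecoderBlock(nn.Module):
    """One decoder layer: causal self-attention, then cross-attention to encoder outputs."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))

    def forward(self, tgt, memory):
        # Causal mask: each output position may only attend to earlier output positions.
        T = tgt.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        x, _ = self.self_attn(tgt, tgt, tgt, attn_mask=causal)
        tgt = tgt + x
        # Cross-attention: queries come from the decoder, keys/values from the encoder outputs.
        x, _ = self.cross_attn(tgt, memory, memory)
        tgt = tgt + x
        return tgt + self.ff(tgt)

# Toy forward pass: encode the input once, then decode against it.
src = torch.randn(1, 10, d_model)   # embedded input sequence (batch, seq, dim)
tgt = torch.randn(1, 4, d_model)    # embedded output prefix generated so far
memory = EncoderBlock()(src)
out = DecoderBlock()(tgt, memory)
print(out.shape)  # torch.Size([1, 4, 64])
```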
Strengths: well-suited to translation, summarization, and structured input→output tasks. The encoder can use bidirectional attention, giving it a richer representation of the input than a causally masked decoder-only model can build.
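For example, T5 frames such tasks as text-to-text. A minimal sketch with the Hugging Face transformers library; the "t5-small" checkpoint and the translation prompt are just for illustration:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The encoder reads the whole prompt bidirectionally; generate() then decodes
# autoregressively, cross-attending to the encoder outputs at every step.
inputs = tokenizer("translate English to German: The house is small.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```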
Modern open-weight LLMs are mostly decoder-only because scaling laws favored the simpler architecture and the quality gap narrowed as context lengths grew. Encoder-decoder remains relevant in specialty translation models (mBART, NLLB) and in multilingual T5 (mT5).
Related terms