Encoder-Decoder Transformer
Encoder-decoder transformers (T5, BART, the original "Attention is All You Need" architecture) have two halves: an encoder that reads the input bidirectionally, and a decoder that generates the output autoregressively while cross-attending to the encoder's outputs.
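A minimal PyTorch sketch of the two halves, with toy dimensions and layer norms omitted for brevity; EncoderBlock, DecoderBlock, and d_model are illustrative names, not any library's API:

```python
import torch
import torch.nn as nn

d_model = 64  # toy hidden size for illustration

class EncoderBlock(nn.Module):
    """One encoder layer: bidirectional self-attention over the full input (no mask)."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))

    def forward(self, src):
        x, _ = self.self_attn(src, src, src)  # every input token can attend to every other
        src = src + x
        return src + self.ff(src)

class DecoderBlock(nn.Module):
    """One decoder layer: causal self-attention, then cross-attention to encoder outputs."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))

    def forward(self, tgt, memory):
        # Causal mask: each output position may only attend to earlier output positions.
        T = tgt.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        x, _ = self.self_attn(tgt, tgt, tgt, attn_mask=causal)
        tgt = tgt + x
        # Cross-attention: queries come from the decoder, keys/values from the encoder outputs.
        x, _ = self.cross_attn(tgt, memory, memory)
        tgt = tgt + x
        return tgt + self.ff(tgt)

# Toy forward pass: encode the input once, then decode against it.
src = torch.randn(1, 10, d_model)   # embedded input sequence (batch, seq, dim)
tgt = torch.randn(1, 4, d_model)    # embedded output prefix generated so far
memory = EncoderBlock()(src)
out = DecoderBlock()(tgt, memory)
print(out.shape)  # torch.Size([1, 4, 64])
```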
Strengths: well-suited to translation, summarization, and structured input→output tasks. The encoder can use bidirectional attention, giving it a richer representation of the input than a causally masked decoder-only model can build.
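For example, T5 frames such tasks as text-to-text. A minimal sketch with the Hugging Face transformers library; the "t5-small" checkpoint and the translation prompt are just for illustration:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The encoder reads the whole prompt bidirectionally; generate() then decodes
# autoregressively, cross-attending to the encoder outputs at every step.
inputs = tokenizer("translate English to German: The house is small.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```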
Modern open-weight LLMs are mostly decoder-only because scaling laws favored the simpler architecture and the quality gap narrowed as context lengths grew. Encoder-decoder remains relevant in specialty translation models (mBART, NLLB) and in multilingual T5 (mT5).
Related terms