PagedAttention
PagedAttention is the KV-cache memory layout introduced by vLLM: the cache is stored in fixed-size blocks (pages), analogous to pages in virtual memory. Each request keeps a page table mapping logical token positions to physical blocks, and blocks can be shared across requests with identical prefixes.
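To make the bookkeeping concrete, here is a minimal Python sketch of that page table. BlockAllocator, Sequence, and their methods are illustrative names, not vLLM's actual API; the block size of 16 tokens matches vLLM's default.

```python
BLOCK_SIZE = 16  # tokens per physical block (vLLM's default block size)

class BlockAllocator:
    """Hands out fixed-size physical blocks and refcounts them for sharing."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))
        self.refcount: dict[int, int] = {}

    def allocate(self) -> int:
        block = self.free.pop()
        self.refcount[block] = 1
        return block

    def share(self, block: int) -> int:
        self.refcount[block] += 1  # same physical block, one more owner
        return block

    def release(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:
            self.free.append(block)  # freed only when no request points at it

class Sequence:
    """Per-request page table: logical block index -> physical block id."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.page_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is allocated only when the last one fills up,
        # so each request holds at most one partially used block.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.page_table.append(self.allocator.allocate())
        self.num_tokens += 1
```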
The win: it eliminates the internal fragmentation that wastes 60–80% of KV cache memory under naive contiguous allocation, letting vLLM pack 2–4× more concurrent requests into the same VRAM.
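To see where the waste comes from, here is a back-of-envelope comparison with illustrative numbers (a server that pre-reserves 2048 slots per request versus a paged allocator); the figures are assumptions, not measurements:

```python
max_len = 2048      # slots a contiguous allocator reserves up front
actual_len = 300    # tokens the request actually produced
block_size = 16     # tokens per page under paging

contiguous_waste = max_len - actual_len               # 1748 idle slots
paged_blocks = -(-actual_len // block_size)           # ceil(300 / 16) = 19
paged_waste = paged_blocks * block_size - actual_len  # 4 idle slots

print(f"contiguous: {contiguous_waste} wasted slots "
      f"({contiguous_waste / max_len:.0%} of the reservation)")
print(f"paged: {paged_waste} wasted slots (always less than one block)")
```

Paging caps per-request waste at one partially filled block, no matter how far short of the reservation a request stops.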
A side benefit: trivial prefix caching. Multiple requests that share a system prompt point at the same physical blocks, so the prefill cost for that prefix is paid once.
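Continuing the sketch above (it reuses BlockAllocator, Sequence, and BLOCK_SIZE), here is one way prefix sharing could look. The prefix_cache dict keyed by the raw prompt string is an assumption for illustration, not how vLLM keys its cache.

```python
allocator = BlockAllocator(num_blocks=1024)
prefix_cache: dict[str, list[int]] = {}  # system prompt -> physical blocks

def start_request(system_prompt: str, prompt_tokens: int) -> Sequence:
    seq = Sequence(allocator)
    blocks = prefix_cache.get(system_prompt)
    if blocks is None:
        # First request with this prefix: allocate blocks and (in a real
        # engine) run prefill once to populate them with KV entries.
        num_blocks = -(-prompt_tokens // BLOCK_SIZE)  # ceil division
        blocks = [allocator.allocate() for _ in range(num_blocks)]
        prefix_cache[system_prompt] = blocks
    else:
        # Later requests just bump refcounts; no prefill for the prefix.
        blocks = [allocator.share(blk) for blk in blocks]
    seq.page_table.extend(blocks)
    seq.num_tokens = prompt_tokens
    return seq

req_a = start_request("You are a helpful assistant.", prompt_tokens=32)
req_b = start_request("You are a helpful assistant.", prompt_tokens=32)
assert req_a.page_table == req_b.page_table  # same physical blocks
```

A real implementation also needs copy-on-write when a request appends into a partially filled shared block; the sketch sidesteps that by using a prompt length that fills whole blocks.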