YaRN (Yet another RoPE extensioN method)
YaRN is a context-extension method that modifies RoPE frequencies to let a model trained on, say, 8K context generalize to 32K or 128K with minimal fine-tuning. Used in Qwen 2.5, Mistral Nemo, and several Llama 3 long-context derivatives.
Compared to naive frequency scaling (linear position interpolation or "NTK-aware" scaling), YaRN preserves position discrimination at long range better, with measurable improvement on needle-in-a-haystack benchmarks past 32K. It does this by interpolating frequencies per-dimension ("NTK-by-parts"): high-frequency dimensions, which encode local position, are left untouched, while low-frequency dimensions are interpolated, and an attention-temperature correction compensates for the longer context.
Practical implication: when you see "extended to 128K with YaRN" on a model card, expect quality degradation past the original training context to be smaller than with vanilla RoPE scaling, but still real — long-context performance is rarely as good as short-context.
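The per-dimension interpolation can be sketched as follows. This is a minimal illustration of the idea, not a drop-in implementation; the function names are invented here, and the ramp boundaries (`alpha`, `beta`) follow the defaults reported in the YaRN paper.

```python
import math
import numpy as np

def yarn_inv_freq(dim, base=10000.0, orig_len=8192, scale=4.0,
                  alpha=1.0, beta=32.0):
    """Sketch of YaRN-style NTK-by-parts frequency interpolation.

    dim:      rotary embedding dimension (dim // 2 frequency pairs)
    scale:    context extension factor (e.g. 4.0 for 8K -> 32K)
    alpha, beta: ramp boundaries (paper defaults)
    """
    # Standard RoPE inverse frequencies: base^(-2i/dim).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # Full rotations each dimension completes over the original context.
    rotations = orig_len * inv_freq / (2 * math.pi)
    # Ramp: 1 -> many rotations (high freq), keep unchanged;
    #       0 -> few rotations (low freq), fully interpolate (divide by scale).
    ramp = np.clip((rotations - alpha) / (beta - alpha), 0.0, 1.0)
    return inv_freq * (ramp + (1.0 - ramp) / scale)

def yarn_mscale(scale):
    """Attention-temperature correction from the paper (0.1 * ln(s) + 1)."""
    return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0
```

High-frequency dimensions (many rotations over the original context) come out identical to vanilla RoPE, while the lowest-frequency dimensions are divided by the full scale factor, which is the property that preserves short-range position discrimination.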