# Long-Context Language Models: Memory, Retrieval, and Evaluation

Status: public
Confidence: medium (0.82) (verified)
Last verified: 2026-05-30
Generation: ai_structured

## TL;DR

Long-context language models try to use more of the prompt, document, or conversation history at once. The important distinction is between advertised context length and usable context length: a model can accept many tokens while still missing details, over-weighting the beginning and end of the context, or incurring high inference cost.

## Core Explanation

The standard Transformer models sequences well, but full self-attention compares every token with every other token, so its compute and memory cost grows quadratically with context length. Transformer-XL introduced recurrence across segments, letting a model reuse cached hidden states from earlier segments instead of recomputing them. Longformer changed the attention pattern itself, combining local sliding windows with a small number of selected global tokens to make long-document processing cheaper.

Evaluation is its own problem. Needle-style retrieval tests can show that a model finds one inserted fact, but they do not cover all long-context behaviors. Benchmarks such as RULER test multiple synthetic tasks so that researchers can compare practical context use rather than only nominal token limits.

## Related Articles

- [Large Language Models (LLMs)](../llms.md)
- [Retrieval-Augmented Generation (RAG)](../rag.md)
- [Attention Mechanism: Query-Key-Value and Contextual Representation](../attention-mechanism.md)
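
As a concrete illustration of the sliding-window-plus-global-token pattern described under Core Explanation, the following minimal sketch builds a boolean attention mask in plain Python and counts how many query-key pairs it allows compared with full attention. The function names (`sliding_window_mask`, `count_allowed`) and the parameters `window` and `global_positions` are hypothetical, chosen for this example. This is not the Longformer implementation, which uses an optimized banded computation instead of a dense mask.

```python
def sliding_window_mask(seq_len, window, global_positions=()):
    """Build a seq_len x seq_len boolean mask (illustrative only).

    mask[i][j] is True when position i may attend to position j.
    `window` is the total local window width; each token sees
    window // 2 neighbours on each side plus itself.
    """
    half = window // 2
    # Local band: each token attends only to nearby tokens.
    mask = [[abs(i - j) <= half for j in range(seq_len)]
            for i in range(seq_len)]
    # Global tokens attend everywhere and are attended to by everyone.
    for g in set(global_positions):
        for k in range(seq_len):
            mask[g][k] = True
            mask[k][g] = True
    return mask


def count_allowed(mask):
    """Count the query-key pairs the mask permits."""
    return sum(sum(row) for row in mask)


seq_len = 8
mask = sliding_window_mask(seq_len, window=2, global_positions=[0])
full_pairs = seq_len * seq_len      # pairs scored by full attention
sparse_pairs = count_allowed(mask)  # pairs scored under this mask
print(f"full attention: {full_pairs} pairs, sparse pattern: {sparse_pairs} pairs")
```

With a fixed window and a fixed number of global tokens, the number of allowed pairs grows roughly linearly with `seq_len`, whereas full attention grows quadratically. That difference is the cost saving this kind of pattern aims for.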