# Long-Context Language Models: Memory, Retrieval, and Evaluation

Status: public · Confidence: medium (0.82) · Basis: verified_sources

## TL;DR

Long-context language models are designed to use more of the prompt, document, or conversation history at once. The key distinction is between advertised and usable context length: a model can accept many tokens yet still miss details, rely more on information near the beginning and end of the input than on material in the middle, or incur high inference cost.

## Core Explanation

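The standard Transformer models sequences well, but full self-attention scores every token against every other token, so its compute and memory grow quadratically with context length. Transformer-XL introduced segment-level recurrence: hidden states computed for the previous segment are cached and reused as extra context when processing the next one. Longformer instead changed the attention pattern, combining local sliding-window attention with a small number of global tokens so that cost grows roughly linearly, rather than quadratically, with sequence length.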
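To make the cost difference concrete, the sketch below counts how many query-key scores each attention pattern requires. It is a simplified illustration, not Longformer's implementation: the helper names (`full_attention_mask`, `sliding_window_global_mask`, `count_pairs`) are hypothetical, `window` is assumed to be the half-width on each side, and the choice of token 0 as the only global token is an assumption for the example. Full attention needs n² scores, while the banded pattern needs about n·(2·window + 1) plus the rows and columns of the global tokens.

```python
from typing import List, Set


def full_attention_mask(n: int) -> List[List[bool]]:
    """Full self-attention: every query position attends to every key (n * n pairs)."""
    return [[True] * n for _ in range(n)]


def sliding_window_global_mask(n: int, window: int, global_idx: Set[int]) -> List[List[bool]]:
    """Simplified Longformer-style pattern (hypothetical helper for this sketch).

    - Position i attends to keys j with |i - j| <= window (the local band).
    - A global token attends to every position, and every position attends to it.
    """
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            local = abs(i - j) <= window
            is_global = i in global_idx or j in global_idx
            mask[i][j] = local or is_global
    return mask


def count_pairs(mask: List[List[bool]]) -> int:
    """Number of query-key scores that would have to be computed."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Tiny example: 8 tokens, window of 1 on each side, token 0 global.
    print(count_pairs(full_attention_mask(8)))                  # 64
    print(count_pairs(sliding_window_global_mask(8, 1, {0})))   # 34

    # Growth: the full count quadruples when n doubles; the banded count roughly doubles.
    for n in (128, 256, 512):
        full = count_pairs(full_attention_mask(n))
        sparse = count_pairs(sliding_window_global_mask(n, window=32, global_idx={0}))
        print(n, full, sparse)
```

The sketch builds the whole n×n mask only so the pairs can be counted; the point of a sparse pattern is that a practical implementation computes scores for the band and the global tokens without ever materializing the full matrix.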

Evaluation is a separate problem. Needle-in-a-haystack tests can show that a model retrieves a single inserted fact, but passing them does not demonstrate other long-context skills, such as multi-hop reasoning or aggregating information spread across the input. Benchmarks such as RULER combine multiple synthetic tasks so researchers can compare how much context a model actually uses, not just its nominal token limit.
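A needle-style test can be sketched as a depth sweep: insert one fact at several relative positions in filler text and check whether the answer comes back. This is a minimal illustration under stated assumptions, not RULER or any published harness. The `Model` interface, the filler sentence, the needle, and `make_truncating_echo_model` are all hypothetical; the toy model is not an LLM, it simply "sees" only the last `usable_words` words, standing in for a model whose usable context is shorter than its advertised limit.

```python
from typing import Callable, Dict, List

# Hypothetical model interface for this sketch: prompt in, answer text out.
Model = Callable[[str], str]

FILLER = "The sky was clear and the market opened on time."


def build_haystack(num_filler: int, needle: str, depth: float) -> str:
    """Insert `needle` among `num_filler` filler sentences at relative depth in [0, 1]."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    sentences = [FILLER] * num_filler
    sentences.insert(round(depth * num_filler), needle)
    return " ".join(sentences)


def needle_sweep(model: Model, depths: List[float], num_filler: int,
                 needle: str, question: str, answer: str) -> Dict[float, bool]:
    """For each depth, record whether the model's reply contains the expected answer."""
    results = {}
    for d in depths:
        prompt = build_haystack(num_filler, needle, d) + "\n\nQuestion: " + question
        results[d] = answer in model(prompt)
    return results


def make_truncating_echo_model(usable_words: int) -> Model:
    """Toy stand-in (hypothetical): echoes only the last `usable_words` words,
    so anything earlier in the prompt is invisible to it."""
    def model(prompt: str) -> str:
        return " ".join(prompt.split()[-usable_words:])
    return model


if __name__ == "__main__":
    toy = make_truncating_echo_model(usable_words=1000)
    report = needle_sweep(toy, [0.0, 0.25, 0.5, 0.75, 1.0], num_filler=200,
                          needle="The secret code is 7412.",
                          question="What is the secret code?", answer="7412")
    for depth, found in report.items():
        print(f"depth={depth:.2f} found={found}")
```

The toy fails only by truncation, so it misses needles placed early in a roughly 2,000-word prompt. A real model can also fail on needles that fit inside its window, which is why sweeping depth is informative, and why a single-needle pass still says little about the multi-step tasks that benchmarks like RULER add.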

## Related Articles

- [Large Language Models (LLMs)](../llms.md)
- [Retrieval-Augmented Generation (RAG)](../rag.md)
- [Attention Mechanism: Query-Key-Value and Contextual Representation](../attention-mechanism.md)