RAG Contextual Compression
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

RAG contextual compression narrows retrieved context after initial retrieval so the generator sees fewer, more relevant passages.

## Core Explanation

Naive RAG often sends every top-k chunk to the model. Contextual compression inserts a post-retrieval step that filters, transforms, or reranks retrieved material according to the current query. The goal is to preserve answer-critical evidence while reducing distraction and context-window cost.

This is not a substitute for recall. If the first-stage retriever misses the right document, compression cannot recover it. A good evaluation plan measures both retrieval recall before compression and answer quality after compression.

## Source-Mapped Facts

- LangChain documentation shows a CrossEncoderReranker used with ContextualCompressionRetriever to rerank retrieved documents. ([source](https://docs.langchain.com/oss/python/integrations/document_transformers/cross_encoder_reranker))
- LlamaIndex documentation says node postprocessors take a set of nodes and apply transformation, filtering, or re-ranking logic. ([source](https://developers.llamaindex.ai/python/framework/module_guides/querying/node_postprocessors/))
- Cohere reranking documentation describes using a rerank model to return the most relevant documents for a query. ([source](https://docs.cohere.com/v2/docs/reranking-with-cohere))

## Further Reading

- [LangChain Cross Encoder Reranker Integration](https://docs.langchain.com/oss/python/integrations/document_transformers/cross_encoder_reranker)
- [LlamaIndex Node Postprocessor](https://developers.llamaindex.ai/python/framework/module_guides/querying/node_postprocessors/)
- [Reranking with Cohere](https://docs.cohere.com/v2/docs/reranking-with-cohere)
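
## Illustrative Sketch

The post-retrieval step described under Core Explanation can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the API of LangChain, LlamaIndex, or Cohere: `overlap_score`, `compress`, and `recall_at_k` are hypothetical names, and the word-overlap scorer is a crude stand-in for a cross-encoder or hosted rerank model. The sketch also includes a recall check, since retrieval recall should be measured before compression is applied.

```python
import re
from typing import Callable, List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for a learned relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> float:
    """Fraction of query terms found in the passage (hypothetical scorer)."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(passage)) / len(query_terms)


def compress(
    query: str,
    passages: Sequence[str],
    top_n: int = 2,
    min_score: float = 0.0,
    score_fn: Callable[[str, str], float] = overlap_score,
) -> List[str]:
    """Filter and rerank retrieved passages for one query.

    Passages scoring at or below min_score are dropped (filtering); the rest
    are sorted by descending score and cut to top_n (reranking). Ties keep
    the original retrieval order.
    """
    scored = [(score_fn(query, p), i, p) for i, p in enumerate(passages)]
    kept = [item for item in scored if item[0] > min_score]
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [passage for _, _, passage in kept[:top_n]]


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Share of known-relevant documents present in the retrieved set.

    Measured BEFORE compression: compression cannot restore a document
    that the first-stage retriever never returned.
    """
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids) & relevant_ids) / len(relevant_ids)


if __name__ == "__main__":
    query = "How does contextual compression reduce context cost?"
    passages = [
        "Contextual compression filters retrieved passages to reduce context cost.",
        "The office cafeteria menu changes every Tuesday.",
        "Rerankers reorder passages by relevance to the query.",
        "Compression cannot recover documents the retriever missed.",
    ]
    for passage in compress(query, passages, top_n=2):
        print(passage)
    print(recall_at_k(["d1", "d2", "d3"], {"d2", "d9"}))
```

In a real pipeline the `score_fn` argument is where a cross-encoder or rerank model would plug in; the filter-then-truncate shape stays the same. Swapping the scorer does not change the recall figure, which depends only on the first-stage retriever.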