Retrieval Passage Boundaries and Overlap Windows
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Passage boundaries and overlap windows determine whether retrieval returns a complete answer span or a fragment cut off from its surrounding context.

## Core Explanation

RAG systems rarely retrieve whole documents. They index passages, nodes, or chunks, and the split strategy determines the semantic unit the retriever can see. Overlap can preserve answer context at chunk edges. It also enlarges the index, produces duplicate hits, and makes citations ambiguous when the same text appears in several chunks. Before changing embedding models or rerankers, agents should examine the recall failures themselves along with chunk size, overlap, separator order, parent document IDs, start offsets, and metadata inheritance.

## Source-Mapped Facts

- LangChain documentation describes `RecursiveCharacterTextSplitter` as trying a list of separators in order until the chunks are small enough. ([source](https://docs.langchain.com/oss/python/integrations/splitters/recursive_text_splitter))
- LlamaIndex documentation describes node parsers as components that chunk documents into `Node` objects. ([source](https://developers.llamaindex.ai/python/framework/module_guides/loading/node_parsers/))
- Haystack `DocumentSplitter` documentation describes splitting each document by the unit set in `split_by`, into pieces of `split_length` units, with an overlap of `split_overlap` units. ([source](https://docs.haystack.deepset.ai/docs/documentsplitter))

## Further Reading

- [LangChain Recursive Text Splitter](https://docs.langchain.com/oss/python/integrations/splitters/recursive_text_splitter)
- [LlamaIndex Node Parser Usage Pattern](https://developers.llamaindex.ai/python/framework/module_guides/loading/node_parsers/)
- [Haystack DocumentSplitter](https://docs.haystack.deepset.ai/docs/documentsplitter)
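
## Sketch: Word Windows with Overlap

The overlap trade-off can be made concrete with a minimal sketch in plain Python. It borrows the `split_length` and `split_overlap` vocabulary from the Haystack `DocumentSplitter` docs but fixes the unit to whitespace-separated words. It is not Haystack's implementation. The `Chunk` dataclass, its `parent_id`, `start_word`, and `metadata` fields, the `split_words` and `span_is_intact` helpers, and the sample document are all hypothetical. Each chunk records the provenance worth inspecting when recall fails: parent document ID, start offset, and an inherited copy of the parent's metadata.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    """A retrievable passage plus the provenance needed to cite it (hypothetical schema)."""
    text: str
    parent_id: str   # ID of the source document
    start_word: int  # offset of the chunk's first word within the parent document
    metadata: dict = field(default_factory=dict)


def split_words(doc_id: str, text: str, split_length: int, split_overlap: int,
                metadata: Optional[dict] = None) -> list:
    """Cut text into windows of split_length words; consecutive windows share split_overlap words."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must satisfy 0 <= split_overlap < split_length")
    words = text.split()
    step = split_length - split_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + split_length]
        # Every chunk inherits its own copy of the parent's metadata.
        chunks.append(Chunk(" ".join(window), doc_id, start, dict(metadata or {})))
        if start + split_length >= len(words):
            break  # this window already reaches the end; avoid a redundant tail chunk
    return chunks


def span_is_intact(chunks: list, answer: str) -> bool:
    """True if at least one chunk contains the whole answer span verbatim."""
    return any(answer in chunk.text for chunk in chunks)


# Hypothetical sample document and answer span.
doc = ("The refund window is 30 days from delivery. "
       "After that period, store credit is offered instead.")
answer = "store credit is offered"

no_overlap = split_words("policy-1", doc, split_length=6, split_overlap=0)
with_overlap = split_words("policy-1", doc, split_length=6, split_overlap=3,
                           metadata={"source": "refunds.md"})
print(len(no_overlap), span_is_intact(no_overlap, answer))      # 3 False
print(len(with_overlap), span_is_intact(with_overlap, answer))  # 5 True
```

Without overlap, the answer span is cut across two chunks. An overlap of 3 words keeps it whole in one chunk, but the index grows from 3 to 5 chunks and most words now appear in two chunks, which is the source of duplicate hits.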
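
## Sketch: Recursive Separator Splitting

Separator order matters because a recursive splitter prefers the coarsest boundary that still yields chunks under the size limit. The sketch below is a simplified, hypothetical re-implementation of the behavior the LangChain docs describe, not LangChain's code. `recursive_split` splits on the first separator present in the text, greedily merges neighboring pieces up to `chunk_size` characters, and recurses with the remaining separators only for pieces that are still too long. The default separator tuple mirrors LangChain's documented default list (`"\n\n"`, `"\n"`, `" "`, `""`). Unlike the real splitter, this sketch has no overlap and measures length only with `len`.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", "\n", " ", "")) -> list:
    """Split text on the coarsest separator that works, recursing into oversized pieces."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut into fixed-size character slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, rest)  # try the next, finer separator
    chunks = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging neighbors while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


doc = "Intro paragraph.\n\nRefunds: 30 days.\nCredit after that.\n\nContact support."
print(recursive_split(doc, 30))
# ['Intro paragraph.', 'Refunds: 30 days.', 'Credit after that.', 'Contact support.']
print(len(recursive_split(doc, 60)))  # 2
```

At `chunk_size=30` the 36-character middle paragraph does not fit, so it falls back to line breaks and its two sentences land in separate chunks. At `chunk_size=60` the paragraph boundary is enough and the refund rule stays in one chunk.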
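
## Sketch: Merging Overlapping Hits

Because overlapping chunks share text, one answer can come back as several near-duplicate hits that cite the same region of a document. When each chunk carries a parent document ID and offsets, those hits can be collapsed into one citable span per region. The `Hit` tuple and `merge_hits` function below are hypothetical names for illustration, not a library API. Offsets are assumed to be word positions with an exclusive end, and a merged span keeps the best score of its parts.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A retrieved chunk reduced to its provenance and score (hypothetical schema)."""
    parent_id: str
    start: int    # inclusive word offset in the parent document
    end: int      # exclusive word offset in the parent document
    score: float


def merge_hits(hits: list) -> list:
    """Collapse overlapping or adjacent hits from the same parent; return spans by best score."""
    merged = []
    for hit in sorted(hits, key=lambda h: (h.parent_id, h.start)):
        last = merged[-1] if merged else None
        if last and last.parent_id == hit.parent_id and hit.start <= last.end:
            # Same document and the spans touch: extend the span, keep the higher score.
            merged[-1] = Hit(last.parent_id, last.start,
                             max(last.end, hit.end), max(last.score, hit.score))
        else:
            merged.append(hit)
    return sorted(merged, key=lambda h: h.score, reverse=True)


hits = [
    Hit("policy-1", 6, 12, 0.71),
    Hit("policy-1", 9, 15, 0.83),
    Hit("faq-7", 0, 6, 0.64),
    Hit("policy-1", 12, 16, 0.55),
]
for span in merge_hits(hits):
    print(span)
# Hit(parent_id='policy-1', start=6, end=16, score=0.83)
# Hit(parent_id='faq-7', start=0, end=6, score=0.64)
```

Three overlapping windows from `policy-1` become a single citation covering words 6 to 16, so the reader sees one source region instead of three competing ones.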