# Retrieval Document Versioning and Source Snapshots

Status: public · Confidence: medium (0.865) · Basis: verified_sources

## TL;DR

Retrieval systems need stable document IDs, version markers, and source snapshots so cited evidence can be traced back to the source state that was actually indexed.

## Core Explanation

A retrieval result is not just text. It is a claim about a source document as it existed at a specific time, processed by a specific parser, chunking policy, and metadata schema, and stored in a specific index state. If the source changes after indexing, an agent can cite an outdated passage with no signal that it is outdated, unless the system records document hashes, crawl timestamps, source URLs, and index version metadata that make the drift detectable.
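A minimal sketch of the metadata described above, stored per document at indexing time. The `SnapshotRecord` schema, the `hash_text` and `make_snapshot` helpers, and all example values are hypothetical; the CRLF-to-LF normalization before hashing is also an assumption and should match whatever your parser actually does.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SnapshotRecord:
    """Provenance metadata stored next to each indexed document (hypothetical schema)."""

    doc_id: str           # stable ID that stays the same across re-crawls
    source_url: str       # where the text was fetched from
    content_hash: str     # SHA-256 of the text that was actually parsed and chunked
    crawled_at: datetime  # fetch time, timezone-aware UTC
    parser_version: str   # which parser produced the text
    chunking_policy: str  # chunk size / overlap policy, recorded as a label
    index_version: str    # which index build the chunks live in


def hash_text(text: str) -> str:
    """Return a SHA-256 hex digest of the text after normalizing line endings."""
    # Assumption: CRLF vs LF differences should not count as a content change.
    normalized = text.replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def make_snapshot(
    doc_id: str,
    source_url: str,
    text: str,
    parser_version: str,
    chunking_policy: str,
    index_version: str,
    crawled_at: Optional[datetime] = None,
) -> SnapshotRecord:
    """Build the record to store at indexing time; crawled_at defaults to now (UTC)."""
    return SnapshotRecord(
        doc_id=doc_id,
        source_url=source_url,
        content_hash=hash_text(text),
        crawled_at=crawled_at or datetime.now(timezone.utc),
        parser_version=parser_version,
        chunking_policy=chunking_policy,
        index_version=index_version,
    )


# Usage: record a snapshot for one (hypothetical) page.
snap = make_snapshot(
    doc_id="pricing-page",
    source_url="https://example.com/pricing",
    text="Plan A costs $10 per month.",
    parser_version="html-parser-2.1",
    chunking_policy="512-tokens-64-overlap",
    index_version="idx-0042",
)
```

Freezing the dataclass is a deliberate choice: a snapshot describes what was indexed at one moment, so a re-crawl should produce a new record rather than mutate the old one.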

For agent answers, the practical rule is to separate the "current source" from the "indexed source." If the indexed snapshot is old or the source has changed since indexing, the agent should expose that uncertainty rather than present the retrieval result as fresh evidence.
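One way to apply that rule at answer time, sketched under stated assumptions: the agent either has a hash of the live source (it re-fetched it) or does not, and falls back to snapshot age when it cannot re-check. The four-way `EvidenceStatus` split, the `max_age` threshold, and the caveat wording are all hypothetical choices, not a standard.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class EvidenceStatus(Enum):
    CURRENT = "current"        # re-checked: live source hash matches the indexed hash
    CHANGED = "changed"        # re-checked: live source differs from what was indexed
    STALE = "stale"            # not re-checked, and the snapshot is older than max_age
    UNVERIFIED = "unverified"  # not re-checked, but the snapshot is within max_age


def classify_evidence(
    indexed_hash: str,
    indexed_at: datetime,
    live_hash: Optional[str],
    now: datetime,
    max_age: timedelta,
) -> EvidenceStatus:
    """Compare the indexed snapshot with the live source, if it could be fetched."""
    if live_hash is not None:
        return EvidenceStatus.CURRENT if live_hash == indexed_hash else EvidenceStatus.CHANGED
    # No live hash available: fall back to the snapshot's age.
    return EvidenceStatus.STALE if now - indexed_at > max_age else EvidenceStatus.UNVERIFIED


def citation_caveat(status: EvidenceStatus, indexed_at: datetime) -> str:
    """Text appended to a citation; empty only when the source was confirmed current."""
    day = indexed_at.date().isoformat()
    if status is EvidenceStatus.CURRENT:
        return ""
    if status is EvidenceStatus.CHANGED:
        return f"(source has changed since it was indexed on {day}; the quoted passage may be outdated)"
    if status is EvidenceStatus.STALE:
        return f"(from a snapshot indexed on {day} that has not been re-checked against the live source)"
    return f"(indexed on {day}; not re-checked against the live source)"


# Usage: a snapshot from January, cited in March without re-fetching the source.
indexed_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
status = classify_evidence("abc123", indexed_at, None, now, timedelta(days=30))
print(status, citation_caveat(status, indexed_at))
```

A matching live hash outranks snapshot age here: if the agent has just confirmed the source is unchanged, an old crawl date alone is not a reason to add a caveat.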

## Source-Mapped Facts

- LlamaIndex document management documentation describes tracking document hashes to determine whether documents have changed. ([source](https://developers.llamaindex.ai/python/framework/module_guides/indexing/document_management/))
- Elasticsearch point-in-time documentation describes a point in time as a lightweight view into the state of data as it existed when the point in time was initiated. ([source](https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html))
- W3C PROV-O documentation says PROV-O can represent and interchange provenance information generated by different systems and contexts. ([source](https://www.w3.org/TR/prov-o/))
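The hash-tracking idea from the LlamaIndex bullet can be sketched without any framework. This is not LlamaIndex's API: `plan_refresh`, `RefreshPlan`, and the plain dict standing in for a docstore are hypothetical names used only to show how comparing stored and incoming hashes decides what to re-index.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def sha256_text(text: str) -> str:
    """Hex digest used as the document's change-detection hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class RefreshPlan:
    to_insert: List[str] = field(default_factory=list)  # new doc IDs
    to_update: List[str] = field(default_factory=list)  # same ID, new hash: re-parse, re-chunk, re-embed
    unchanged: List[str] = field(default_factory=list)  # same ID, same hash: skip
    to_delete: List[str] = field(default_factory=list)  # indexed before, absent from this crawl


def plan_refresh(stored_hashes: Dict[str, str], incoming: Dict[str, str]) -> RefreshPlan:
    """Compare stored doc_id -> hash entries against freshly crawled doc_id -> text."""
    plan = RefreshPlan()
    for doc_id, text in incoming.items():
        old_hash = stored_hashes.get(doc_id)
        if old_hash is None:
            plan.to_insert.append(doc_id)
        elif old_hash != sha256_text(text):
            plan.to_update.append(doc_id)
        else:
            plan.unchanged.append(doc_id)
    # Assumption: `incoming` is a complete listing of the source, so anything missing was removed.
    plan.to_delete = sorted(set(stored_hashes) - set(incoming))
    return plan


# Usage: "b" was edited, "c" disappeared, "d" is new.
stored = {"a": sha256_text("alpha"), "b": sha256_text("beta"), "c": sha256_text("gamma")}
plan = plan_refresh(stored, {"a": "alpha", "b": "beta v2", "d": "delta"})
```

The deletion step is only safe when the crawl is known to be complete; a partial or failed crawl would otherwise look like mass deletion, so a real pipeline would want to guard against that case.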

## Further Reading

- [LlamaIndex Document Management](https://developers.llamaindex.ai/python/framework/module_guides/indexing/document_management/)
- [Elasticsearch Point in Time API](https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html)
- [PROV-O: The PROV Ontology](https://www.w3.org/TR/prov-o/)