Retrieval Evidence IDs and Citation Stability

Status: public · Confidence: medium (0.815) · Basis: verified_sources

## TL;DR

Stable evidence IDs let RAG systems cite the same source passage across reindexing, formatting changes, and answer regeneration.

## Core Explanation

Retrieval evidence should identify both the source resource and the selected passage inside it. A URL alone identifies the resource but not the passage, and the content behind it can change; a raw character offset is brittle because any edit earlier in the page shifts it. Robust systems combine canonical resource IDs, source version metadata, content hashes, text selectors, offsets, and provenance records, so that a citation can still be re-anchored when one of these signals fails.
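As a minimal sketch of how these pieces might combine, the Python below derives a deterministic evidence ID from a canonical resource ID plus the quoted passage and its surrounding context. The `EvidenceRef` fields, the `ev_` prefix, and the whitespace normalization rule are illustrative assumptions, not part of any cited standard.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    """Hypothetical evidence record; field names are illustrative."""
    resource_id: str     # canonical resource ID, not a display URL
    source_version: str  # revision tag or fetch date of the source
    exact: str           # the quoted passage
    prefix: str          # text immediately before the passage
    suffix: str          # text immediately after the passage
    start: int           # character offset hint, valid for this version only
    end: int


def normalize(text: str) -> str:
    """Collapse whitespace runs so formatting changes do not alter the hash."""
    return " ".join(text.split())


def evidence_id(ref: EvidenceRef) -> str:
    """Hash only the fields that should survive reindexing.

    Offsets and source_version are deliberately left out: they are stored
    as hints and metadata, but they change when the page is edited.
    """
    payload = json.dumps(
        {
            "resource_id": ref.resource_id,
            "exact": normalize(ref.exact),
            "prefix": normalize(ref.prefix),
            "suffix": normalize(ref.suffix),
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return "ev_" + digest[:16]
```

Under these assumptions, two records for the same passage get the same ID even if the offsets, the version label, or the line wrapping differ. Whether the version should instead be part of the ID is a design choice that depends on whether citations are meant to follow a passage across edits or pin it to one revision.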

Agents should inspect how evidence IDs are generated, whether chunk IDs are deterministic, whether citations store selectors or only display text, and whether a reindex can preserve old claim-to-source links.
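One way to check whether a reindex can preserve old claim-to-source links is to try resolving a stored selector against the new text. The sketch below assumes a citation stored the quoted text, a prefix, a suffix, and an offset hint; `resolve_selector` and its scoring rule are hypothetical, and a production system would likely add fuzzy matching.

```python
from typing import Optional, Tuple


def resolve_selector(
    text: str,
    exact: str,
    prefix: str = "",
    suffix: str = "",
    hint: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the cited passage in `text`, or None if lost."""
    if not exact:
        return None

    # Fast path: the stored offset still points at the quoted text.
    if hint is not None and hint >= 0 and text[hint:hint + len(exact)] == exact:
        return hint, hint + len(exact)

    # Slow path: scan every occurrence and prefer the one whose
    # surrounding context matches the stored prefix and suffix.
    best: Optional[Tuple[int, int]] = None
    best_score = -1
    pos = text.find(exact)
    while pos != -1:
        end = pos + len(exact)
        score = 0
        if prefix and text[:pos].endswith(prefix):
            score += 1
        if suffix and text[end:].startswith(suffix):
            score += 1
        if score > best_score:
            best, best_score = (pos, end), score
        pos = text.find(exact, pos + 1)
    return best
```

A citation that stored only display text would have nothing to pass as `prefix`, `suffix`, or `hint`, so repeated phrases could not be told apart; that is the practical difference the inspection above is looking for.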

## Source-Mapped Facts

- W3C Web Annotation Data Model documentation describes a Text Quote Selector as identifying a text range by copying the exact text, together with some of the text immediately before it (prefix) and after it (suffix). ([source](https://www.w3.org/TR/annotation-model/))
- W3C PROV documentation defines provenance as information about entities, activities, and people involved in producing a piece of data or thing. ([source](https://www.w3.org/TR/prov-overview/))
- RFC 3986 defines an identifier as information required to distinguish what is being identified from all other things within its identification scope. ([source](https://www.rfc-editor.org/info/rfc3986))
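To make the Text Quote Selector fact concrete, the sketch below builds a selector dictionary from a passage's offsets. The `type`, `exact`, `prefix`, and `suffix` keys follow the Web Annotation Data Model; the helper name `make_text_quote_selector` and the default context length of 32 characters are assumptions chosen for illustration.

```python
from typing import Dict


def make_text_quote_selector(
    text: str, start: int, end: int, context: int = 32
) -> Dict[str, str]:
    """Build a TextQuoteSelector-shaped dict for text[start:end]."""
    if not (0 <= start < end <= len(text)):
        raise ValueError("start/end must describe a non-empty range inside text")
    return {
        "type": "TextQuoteSelector",
        "exact": text[start:end],
        # Up to `context` characters on each side of the quoted range.
        "prefix": text[max(0, start - context):start],
        "suffix": text[end:end + context],
    }


def make_target(source: str, selector: Dict[str, str]) -> Dict[str, object]:
    """Pair a source identifier with a selector, as an annotation target does."""
    return {"source": source, "selector": selector}
```

For example, `make_text_quote_selector("alpha beta gamma delta", 6, 10, context=6)` quotes `"beta"` with the prefix `"alpha "` and the suffix `" gamma"`.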

## Further Reading

- [W3C Web Annotation Data Model](https://www.w3.org/TR/annotation-model/)
- [W3C PROV Overview](https://www.w3.org/TR/prov-overview/)
- [RFC 3986 Uniform Resource Identifier](https://www.rfc-editor.org/info/rfc3986)