# RAG Citation and Source Attribution
Status: public
Confidence: medium (0.725) (verified)
Last verified: 2026-06-02
Generation: ai_structured


## TL;DR

RAG citation and source attribution connect generated answers back to the retrieved documents, chunks, and evaluation records that justify them.

## Core Explanation

Retrieving documents does not by itself make a RAG answer trustworthy. A citation system must preserve three things: which chunks were retrieved, which of those chunks were used in synthesis, and which claims in the final answer map back to those chunks.
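As a minimal sketch of those three levels (retrieved chunks, chunks used in synthesis, and claim-to-chunk mappings), the Python below records a trace for one answer and flags inconsistencies. The names `Chunk`, `AnswerTrace`, and the `doc#index` ID format are hypothetical, chosen for illustration and not taken from any particular library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # hypothetical stable ID, e.g. "<doc_id>#<index>"
    text: str


@dataclass
class AnswerTrace:
    retrieved: list[Chunk]  # everything the retriever returned
    used_ids: set[str]      # IDs of chunks actually passed into synthesis
    # Maps each claim in the final answer to the chunk IDs it cites.
    claim_citations: dict[str, list[str]] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return human-readable inconsistencies found in the trace."""
        retrieved_ids = {c.chunk_id for c in self.retrieved}
        issues = []
        for cid in sorted(self.used_ids - retrieved_ids):
            issues.append(f"used chunk {cid} was never retrieved")
        for claim, cited in self.claim_citations.items():
            if not cited:
                issues.append(f"claim has no citation: {claim!r}")
            for cid in cited:
                if cid not in self.used_ids:
                    issues.append(f"claim cites unused chunk {cid}: {claim!r}")
        return issues


trace = AnswerTrace(
    retrieved=[Chunk("doc1#0", "first chunk"), Chunk("doc1#1", "second chunk")],
    used_ids={"doc1#0"},
    claim_citations={"Claim A": ["doc1#0"], "Claim B": ["doc1#1"], "Claim C": []},
)
for issue in trace.problems():
    print(issue)
```

Here "Claim B" is flagged because it cites a chunk that was retrieved but never used in synthesis, and "Claim C" is flagged because it has no citation at all. Keeping the three levels separate is what makes both failures detectable.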

Good attribution records stable document identifiers, chunk offsets, source titles, and URLs, and it is backed by evaluations that check whether each citation actually supports its claim. Without this mapping, users and downstream agents cannot distinguish grounded answers from fluent summaries that only loosely resemble the evidence.
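One way to picture such an attribution record is sketched below, under stated assumptions: the `SourceRef` type, the hash-based `stable_doc_id`, and the two checks are hypothetical illustrations, not a standard schema. The exact-substring check in particular is only a crude proxy for citation correctness; real evaluations typically rely on human labels or a model-based judge.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def stable_doc_id(url: str, content: str) -> str:
    """Derive an ID that stays the same until the URL or content changes."""
    digest = hashlib.sha256(f"{url}\n{content}".encode("utf-8")).hexdigest()
    return digest[:16]


@dataclass(frozen=True)
class SourceRef:
    doc_id: str
    title: str
    url: str
    start: int  # character offset of the cited span (inclusive)
    end: int    # character offset of the cited span (exclusive)

    def resolve(self, content: str) -> str:
        """Recover the cited span from the original document text."""
        return content[self.start:self.end]


def citation_is_grounded(ref: SourceRef, content: str, quote: str) -> bool:
    """Crude correctness proxy: the quoted evidence appears in the cited span."""
    return quote in ref.resolve(content)


def citation_coverage(claims: dict[str, list[SourceRef]]) -> float:
    """Fraction of claims that carry at least one source reference."""
    if not claims:
        return 0.0
    return sum(1 for refs in claims.values() if refs) / len(claims)
```

Storing offsets rather than copied text lets a reviewer re-open the source and confirm the span still says what the citation claims. Coverage and groundedness are tracked separately because an answer can cite every claim and still cite the wrong passage.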

## Source-Mapped Facts

- OpenAI file search documentation says file search lets models retrieve information from uploaded files through semantic and keyword search before generating a response. ([source](https://developers.openai.com/api/docs/guides/tools-file-search))
- LangSmith evaluation documentation says datasets contain examples and that reference outputs are used only in evaluators. ([source](https://docs.langchain.com/langsmith/evaluation-concepts))
- LlamaIndex documentation says CitationQueryEngine can be used with any existing index and exposes a citation_chunk_size setting for citation granularity. ([source](https://developers.llamaindex.ai/python/examples/query_engine/citation_query_engine/))
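
The idea of citation granularity in the last item can be illustrated without any library. The sketch below is not LlamaIndex's implementation: it borrows only the parameter name `citation_chunk_size`, splits by characters as a simplifying assumption, and uses hypothetical helper names. It shows how a smaller chunk size yields more, narrower numbered sources for an answer to cite.

```python
from __future__ import annotations


def split_for_citation(text: str, citation_chunk_size: int) -> list[tuple[int, int, str]]:
    """Split text into fixed-size pieces, returning (start, end, piece) tuples."""
    if citation_chunk_size <= 0:
        raise ValueError("citation_chunk_size must be positive")
    pieces = []
    for start in range(0, len(text), citation_chunk_size):
        end = min(start + citation_chunk_size, len(text))
        pieces.append((start, end, text[start:end]))
    return pieces


def numbered_sources(text: str, citation_chunk_size: int) -> str:
    """Render the pieces as numbered sources an answer can cite as [1], [2], ..."""
    pieces = split_for_citation(text, citation_chunk_size)
    return "\n".join(
        f"Source {i}: {piece}" for i, (_, _, piece) in enumerate(pieces, start=1)
    )
```

With a small size, a citation such as `[2]` points at a narrow span that is easy to verify; with a large size, the same citation covers more text and is weaker evidence for any single claim.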

## Further Reading

- [OpenAI file search](https://developers.openai.com/api/docs/guides/tools-file-search)
- [LangSmith evaluation concepts](https://docs.langchain.com/langsmith/evaluation-concepts)
- [LlamaIndex CitationQueryEngine](https://developers.llamaindex.ai/python/examples/query_engine/citation_query_engine/)