# LLM Evaluation Evidence Attribution and Citation Grading

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

Evidence attribution grading checks three things: whether an LLM answer cites the right sources, whether each of its claims is supported by those sources, and whether retrieval ranked relevant evidence above irrelevant chunks.

## Core Explanation

Generic answer-quality scores are not enough for RAG or source-grounded agents. A fluent answer can still cite the wrong document, omit evidence for a key claim, or use a retrieved chunk that only partially supports the statement.

Citation grading should split the problem into smaller signals: retrieval quality, evidence coverage, claim support, citation span accuracy, and unsupported claim rate. Faithfulness asks whether the claims in the answer are factually consistent with the retrieved context. Context precision asks whether relevant chunks rank above irrelevant ones in the retrieved context. A custom eval can then combine these signals with task-specific citation rules.
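As a minimal sketch of how the claim-level signals above could be computed, assume a grader has already split the answer into claims and recorded, for each claim, which sources were cited and which sources actually support it. The `ClaimJudgment` type, its field names, and the exact definition of each ratio are illustrative assumptions, not part of Ragas or OpenAI Evals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimJudgment:
    """One answer claim plus hypothetical grader verdicts (assumed schema)."""
    claim: str
    cited_ids: tuple[str, ...]       # source IDs the answer cited for this claim
    supporting_ids: tuple[str, ...]  # source IDs the grader judged as supporting it


def citation_signals(judgments: list[ClaimJudgment]) -> dict[str, float]:
    """Aggregate per-claim verdicts into separate citation signals."""
    total = len(judgments)
    if total == 0:
        raise ValueError("need at least one claim to grade")

    cited = [j for j in judgments if j.cited_ids]
    supported = [j for j in judgments if j.supporting_ids]
    # Assumption: a citation counts as correct when at least one cited
    # source is also among the sources that support the claim.
    correct = [j for j in cited if set(j.cited_ids) & set(j.supporting_ids)]

    return {
        "evidence_coverage": len(cited) / total,
        "claim_support": len(supported) / total,
        "unsupported_claim_rate": 1 - len(supported) / total,
        "citation_accuracy": len(correct) / len(cited) if cited else 0.0,
    }


example = [
    ClaimJudgment("claim A", ("doc1",), ("doc1",)),  # cited and supported
    ClaimJudgment("claim B", ("doc2",), ("doc3",)),  # cites the wrong document
    ClaimJudgment("claim C", (), ("doc1",)),         # supported but uncited
    ClaimJudgment("claim D", ("doc4",), ()),         # cited but unsupported
]
signals = citation_signals(example)
```

Keeping the signals separate makes the failure mode visible: in the example, claim B lowers citation accuracy without affecting claim support, while claim D raises the unsupported claim rate even though it carries a citation.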

Agents should preserve the question, retrieved context IDs, answer spans, cited source IDs, judge prompt, rubric version, and grader output. Without these artifacts, a passing citation score is hard to reproduce or debug.
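One way to keep these artifacts together is a single record per graded answer, written as one JSON line. The sketch below is an assumed schema: the class name, field names, and the shape of `answer_spans` and `grader_output` are hypothetical and should be adapted to the eval harness in use.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CitationEvalRecord:
    """Everything needed to reproduce one citation grade (assumed schema)."""
    question: str
    retrieved_context_ids: list[str]
    answer_spans: list[dict]     # e.g. {"start": 0, "end": 42, "source_id": "doc1"}
    cited_source_ids: list[str]
    judge_prompt: str
    rubric_version: str
    grader_output: dict


def to_jsonl_line(record: CitationEvalRecord) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)


record = CitationEvalRecord(
    question="What does faithfulness measure?",
    retrieved_context_ids=["doc1", "doc7"],
    answer_spans=[{"start": 0, "end": 42, "source_id": "doc1"}],
    cited_source_ids=["doc1"],
    judge_prompt="Does the cited source support the span? Answer yes or no.",
    rubric_version="v1",
    grader_output={"supported": True},
)
line = to_jsonl_line(record)
```

Storing the judge prompt and rubric version next to the grader output means a score can be re-derived later even if the prompt or rubric has since changed.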

## Source-Mapped Facts

- Ragas documentation describes faithfulness as measuring factual consistency between a response and the retrieved context. ([source](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/faithfulness/))
- Ragas documentation says context precision evaluates whether relevant chunks are ranked higher than irrelevant chunks in retrieved context. ([source](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/context_precision/))
- OpenAI Evals documentation describes Evals as a framework for evaluating LLMs or systems built using LLMs, with a registry and custom eval capability. ([source](https://github.com/openai/evals))
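The context precision description in the list above can be illustrated with a rank-weighted score in the style of average precision. This is a sketch of the idea, not a call into Ragas: the function name is hypothetical, and the per-chunk relevance labels are assumed to come from a judge or a labeled reference.

```python
def context_precision(relevance: list[bool]) -> float:
    """Rank-weighted precision over retrieved chunks, top-ranked chunk first.

    `relevance[i]` is an assumed judge label: True if the chunk at rank i + 1
    is relevant to the question.
    """
    hits = 0
    weighted = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # precision@rank, accumulated only at ranks holding a relevant chunk
            weighted += hits / rank
    return weighted / hits if hits else 0.0


# Same two relevant chunks, ranked differently.
good_ranking = context_precision([True, True, False])
poor_ranking = context_precision([False, True, True])
```

The score rewards rankings that place relevant chunks first: both example rankings contain the same two relevant chunks, but the one that leads with an irrelevant chunk scores lower.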

## Further Reading

- [Ragas Faithfulness Metric](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/faithfulness/)
- [Ragas Context Precision Metric](https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/context_precision/)
- [OpenAI Evals](https://github.com/openai/evals)