# RAG Index Evaluation with Recall@k

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

Recall@k measures how much of the evidence needed to answer a query appears in the top k results a RAG index returns, before the generator tries to answer.

## Core Explanation

RAG quality can fail even when the language model is strong if the retriever never surfaces the right documents. Recall@k focuses on coverage: did the relevant evidence appear in the candidate set at a usable cutoff?
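The coverage question above can be made concrete with a small calculation. The following is a minimal sketch, assuming each query has a labeled set of relevant document IDs; the function name `recall_at_k` and the document IDs are hypothetical and are not taken from any of the libraries cited in this article.

```python
from typing import Hashable, Iterable, Sequence


def recall_at_k(
    retrieved: Sequence[Hashable],
    relevant: Iterable[Hashable],
    k: int,
) -> float:
    """Return the fraction of relevant items found in the top-k retrieved results."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant_set = set(relevant)
    if not relevant_set:
        # Recall is undefined when nothing is labeled relevant.
        raise ValueError("relevant must contain at least one item")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


# Hypothetical ranked output for one query, best match first.
retrieved_ids = ["d7", "d2", "d9", "d4", "d1"]
# Hypothetical ground-truth labels for the same query.
relevant_ids = {"d2", "d4", "d5"}

print(f"{recall_at_k(retrieved_ids, relevant_ids, k=3):.3f}")  # 0.333
print(f"{recall_at_k(retrieved_ids, relevant_ids, k=5):.3f}")  # 0.667
```

In this example `d5` is never retrieved, so recall stays below 1.0 at every cutoff. That is the failure mode described above: the generator never sees the missing evidence, however strong the model is.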

Agents should combine recall@k with other signals rather than rely on it alone. A high-recall retriever can still produce poor answers if chunks are noisy, citations are wrong, reranking is weak, or the generator ignores the retrieved evidence.

## Source-Mapped Facts

- Ragas context recall documentation says context recall measures how many relevant documents or pieces of information were successfully retrieved. ([source](https://docs.ragas.io/en/v0.2.0/concepts/metrics/available_metrics/context_recall/))
- LlamaIndex retrieval evaluation documentation lists hit rate, MRR, precision, recall, AP, and NDCG as retrieval metrics. ([source](https://docs.llamaindex.ai/en/stable/examples/evaluation/retrieval/retriever_eval/))
- Azure AI Search vector ranking documentation says exhaustive KNN can be used to build a ground-truth nearest-neighbor set for evaluating ANN recall. ([source](https://learn.microsoft.com/en-us/azure/search/vector-search-ranking))
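The ground-truth idea in the last fact can be sketched without any search service: rank every vector exhaustively to get the exact nearest neighbors, then measure how many of them an approximate result recovers. This is an illustrative sketch only. It does not call Azure AI Search, and the helper names, the toy vectors, and the hard-coded `approx_ids` list (standing in for an ANN index's output) are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exhaustive_knn(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Score every vector against the query and return the k most similar IDs."""
    ranked = sorted(
        vectors,
        key=lambda doc_id: cosine(query, vectors[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def ann_recall(approx_ids: Sequence[str], exact_ids: Sequence[str]) -> float:
    """Fraction of the exact nearest neighbors that the approximate result contains."""
    exact = set(exact_ids)
    return len(set(approx_ids) & exact) / len(exact)


# Toy 2-D embeddings (hypothetical data).
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.5, 0.5],
    "d": [0.0, 1.0],
}
query = [1.0, 0.0]

exact_ids = exhaustive_knn(query, vectors, k=3)
# Stand-in for what an ANN index might return for the same query.
approx_ids = ["a", "c", "d"]

print(exact_ids)
print(f"{ann_recall(approx_ids, exact_ids):.3f}")
```

Note that this measures how faithfully the approximate index reproduces exact nearest-neighbor search. It does not measure whether those neighbors are relevant to the question, which still requires labeled evidence.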

## Further Reading

- [Ragas Context Recall](https://docs.ragas.io/en/v0.2.0/concepts/metrics/available_metrics/context_recall/)
- [LlamaIndex Retrieval Evaluation](https://docs.llamaindex.ai/en/stable/examples/evaluation/retrieval/retriever_eval/)
- [Azure AI Search Vector Ranking](https://learn.microsoft.com/en-us/azure/search/vector-search-ranking)