# RAG Index Evaluation with Recall@k

Status: public
Confidence: medium (0.725) (verified)
Last verified: 2026-06-02
Generation: ai_structured

## TL;DR

Recall@k evaluates whether a RAG index retrieves the needed evidence within the top k results before the generator tries to answer.

## Core Explanation

RAG quality can fail even when the language model is strong if the retriever never surfaces the right documents. Recall@k focuses on coverage: did the relevant evidence appear in the candidate set at a usable cutoff?

Agents should use recall@k with other signals. A high-recall retriever can still produce poor answers if chunks are noisy, citations are wrong, reranking is weak, or the generator ignores the retrieved evidence.

## Source-Mapped Facts

- Ragas context recall documentation says context recall measures how many relevant documents or pieces of information were successfully retrieved. ([source](https://docs.ragas.io/en/v0.2.0/concepts/metrics/available_metrics/context_recall/))
- LlamaIndex retrieval evaluation documentation lists hit rate, MRR, precision, recall, AP, and NDCG as retrieval metrics. ([source](https://docs.llamaindex.ai/en/stable/examples/evaluation/retrieval/retriever_eval/))
- Azure AI Search vector ranking documentation says exhaustive KNN can be used to build a ground-truth nearest-neighbor set for evaluating ANN recall. ([source](https://learn.microsoft.com/en-us/azure/search/vector-search-ranking))

## Further Reading

- [Ragas Context Recall](https://docs.ragas.io/en/v0.2.0/concepts/metrics/available_metrics/context_recall/)
- [LlamaIndex Retrieval Evaluation](https://docs.llamaindex.ai/en/stable/examples/evaluation/retrieval/retriever_eval/)
- [Azure AI Search Vector Ranking](https://learn.microsoft.com/en-us/azure/search/vector-search-ranking)
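
## Example: Computing Recall@k

To make the coverage idea from the Core Explanation concrete, the following is a minimal sketch of recall@k in plain Python. It assumes each query has a ranked list of retrieved chunk ids and a hand-labeled set of relevant chunk ids. The function names (`recall_at_k`, `mean_recall_at_k`), the chunk ids, and the sample data are hypothetical and are not taken from Ragas, LlamaIndex, or Azure AI Search.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def recall_at_k(retrieved: Sequence[Hashable], relevant: Iterable[Hashable], k: int) -> float:
    """Fraction of the labeled relevant items that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    if not relevant_set:
        # Recall is undefined without at least one labeled relevant item.
        raise ValueError("no relevant items labeled for this query")
    top_k = set(retrieved[:k])
    return len(top_k & relevant_set) / len(relevant_set)


def mean_recall_at_k(
    runs: Sequence[Tuple[Sequence[Hashable], Iterable[Hashable]]], k: int
) -> float:
    """Average recall@k over a set of (retrieved, relevant) query runs."""
    if not runs:
        raise ValueError("no query runs to evaluate")
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical evaluation set: (ranked retrieved chunk ids, labeled relevant chunk ids).
runs = [
    (["c7", "c2", "c9", "c4"], {"c2", "c4"}),
    (["c1", "c3", "c8", "c5"], {"c5", "c6"}),
]

print(mean_recall_at_k(runs, k=3))  # 0.25
print(mean_recall_at_k(runs, k=4))  # 0.75
```

Raising the cutoff from 3 to 4 lifts mean recall from 0.25 to 0.75 in this toy data, which is the kind of trade-off a cutoff sweep is meant to expose. The same function could plausibly be reused for ANN recall by passing the ids returned by an exhaustive KNN search as the `relevant` set, under the assumption that those exact neighbors serve as ground truth.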