Retrieval Caching and Semantic Cache

Status: public · Confidence: medium (0.725) · Basis: verified_sources

## TL;DR

Retrieval caching stores reusable retrieval or preprocessing results. A semantic cache goes further: it reuses results for prompts or queries that are similar in meaning, not only for exact string matches.

## Core Explanation

RAG systems repeat work: parsing documents, chunking, embedding, filtering, issuing search requests, reranking, and synthesizing answers. Caches reduce latency and cost, but they also introduce correctness risk: a cached entry can be stale once source documents change, can expose content once permissions change, and can be the wrong match when query intent differs.
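The query-intent risk is easiest to see in a semantic lookup. The following is a minimal sketch, not any library's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `SemanticCache`, its `threshold` parameter, and the example queries are all hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: exact match first, then nearest stored query."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # minimum similarity to count as a hit
        self.exact = {}             # normalized query -> result
        self.entries = []           # list of (embedding, result)

    def put(self, query: str, result: str) -> None:
        self.exact[query.strip().lower()] = result
        self.entries.append((embed(query), result))

    def get(self, query: str):
        """Return (result, kind) where kind is 'exact', 'semantic', or 'miss'."""
        key = query.strip().lower()
        if key in self.exact:
            return self.exact[key], "exact"
        q = embed(query)
        best_score, best_result = 0.0, None
        for emb, result in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_result = score, result
        if best_result is not None and best_score >= self.threshold:
            return best_result, "semantic"
        return None, "miss"


cache = SemanticCache(threshold=0.8)
cache.put("how do I reset my password", "doc-17")
print(cache.get("how do I reset my account password"))
print(cache.get("what is the refund policy"))
```

The threshold is the design choice that matters: set it too low and queries with a different intent reuse the wrong result; set it too high and the cache degrades to exact matching.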

Agent systems should separate ingestion caches, search result caches, and answer caches. The more a cache moves from exact retrieval artifacts toward generated answers, the stronger its invalidation, provenance, and permission checks need to be.
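One way to keep the three tiers separate is to give each its own key namespace and to fold more context into the key as the cached artifact gets closer to a generated answer. The sketch below assumes a hypothetical `cache_key` helper, a `doc_version` string maintained by the ingestion side, and a list of caller `groups` as the permission scope; treating ingestion artifacts as independent of the caller is itself an assumption that may not hold in every deployment.

```python
import hashlib
import json

# Assumption: ingestion artifacts (parsed text, chunks, embeddings) depend only
# on the document, while search results and answers also depend on who asks.
TIERS_NEEDING_PERMISSIONS = {"search", "answer"}


def cache_key(tier: str, payload: dict, doc_version: str, groups=()) -> str:
    """Build a namespaced key that changes when its inputs change."""
    parts = {"tier": tier, "payload": payload, "doc_version": doc_version}
    if tier in TIERS_NEEDING_PERMISSIONS:
        # Sort so the same permission set always yields the same key.
        parts["groups"] = sorted(groups)
    blob = json.dumps(parts, sort_keys=True).encode("utf-8")
    return f"{tier}:{hashlib.sha256(blob).hexdigest()}"


query = {"q": "quarterly revenue"}
k_old = cache_key("answer", query, doc_version="v1", groups=["finance"])
k_new = cache_key("answer", query, doc_version="v2", groups=["finance"])
k_other = cache_key("answer", query, doc_version="v1", groups=["sales"])
print(k_old != k_new, k_old != k_other)
```

Because the document version and permission scope are part of the key, a change to either yields a different key, so old entries are simply never read again rather than needing explicit deletion. Provenance (which sources produced a cached answer) would still need to be stored alongside the value; this sketch does not cover it.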

## Source-Mapped Facts

- Redis documentation describes a LangCacheSemanticCache class that uses exact search and semantic search options for LLM response caching. ([source](https://redis.io/docs/latest/develop/ai/redisvl/api/cache/))
- LlamaIndex documentation says each node and transformation pair in an ingestion pipeline is cached so later runs can reuse cached results. ([source](https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/))
- OpenSearch documentation says the index request cache stores frequently executed search query results at the shard level to reduce cluster load and improve response times. ([source](https://docs.opensearch.org/latest/search-plugins/caching/request-cache/))
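The idea of caching per node and transformation pair can be illustrated without any library. This is a sketch of the general technique only and does not reproduce LlamaIndex's implementation; `TransformCache`, `lowercase`, and `split_sentences` are hypothetical names, and identifying a transformation by its function name is a simplifying assumption.

```python
import hashlib


def lowercase(text: str) -> str:
    return text.lower()


def split_sentences(text: str) -> list:
    return [s.strip() for s in text.split(".") if s.strip()]


class TransformCache:
    """Hypothetical cache keyed by (transformation name, node content)."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def run(self, node_text: str, transform):
        # Assumption: the function name identifies the transformation. If the
        # function body changes but the name does not, entries become stale.
        raw = f"{transform.__qualname__}\x00{node_text}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        result = transform(node_text)
        self.store[key] = result
        return result


cache = TransformCache()
cache.run("First point. Second point.", split_sentences)  # computed
cache.run("First point. Second point.", split_sentences)  # reused
cache.run("First point. Second point.", lowercase)        # different pair, computed
print(cache.hits, cache.misses)
```

Hashing the content rather than a document ID means an edited document produces a new key on the next run, while unchanged documents are skipped.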

## Further Reading

- [RedisVL LLM cache](https://redis.io/docs/latest/develop/ai/redisvl/api/cache/)
- [LlamaIndex ingestion pipeline](https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/)
- [OpenSearch index request cache](https://docs.opensearch.org/latest/search-plugins/caching/request-cache/)