# RAG Hypothetical Document Embeddings (HyDE)

Status: public
Confidence: medium (0.85) (verified)
Last verified: 2026-06-03
Generation: ai_structured

## TL;DR

HyDE improves some RAG retrieval setups by embedding a model-generated hypothetical answer or document instead of embedding the short user query directly.

## Core Explanation

Dense retrieval can fail when the user query is too short, underspecified, or phrased differently from the corpus. HyDE rewrites the retrieval problem by asking a model to produce an answer-like document first, then embedding that generated text for search.

Agents should treat HyDE as a retrieval transformation, not as evidence. The generated document can guide search, but the final answer still needs citations from real retrieved documents.

## Source-Mapped Facts

- Gao et al. propose Hypothetical Document Embeddings, or HyDE, for zero-shot dense retrieval without relevance labels. ([source](https://aclanthology.org/2023.acl-long.99/))
- The HyDE paper says HyDE first uses an instruction-following language model to generate a hypothetical document for a query. ([source](https://aclanthology.org/2023.acl-long.99/))
- LlamaIndex documentation describes HyDE as generating a hypothetical document or answer and using it for embedding lookup instead of the raw query. ([source](https://docs.llamaindex.ai/en/stable/optimizing/advanced_retrieval/query_transformations/))

## Further Reading

- [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://aclanthology.org/2023.acl-long.99/)
- [LlamaIndex Query Transformations](https://docs.llamaindex.ai/en/stable/optimizing/advanced_retrieval/query_transformations/)
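
## Example Sketch

The flow described under Core Explanation (generate a hypothetical document, embed it, search with that embedding, return only real passages) can be illustrated with a minimal, dependency-free sketch. Everything named here is an assumption made for illustration: `fake_generate` is a hypothetical stand-in for an instruction-following language model call, `embed` is a toy bag-of-words vector rather than a real dense encoder, and `CORPUS` is invented sample data. It is not the paper's or LlamaIndex's implementation.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a dense encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    query: str,
    corpus: list[str],
    generate: Callable[[str], str],
    top_k: int = 2,
) -> list[str]:
    """Embed a generated hypothetical document instead of the raw query."""
    hypothetical = generate(query)       # step 1: answer-like text
    search_vec = embed(hypothetical)     # step 2: embed the generated text
    ranked = sorted(                     # step 3: rank real passages
        corpus,
        key=lambda doc: cosine(search_vec, embed(doc)),
        reverse=True,
    )
    # Only real corpus passages are returned; the hypothetical text is
    # discarded and must never be cited as evidence.
    return ranked[:top_k]


def fake_generate(query: str) -> str:
    """Hypothetical stub for an LLM call; returns a canned answer."""
    return (
        "Resetting a password requires opening account settings "
        "and choosing the reset password option."
    )


# Invented sample corpus for illustration only.
CORPUS = [
    "To reset your password, open account settings and choose the reset password option.",
    "Our office is closed on public holidays.",
    "Invoices are emailed at the start of each billing cycle.",
]

results = hyde_retrieve("forgot login", CORPUS, fake_generate, top_k=1)
print(results)
```

In this toy setup the raw query "forgot login" shares no terms with any passage, so direct matching has nothing to rank on, while the hypothetical document overlaps heavily with the password passage. A real deployment would swap in an actual model call and embedding model, and would still build the final answer from the retrieved passages only.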