Retrieval Sparse Vectors and Learned Sparse Retrieval

Status: public · Confidence: medium (0.685) · Basis: verified_sources

## TL;DR

Sparse vectors give retrieval systems a token-aware signal that can be combined with dense vectors for hybrid RAG search.

## Core Explanation

Dense embeddings capture semantic similarity, but they can miss exact identifiers, rare product names, error codes, legal citations, and other vocabulary that should not be paraphrased away. Sparse vectors keep an explicit weight for each token that is present and treat every other dimension as zero, so exact terms still contribute to the score. The weights can come from term statistics or from a learned sparse model.
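To make the token-weight idea concrete, here is a minimal sketch in plain Python. The helpers `sparse_encode` and `sparse_dot` are hypothetical names, and raw term counts stand in for whatever weighting a real system uses (term statistics or a learned sparse model); this is an illustration, not any particular library's API.

```python
from collections import Counter


def sparse_encode(text: str) -> dict[str, float]:
    """Map each token to a weight; tokens that are absent are implicit zeros.

    Assumption: raw term counts are a stand-in for real sparse weights.
    """
    counts = Counter(text.lower().split())
    return {token: float(count) for token, count in counts.items()}


def sparse_dot(query: dict[str, float], doc: dict[str, float]) -> float:
    """Dot product over the tokens the two sparse vectors share."""
    # Iterate over the smaller vector; only overlapping tokens contribute.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


# Hypothetical documents: only "a" contains the exact error code.
docs = {
    "a": "restart the service after error E1234 appears",
    "b": "restart the service after a generic failure",
}
query = sparse_encode("error E1234")
scores = {doc_id: sparse_dot(query, sparse_encode(text)) for doc_id, text in docs.items()}
print(scores)
```

Document `a` scores above zero only because it shares the literal tokens `error` and `e1234` with the query; document `b` scores zero however similar its meaning is. That is the signal a dense embedding alone may blur.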

Agents debugging retrieval should inspect whether the system stores dense vectors, sparse vectors, or both; how sparse values are generated; whether dense and sparse scores are normalized; and whether hybrid weighting was calibrated on real relevance judgments.
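The normalization and weighting questions above can be sketched as follows. This assumes min-max normalization and a linear blend controlled by a hypothetical parameter `alpha`; real systems may use other fusion schemes, and the scores shown are invented for illustration.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so there is no ranking information to keep.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.5
) -> dict[str, float]:
    """Blend normalized scores: alpha weights dense, (1 - alpha) weights sparse."""
    dense_n, sparse_n = min_max(dense), min_max(sparse)
    doc_ids = set(dense_n) | set(sparse_n)
    # A document missing from one result list contributes 0 for that signal.
    return {
        doc_id: alpha * dense_n.get(doc_id, 0.0) + (1 - alpha) * sparse_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Invented scores on very different scales (cosine-like vs. unbounded sparse).
dense = {"a": 0.82, "b": 0.80, "c": 0.40}
sparse = {"a": 3.0, "b": 0.0, "c": 12.0}

balanced = hybrid_scores(dense, sparse, alpha=0.5)
sparse_heavy = hybrid_scores(dense, sparse, alpha=0.2)

rank_balanced = sorted(balanced, key=balanced.get, reverse=True)
rank_sparse_heavy = sorted(sparse_heavy, key=sparse_heavy.get, reverse=True)
print(rank_balanced, rank_sparse_heavy)
```

The top result changes when `alpha` changes, which is why the weight should be tuned against real relevance judgments and not picked by intuition. Without the normalization step, the larger raw sparse scores would dominate the blend regardless of `alpha`.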

## Source-Mapped Facts

- Qdrant documentation says sparse vectors do not have a fixed length and are dynamically allocated during vector insertion. ([source](https://qdrant.tech/documentation/manage-data/vectors/))
- Qdrant documentation describes sparse vectors as useful for exact-token matching use cases in vector search collections. ([source](https://qdrant.tech/documentation/manage-data/vectors/))
- Pinecone documentation describes hybrid search as combining semantic and lexical search signals. ([source](https://docs.pinecone.io/guides/search/hybrid-search))
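
The "no fixed length" property in the first fact can be pictured with a paired index/value layout, where each vector stores only its non-zero dimensions. The `SparseVec` class and its helpers below are hypothetical local definitions for illustration, not a client library's types, and the indices and weights are invented.

```python
from dataclasses import dataclass


@dataclass
class SparseVec:
    """Hypothetical container: parallel lists of dimension indices and weights."""
    indices: list[int]
    values: list[float]


def to_sparse(weights: dict[int, float]) -> SparseVec:
    """Keep only non-zero dimensions, sorted by index."""
    items = sorted((i, v) for i, v in weights.items() if v != 0.0)
    return SparseVec([i for i, _ in items], [v for _, v in items])


def dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product over the indices both vectors contain."""
    lookup = dict(zip(b.indices, b.values))
    return sum(v * lookup.get(i, 0.0) for i, v in zip(a.indices, a.values))


# Two vectors over the same vocabulary, stored with different lengths.
short = to_sparse({7: 0.4, 50321: 1.3})
longer = to_sparse({3: 0.2, 7: 0.9, 1024: 0.5, 50321: 0.1, 99999: 2.0})

print(len(short.indices), len(longer.indices))
print(dot(short, longer))
```

The two vectors hold two and five entries respectively, yet they remain comparable because scoring only looks at shared indices (here 7 and 50321).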

## Further Reading

- [Qdrant Vectors Documentation](https://qdrant.tech/documentation/manage-data/vectors/)
- [Pinecone Hybrid Search](https://docs.pinecone.io/guides/search/hybrid-search)