Vector Index Parameters and Recall-Latency Tuning
Status: public · Confidence: medium (0.725) · Basis: verified_sources
## TL;DR

Vector index settings are production knobs. Agents should not change them without measuring recall, latency, memory, and filtered-search behavior on the actual query mix.

## Core Explanation

Approximate nearest-neighbor (ANN) search trades exactness for speed. Several parameters can change which evidence reaches a RAG system:

- HNSW search depth, construction depth, and graph connections.
- IVF list count.
- Distance metric.
- Payload-index choices.

A good tuning report compares candidate settings on a fixed query set and scores them against relevance labels or an exact-search baseline. It also records latency percentiles, memory footprint, build time, and filter behavior. A faster index that drops answer-bearing evidence is a retrieval regression, even when the generated answer still looks fluent.

## Source-Mapped Facts

- Weaviate documentation says HNSW `ef` balances search speed and recall: higher `ef` improves accuracy but slows search. ([source](https://docs.weaviate.io/weaviate/config-refs/indexing/vector-index))
- Pinecone create-index documentation includes `dimension` and `metric` fields for configuring a dense vector index. ([source](https://docs.pinecone.io/reference/create_index/))
- Qdrant documentation says vector indexes speed up vector search and payload indexes speed up filtering. ([source](https://qdrant.tech/documentation/manage-data/indexing/))

## Further Reading

- [Weaviate Vector Index](https://docs.weaviate.io/weaviate/config-refs/indexing/vector-index)
- [Pinecone Create Index](https://docs.pinecone.io/reference/create_index/)
- [Qdrant Indexing](https://qdrant.tech/documentation/manage-data/indexing/)
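The following is a minimal sketch of the recall-latency comparison described under Core Explanation. It uses only the Python standard library.

It rests on several assumptions:

- **The index is a toy.** It is an IVF-style stand-in, not the API of Weaviate, Pinecone, or Qdrant. The names `ToyIVF`, `n_lists`, `n_probe`, and `evaluate` are hypothetical.
- **Centroids are not trained.** They are sampled at random rather than learned with k-means, as real IVF indexes usually do.

The harness scores each candidate setting with recall@k against an exact brute-force baseline. It also reports p50/p95 latency over a fixed query set.

```python
import random
import statistics
import time
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def sq_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; ranks neighbors the same way as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query: Vector, data: List[Vector], k: int) -> List[int]:
    """Brute-force baseline: ids of the k vectors nearest to the query."""
    return sorted(range(len(data)), key=lambda i: sq_l2(query, data[i]))[:k]


class ToyIVF:
    """Hypothetical toy IVF index (random centroids, no k-means training)."""

    def __init__(self, data: List[Vector], n_lists: int, seed: int = 0):
        self.data = data
        self.centroids = random.Random(seed).sample(data, n_lists)
        self.lists: Dict[int, List[int]] = {c: [] for c in range(n_lists)}
        # Assign every vector to the inverted list of its nearest centroid.
        for i, v in enumerate(data):
            c = min(range(n_lists), key=lambda j: sq_l2(v, self.centroids[j]))
            self.lists[c].append(i)

    def search(self, query: Vector, k: int, n_probe: int) -> List[int]:
        # Scan only the n_probe lists whose centroids are closest to the query.
        order = sorted(range(len(self.centroids)),
                       key=lambda j: sq_l2(query, self.centroids[j]))
        candidates = [i for c in order[:n_probe] for i in self.lists[c]]
        return sorted(candidates, key=lambda i: sq_l2(query, self.data[i]))[:k]


def recall_at_k(approx: List[int], exact: List[int]) -> float:
    """Fraction of the exact top-k ids that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


def evaluate(search: Callable[[Vector], List[int]],
             queries: List[Vector], baselines: List[List[int]]) -> Dict[str, float]:
    """Mean recall@k plus p50/p95 latency (ms) over a fixed query set."""
    recalls, latencies_ms = [], []
    for q, truth in zip(queries, baselines):
        start = time.perf_counter()
        got = search(q)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        recalls.append(recall_at_k(got, truth))
    cuts = statistics.quantiles(latencies_ms, n=100)  # needs >= 2 queries
    return {"recall": statistics.mean(recalls), "p50_ms": cuts[49], "p95_ms": cuts[94]}


if __name__ == "__main__":
    rng = random.Random(42)
    dim, n, k = 16, 1000, 10
    data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
    baselines = [exact_top_k(q, data, k) for q in queries]  # computed once, reused
    index = ToyIVF(data, n_lists=32)
    for n_probe in (1, 4, 32):
        report = evaluate(lambda q: index.search(q, k, n_probe), queries, baselines)
        print(f"n_probe={n_probe}: {report}")
```

Probing more lists can only add candidates. As a result, recall in this toy is non-decreasing in `n_probe` and reaches 1.0 once every list is scanned. The latency columns show what that extra recall costs.

With a real engine, the same harness would wrap that engine's query call in place of `ToyIVF.search`. The exact baseline stays fixed so that runs remain comparable.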