Dense Retrieval
Dense retrieval finds documents by computing cosine similarity (or dot product) between learned vector embeddings of the query and each document. This is distinct from sparse retrieval (BM25, TF-IDF), which ranks documents using lexical token-frequency features.
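The ranking step can be sketched in a few lines of pure Python. The 4-dimensional vectors below are made-up toy values standing in for real model embeddings, which typically have hundreds of dimensions:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (illustrative values, not from a real model).
docs = {
    "car coverage":  [0.9, 0.1, 0.0, 0.2],
    "pasta recipes": [0.0, 0.8, 0.5, 0.1],
}
query = [0.85, 0.15, 0.05, 0.25]  # e.g. "automobile insurance"

# Rank documents by embedding similarity to the query.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # → car coverage
```

In a real system the vectors come from an embedding model and the linear scan is replaced by an approximate nearest-neighbor index, but the scoring function is the same.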
Dense embeddings capture semantic similarity — a query about "automobile insurance" matches documents about "car coverage" even with no shared tokens. The cost: dense retrieval requires an embedding model and a vector index (e.g. FAISS with an HNSW or IVF index).
For local RAG, common embedding models are BGE-M3, Snowflake Arctic Embed, and Qwen3-Embedding. A 384–1024 dimension index over 100K chunks fits comfortably in 1–4 GB of RAM and answers queries in under 10 ms with HNSW.
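The RAM figure above can be sanity-checked with a back-of-envelope calculation, assuming float32 vectors (4 bytes each) and ignoring quantization:

```python
# Raw vector storage for a flat float32 index.
# An HNSW index adds per-vector graph links on top of this
# (often roughly doubling memory), so the raw figure is a floor.
def index_bytes(n_chunks, dim, bytes_per_float=4):
    return n_chunks * dim * bytes_per_float

for dim in (384, 1024):
    gb = index_bytes(100_000, dim) / 1e9
    print(f"{dim}-dim, 100K chunks: {gb:.2f} GB of raw vectors")
```

Raw vectors come to roughly 0.15–0.41 GB for this range, so even with HNSW graph overhead the index sits well within the 1–4 GB budget quoted above.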