Dense Retrieval
Dense retrieval finds documents by computing cosine similarity (or dot product) between learned vector embeddings of the query and each document. This is distinct from sparse retrieval (BM25, TF-IDF), which ranks documents using lexical token-frequency features.
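The ranking step can be sketched in a few lines of pure Python. The 4-dimensional vectors below are made-up toy values standing in for real model embeddings, which typically have hundreds of dimensions:

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (illustrative values, not from a real model).
docs = {
    "car coverage":  [0.9, 0.1, 0.0, 0.2],
    "pasta recipes": [0.0, 0.8, 0.5, 0.1],
}
query = [0.85, 0.15, 0.05, 0.25]  # e.g. "automobile insurance"

# Rank documents by embedding similarity to the query.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # → car coverage
```

In a real system the vectors come from an embedding model and the linear scan is replaced by an approximate nearest-neighbor index, but the scoring function is the same.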
Dense embeddings capture semantic similarity — a query about "automobile insurance" matches documents about "car coverage" even with no shared tokens. The cost: dense retrieval requires an embedding model and a vector index (e.g. FAISS with an HNSW or IVF index).
For local RAG, common embedding models are BGE-M3, Snowflake Arctic Embed, and Qwen3-Embedding. A 384–1024 dimension index over 100K chunks fits comfortably in 1–4 GB of RAM and answers queries in under 10 ms with HNSW.
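The RAM figure above can be sanity-checked with a back-of-envelope calculation, assuming float32 vectors (4 bytes each) and ignoring quantization:

```python
# Raw vector storage for a flat float32 index.
# An HNSW index adds per-vector graph links on top of this
# (often roughly doubling memory), so the raw figure is a floor.
def index_bytes(n_chunks, dim, bytes_per_float=4):
    return n_chunks * dim * bytes_per_float

for dim in (384, 1024):
    gb = index_bytes(100_000, dim) / 1e9
    print(f"{dim}-dim, 100K chunks: {gb:.2f} GB of raw vectors")
```

Raw vectors come to roughly 0.15–0.41 GB for this range, so even with HNSW graph overhead the index sits well within the 1–4 GB budget quoted above.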