HOW-TO · RAG
How to Choose and Configure FAISS Index Types (IVF, HNSW)
Target environment
Ubuntu 24.04 · Ollama 0.4.x
Prerequisites
FAISS and NumPy installed; sample embeddings available as a float32 array
What this does
FAISS ships dozens of index types optimized for different trade-offs between search speed, accuracy, and memory usage. This guide explains the two most widely used families—IVF (Inverted File) and HNSW (Hierarchical Navigable Small World)—and demonstrates configuring them for production workloads.
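To make the inverted-file idea concrete, here is a toy sketch in plain NumPy. It is not how FAISS is implemented; the sizes, the `sq_dists` helper, and the `ivf_search` function are illustrative assumptions. It shows the three phases an IVF index goes through: learn centroids, assign each vector to the list of its nearest centroid, then scan only the `nprobe` closest lists at query time.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, nlist = 8, 2000, 16          # toy sizes, chosen only for illustration
data = rng.random((n, d), dtype=np.float32)

def sq_dists(a, b):
    """Squared L2 distances between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

# "Training": a few k-means (Lloyd) iterations to place nlist centroids.
centroids = data[rng.choice(n, nlist, replace=False)].copy()
for _ in range(10):
    assign = sq_dists(data, centroids).argmin(axis=1)
    for c in range(nlist):
        members = data[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# "Adding": each vector id goes into the inverted list of its nearest centroid.
assign = sq_dists(data, centroids).argmin(axis=1)
inverted_lists = [np.flatnonzero(assign == c) for c in range(nlist)]

def ivf_search(query, k, nprobe):
    """Scan only the nprobe inverted lists whose centroids are closest to query."""
    nearest_lists = sq_dists(query[None, :], centroids)[0].argsort()[:nprobe]
    candidates = np.concatenate([inverted_lists[c] for c in nearest_lists])
    dists = sq_dists(query[None, :], data[candidates])[0]
    return candidates[dists.argsort()[:k]]

query = rng.random(d, dtype=np.float32)
exact = sq_dists(query[None, :], data)[0].argsort()[:5]   # brute-force reference
approx = ivf_search(query, k=5, nprobe=4)                 # scans 4 of 16 lists
full = ivf_search(query, k=5, nprobe=nlist)               # scans every list
print("exact     :", sorted(exact.tolist()))
print("nprobe=4  :", sorted(approx.tolist()))
print("nprobe=16 :", sorted(full.tolist()))
```

Scanning every list reproduces the brute-force answer, while a small `nprobe` trades some recall for less work per query.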
Steps
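Decide between index families. For fewer than 100,000 vectors, a flat (exact) index usually suffices. For roughly 100K–10M vectors, use an IVF index. For 10M+ vectors or sub-millisecond latency targets, use HNSW, provided the graph and the uncompressed vectors fit in RAM.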
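The thresholds in this step can be encoded as a small helper. This is a sketch only: `suggest_index` is a hypothetical function, the cut-offs are the rule-of-thumb values from this guide, and the returned strings follow the `faiss.index_factory` naming convention.

```python
import math

def suggest_index(n_vectors: int, need_sub_ms_latency: bool = False) -> str:
    """Map a collection size to an index_factory-style string (rule of thumb only)."""
    if n_vectors >= 10_000_000 or need_sub_ms_latency:
        return "HNSW32"                      # HNSW graph with M = 32
    if n_vectors >= 100_000:
        nlist = math.isqrt(n_vectors)        # roughly sqrt(total_vectors) lists
        return f"IVF{nlist},Flat"
    return "Flat"                            # exact brute-force search

for n in (20_000, 1_000_000, 50_000_000):
    print(f"{n:>11,} vectors -> {suggest_index(n)}")
```

The resulting string can be passed to `faiss.index_factory(d, spec)` to build the index; treat the output as a starting point and confirm it against measured recall and latency.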
Configure an IVF index. The coarse quantizer is a flat index whose nlist centroids are learned with k-means when train() is called.
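import faiss
import numpy as np

d = 768        # embedding dimension
nlist = 100    # number of inverted lists (k-means centroids)

# Random stand-in data; replace with real embeddings.
vectors = np.random.rand(50000, d).astype("float32")
faiss.normalize_L2(vectors)  # unit length, so L2 ranking matches cosine ranking

quantizer = faiss.IndexFlatL2(d)  # coarse quantizer that holds the centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)
index.train(vectors)  # learn the centroids; required before add()
index.add(vectors)
index.nprobe = 10     # inverted lists scanned per query

print(f"Index trained: {index.is_trained}")
print(f"Total vectors: {index.ntotal}")
# Expected:
# Index trained: True
# Total vectors: 50000
Configure an HNSW index.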
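# Reuses faiss, d, and vectors from the IVF step.
hnsw_index = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_L2)  # M = 32 links per node
hnsw_index.hnsw.efConstruction = 200  # build-time candidate list size; set before add()
hnsw_index.hnsw.efSearch = 100        # query-time candidate list size
hnsw_index.add(vectors[:5000])        # HNSW needs no training step
print(f"HNSW vectors indexed: {hnsw_index.ntotal}")
# Expected: HNSW vectors indexed: 5000
Tune nprobe and efSearch. Increasing nprobe (IVF) or efSearch (HNSW) improves recall at the cost of latency. Profile with realistic queries to find the smallest value that meets your recall target within your latency SLA.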
Verification
python3 -c "
import faiss, numpy as np
d = 128
idx = faiss.IndexHNSWFlat(d, 16)
v = np.random.rand(1000, d).astype('float32')
idx.add(v)
print(f'HNSW index ready, vectors: {idx.ntotal}')
"
# Expected: HNSW index ready, vectors: 1000
Common failures
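- Index not trained. Calling add() on an untrained IVF index typically fails with a RuntimeError reporting that the 'is_trained' check failed; call train() first. Train on at least 30–50 vectors per list; FAISS warns when fewer than 39 × nlist training vectors are supplied.
- nprobe exceeds nlist. Setting nprobe at or above nlist scans every list, which amounts to an exhaustive search with extra overhead. Keep nprobe well below nlist, and size nlist at roughly sqrt(total_vectors).
- Memory blown by large HNSW graph. IndexHNSWFlat keeps every vector uncompressed plus about 2 × M graph links per vector, roughly (d × 4 + M × 2 × 4) bytes each. With d=768 and M=32 that is about 3.3 GB per million vectors, so large collections can exhaust RAM. Reduce M to 8–16 or use a compressed index.
- Search returns wrong or missing results. With METRIC_INNER_PRODUCT, vectors that are not normalized are ranked by raw dot product rather than cosine similarity; always normalize both indexed and query vectors for cosine similarity. Labels of -1 mean the search found fewer than k neighbors.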
- Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
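The sizing rules in the list above can be checked with a little arithmetic before building anything. This is a back-of-the-envelope sketch: both helper functions are hypothetical, the memory formula is the (d × 4 + M × 2 × 4) bytes-per-vector approximation from the FAISS index selection guidelines, and 39 is the k-means default minimum of points per centroid; verify both against your FAISS version.

```python
def hnsw_flat_bytes(n_vectors: int, d: int, m: int) -> int:
    """Approximate IndexHNSWFlat size: float32 vectors plus 2*M int32 links each."""
    return n_vectors * (d * 4 + m * 2 * 4)

def min_training_vectors(nlist: int, per_centroid: int = 39) -> int:
    """Training-set size below which FAISS k-means warns about too few points."""
    return nlist * per_centroid

gb = hnsw_flat_bytes(1_000_000, d=768, m=32) / 1e9
print(f"HNSW, 1M x 768 floats, M=32: ~{gb:.1f} GB")
print("Minimum training vectors for nlist=100:", min_training_vectors(100))
```

The estimate excludes allocator overhead and the upper graph layers, so treat it as a lower bound when planning RAM.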
Related guides