HOW-TO · RAG
How to Optimize FAISS Index for Large Datasets
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
FAISS index with 1M+ vectors
What this does
At one million vectors and beyond, the choice of index algorithm, quantization settings, and hardware allocation determines whether queries return in milliseconds or seconds. This guide covers four tuning steps for production retrieval workloads: profiling the existing index, switching to IVF-PQ compression, tuning nprobe, and moving the index to a GPU.
Steps
Profile the current index.

    import random
    import time

    import numpy as np
    import psutil


    def profile_search(index, vectors, k=5, runs=100):
        """Return the mean single-query search latency in milliseconds."""
        latencies = []
        for _ in range(runs):
            # Pick one random vector as the query; FAISS expects shape (1, d).
            i = random.randint(0, len(vectors) - 1)
            q = vectors[i:i + 1]
            start = time.perf_counter()
            index.search(q, k)
            latencies.append(time.perf_counter() - start)
        return np.mean(latencies) * 1000


    # Resident memory of the current process, including the loaded index.
    proc = psutil.Process()
    print(f"Memory: {proc.memory_info().rss / 1024 / 1024:.1f} MB")

Switch to a quantized index. Replace IndexFlat with IndexIVFPQ to compress vectors, cutting RAM by 4–16x or more, depending on the PQ code size.
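
    import faiss
    import numpy as np

    d = 768        # vector dimensionality
    nlist = 4096   # number of IVF clusters (coarse centroids)
    m_pq = 96      # PQ sub-quantizers; d must be divisible by m_pq

    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFPQ(quantizer, d, nlist, m_pq, 8)  # 8 bits per sub-quantizer code

    # Random data is a placeholder; train on a sample of the real vectors.
    training = np.random.rand(nlist * 50, d).astype("float32")
    index.train(training)
    print(f"Trained: {index.is_trained}")
    # After training, add the full dataset with index.add(vectors).

Tune nprobe for target recall.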
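
    # A higher nprobe scans more clusters per query: recall rises, and so does latency.
    # Measure both at each setting and keep the smallest value that meets the target.
    for nprobe in [8, 16, 32, 64]:
        index.nprobe = nprobe
        print(f"nprobe={nprobe}")

Use GPU acceleration when available.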

    try:
        res = faiss.StandardGpuResources()
        gpu_index = faiss.index_cpu_to_gpu(res, 0, index)  # 0 = first GPU device
        print("GPU index ready")
    except Exception as e:
        print(f"GPU not available: {e}")

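Choosing an nprobe value for a target recall requires comparing the approximate results against exact ones. The following is a minimal sketch, assuming the ground truth comes from an exact search (for example an IndexFlatL2 over the same vectors). recall_at_k is a hypothetical helper, not part of FAISS; it operates on the ID arrays that index.search returns and also accepts plain lists.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of exact top-k neighbours that the approximate search also found.

    Both arguments are row-per-query collections of neighbour IDs.
    FAISS pads missing results with -1, so negative IDs are ignored.
    """
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        exact = {int(i) for i in exact_row if int(i) >= 0}
        approx = {int(i) for i in approx_row if int(i) >= 0}
        hits += len(exact & approx)
        total += len(exact)
    return hits / total if total else 0.0


# Toy example with two queries and k=3 (IDs are made up for illustration).
exact = [[10, 11, 12], [20, 21, 22]]
approx = [[10, 12, 99], [20, 21, 22]]
print(f"recall@3 = {recall_at_k(approx, exact):.2f}")
```

In practice, approx_ids would be the second return value of index.search(queries, k) on the IVF-PQ index at a given nprobe, and exact_ids the same call on a flat index; repeating the comparison for each nprobe in the sweep shows where recall stops improving.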
Verification
python3 -c "import faiss; res = faiss.StandardGpuResources(); print('GPU OK')"
# Expected: GPU OK
Common failures
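- Training set too small for IVF-PQ. Train on at least 30 * nlist vectors (about 123,000 for nlist = 4096); with fewer, the coarse centroids are poorly estimated and FAISS prints a clustering warning.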
- Memory overflow from uncompressed index. An IndexFlat stores every vector as d float32 values (3,072 bytes per vector at d = 768); switch to IndexIVFPQ to store compact PQ codes instead.
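- Low recall after switching to PQ. Raise nprobe first; if recall is still low, increase m_pq for finer-grained codes, or retrain with a larger nlist and a proportionally larger nprobe.
- GPU transfer fails for large indexes. The entire index must fit in GPU VRAM; if it does not, shard it across multiple devices.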
- Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
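The memory and VRAM failures above can often be anticipated with a rough size estimate before the index is built. The following is a sketch under stated assumptions: float32 vectors, one 64-bit ID per vector, and no allocator or inverted-list overhead. flat_index_bytes and ivfpq_index_bytes are hypothetical helpers, not FAISS functions.

```python
def flat_index_bytes(n, d):
    """Approximate size of an IndexFlat: n vectors of d float32 values."""
    return n * d * 4


def ivfpq_index_bytes(n, d, nlist, m, nbits=8):
    """Approximate size of an IndexIVFPQ, ignoring allocator overhead."""
    code_bytes = (m * nbits + 7) // 8   # PQ code stored per vector
    id_bytes = 8                        # 64-bit vector ID stored per vector
    centroids = nlist * d * 4           # coarse quantizer centroids (float32)
    codebooks = (2 ** nbits) * d * 4    # m codebooks, 2^nbits entries of d/m dims each
    return n * (code_bytes + id_bytes) + centroids + codebooks


# Settings taken from this guide: 1M vectors, d=768, nlist=4096, m_pq=96, 8 bits.
n, d, nlist, m_pq = 1_000_000, 768, 4096, 96
flat = flat_index_bytes(n, d)
ivfpq = ivfpq_index_bytes(n, d, nlist, m_pq)
print(f"IndexFlat : {flat / 1024**2:,.0f} MB")
print(f"IndexIVFPQ: {ivfpq / 1024**2:,.0f} MB")
print(f"Ratio     : {flat / ivfpq:.1f}x")
```

Comparing the estimate with available RAM or VRAM gives an early signal of whether a smaller m_pq or sharding is needed; the actual footprint will be somewhat higher than the estimate.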