What this does

At one million vectors and beyond, the choice of index algorithm, quantization settings, and hardware allocation determines whether queries return in milliseconds or seconds. This guide walks through four tuning steps for production deployments handling high-throughput retrieval workloads: profiling the current index, switching to a quantized index, tuning nprobe, and moving the index to a GPU.

Steps

Profile the current index.

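import random
import time

import numpy as np
import psutil

def profile_search(index, vectors, k=5, runs=100):
    """Return the mean single-query search latency in milliseconds."""
    latencies = []
    for _ in range(runs):
        # Pick one random vector as the query; the slice keeps the 2-D shape (1, d).
        i = random.randrange(len(vectors))
        q = vectors[i:i + 1]
        start = time.perf_counter()
        index.search(q, k)
        latencies.append(time.perf_counter() - start)
    return np.mean(latencies) * 1000

# Resident memory of the current process, which includes the loaded index.
proc = psutil.Process()
print(f"Memory: {proc.memory_info().rss / 1024 / 1024:.1f} MB")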
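A mean hides tail latency, which is usually what matters under high throughput. The sketch below reports p50, p95, and p99 instead. The helper name latency_percentiles and the brute-force stand-in are illustrative assumptions, not part of FAISS; with a real index, index.search could be passed as search_fn.

```python
import time
import numpy as np

def latency_percentiles(search_fn, queries, k=5, warmup=5):
    """Time search_fn(query, k) once per query; return p50/p95/p99 in ms."""
    # Warm-up calls are not timed, so one-off setup costs do not skew results.
    for q in queries[:warmup]:
        search_fn(q.reshape(1, -1), k)
    latencies_ms = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q.reshape(1, -1), k)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
    return {"p50": float(p50), "p95": float(p95), "p99": float(p99)}

# Hypothetical stand-in for index.search so the sketch runs without an index:
# exact L2 search over a small random base set.
rng = np.random.default_rng(0)
base = rng.random((1000, 32), dtype=np.float32)

def brute_force_search(q, k):
    dists = ((base - q) ** 2).sum(axis=1)
    ids = np.argsort(dists)[:k]
    return dists[ids][None, :], ids[None, :]

stats = latency_percentiles(brute_force_search, base[:50])
print(stats)
```

Comparing these percentiles before and after each change in the steps below shows whether a setting helps the slowest queries and not only the average.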

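Switch to a quantized index. Replace IndexFlat with IndexIVFPQ to compress vectors. The saving depends on the number of sub-quantizers and the bits per code: typical configurations cut RAM by 4–16x, while the settings below store each 768-dimensional float32 vector (3072 bytes) as a 96-byte code, roughly 30x smaller once the 8-byte vector ID is counted.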

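import faiss
import numpy as np

d = 768        # vector dimensionality
nlist = 4096   # number of inverted lists (coarse centroids)
m_pq = 96      # PQ sub-quantizers; d must be divisible by m_pq

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m_pq, 8)  # 8 bits per sub-quantizer code
# Random data keeps the example self-contained; train on a sample of real vectors in practice.
training = np.random.rand(nlist * 50, d).astype("float32")
index.train(training)
print(f"Trained: {index.is_trained}")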
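The memory effect of these settings can be estimated before building anything. The sketch below is back-of-the-envelope arithmetic, not a FAISS API: the function names are hypothetical, and it assumes float32 storage for the flat index, one 8-byte ID per vector in the inverted lists, and it ignores the comparatively small coarse centroids and PQ codebooks.

```python
def flat_bytes_per_vector(d):
    """Bytes per vector in an uncompressed float32 index."""
    return d * 4

def ivfpq_bytes_per_vector(m_pq, nbits=8, id_bytes=8):
    """Bytes per vector in IVF-PQ: the PQ code plus the stored vector ID."""
    code_bytes = (m_pq * nbits + 7) // 8  # round up to whole bytes
    return code_bytes + id_bytes

d, m_pq, n = 768, 96, 1_000_000
flat_gb = flat_bytes_per_vector(d) * n / 1e9
pq_gb = ivfpq_bytes_per_vector(m_pq) * n / 1e9
ratio = flat_bytes_per_vector(d) / ivfpq_bytes_per_vector(m_pq)
print(f"flat: {flat_gb:.2f} GB, IVF-PQ: {pq_gb:.2f} GB, ratio: {ratio:.1f}x")
```

Halving m_pq roughly halves the code size but also lowers recall, so the estimate is best read together with a recall measurement.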

Tune nprobe for target recall.

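# Higher nprobe scans more inverted lists: better recall, higher latency.
# Assumes the index has been populated with index.add(...) and that
# `vectors` is the query pool passed to profile_search in the first step.
for np_ in [8, 16, 32, 64]:
    index.nprobe = np_
    print(f"nprobe={np_}: {profile_search(index, vectors):.2f} ms mean latency")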
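Choosing nprobe for a target recall needs a recall number for each setting. A minimal sketch of recall@k follows; it assumes approx_ids comes from the quantized index's search and exact_ids from an exact index (for example IndexFlatL2) over the same queries. The function name recall_at_k is illustrative, and the small arrays are made-up IDs used only to show the calculation.

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids):
    """Mean fraction of the exact top-k neighbours present in the approximate top-k."""
    approx_ids = np.asarray(approx_ids)
    exact_ids = np.asarray(exact_ids)
    # Count, per query, how many exact neighbours the approximate result recovered.
    hits = [len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids)]
    return sum(hits) / exact_ids.size

# Two queries, k=3: the first approximate result misses one neighbour,
# the second finds all three in a different order.
exact = np.array([[0, 1, 2], [3, 4, 5]])
approx = np.array([[0, 2, 9], [5, 4, 3]])
recall = recall_at_k(approx, exact)
print(f"recall@3 = {recall:.3f}")
```

Running this for each nprobe value and picking the smallest one that meets the recall target keeps latency as low as the target allows.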

Use GPU acceleration when available.

try:
    res = faiss.StandardGpuResources()
    gpu_index = faiss.index_cpu_to_gpu(res, 0, index)
    print("GPU index ready")
except Exception as e:
    print(f"GPU not available: {e}")

Verification

python3 -c "import faiss; res = faiss.StandardGpuResources(); print('GPU OK')"
# Expected: GPU OK

Common failures

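Training set too small for IVF-PQ. FAISS k-means warns when it gets fewer than 39 training vectors per centroid, so provide at least 39 * nlist vectors; the example above uses 50 * nlist.
Memory overflow from uncompressed index. An IndexFlat stores d * 4 bytes per vector, about 3 GB per million 768-dimensional vectors. Use PQ compression to keep memory under control.
Low recall after switching to PQ. Raise nprobe first. If recall is still low, use a larger m_pq (more bytes per code) or re-rank the top candidates with exact distances.
GPU transfer fails for large indexes. The entire index must fit in GPU VRAM. If it does not, shard it across devices.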
Version mismatch. The installed package or runtime differs from the command shown. Check the version first, then rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
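The first failure above can be caught before training with a quick size check. The sketch below is an assumption-laden rule of thumb, not a FAISS function: min_training_vectors is a hypothetical helper, and points_per_centroid=39 reflects the k-means default at which FAISS starts warning, exposed as a parameter so it can be changed. It covers both the coarse quantizer (nlist centroids) and each PQ sub-quantizer (2**nbits centroids).

```python
def min_training_vectors(nlist, nbits=8, points_per_centroid=39):
    """Smallest training set that gives every k-means centroid enough points."""
    coarse = points_per_centroid * nlist       # coarse quantizer centroids
    pq = points_per_centroid * (2 ** nbits)    # centroids per PQ sub-quantizer
    return max(coarse, pq)

nlist = 4096
needed = min_training_vectors(nlist)
available = nlist * 50  # training set size used in the steps above
print(f"needed: {needed}, available: {available}, ok: {available >= needed}")
```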

How to Optimize FAISS Index for Large Datasets
