HOW-TO · RAG
How to Use FAISS for Approximate Nearest Neighbor Search
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES
FAISS installed, embeddings indexed
What this does
Approximate nearest neighbor (ANN) search is the core retrieval operation in retrieval-augmented generation (RAG). Instead of comparing a query against every stored vector by brute force, ANN algorithms prune the search space by exploiting geometric structure. FAISS provides several ANN index types, such as IVF and HNSW, alongside the exact IndexFlatL2 index. This guide focuses on using FAISS for semantic search; the steps below use IndexFlatL2 as the exact baseline against which approximate results are measured.
Steps
Build and run a similarity search.
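import faiss
import numpy as np
import ollama

d = 768  # must match the embedding model's output dimension (768 for nomic-embed-text)
k = 5    # number of neighbors to return

# Exact (brute-force) L2 index; random vectors stand in for real embeddings.
index = faiss.IndexFlatL2(d)
sample_vectors = np.random.rand(10000, d).astype("float32")
faiss.normalize_L2(sample_vectors)  # unit length, so L2 ranking matches cosine ranking
index.add(sample_vectors)

def search_similar(query_text, index, k=5):
    # Embed the query with the same model used for the indexed vectors.
    resp = ollama.embeddings(model="nomic-embed-text", prompt=query_text)
    query_vec = np.array([resp["embedding"]], dtype="float32")
    faiss.normalize_L2(query_vec)
    distances, labels = index.search(query_vec, k)
    return labels[0], distances[0]

labels, distances = search_similar("What is RAG?", index, k=k)
for rank, (label, dist) in enumerate(zip(labels, distances)):
    # IndexFlatL2 reports squared L2 distances.
    print(f"Rank {rank+1}: ID={label}, squared L2 distance={dist:.4f}")

Retrieve original texts from results.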
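# Placeholder corpus; in practice corpus[i] is the text whose embedding sits at index position i.
corpus = ["FAISS is an efficient similarity search library."] * 10000
labels, _ = search_similar("How does similarity search work?", index, k=3)
for label in labels:
    if label == -1:  # FAISS pads missing results with -1
        continue
    print(f"[{label}] {corpus[label][:60]}")

Measure recall against ground truth.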
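flat_index = faiss.IndexFlatL2(d)  # exact index, used as ground truth
flat_index.add(sample_vectors)
query = "How does similarity search work?"
labels, _ = search_similar(query, index, k=k)          # index under test
gt_labels, _ = search_similar(query, flat_index, k=k)  # ground truth
recall = len(set(labels) & set(gt_labels)) / k
print(f"Recall@{k}: {recall:.3f}")  # 1.000 here, because index is itself a flat index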
- Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
Verification
python3 -c "
import faiss, numpy as np
d = 128; idx = faiss.IndexFlatL2(d)
v = np.random.rand(100, d).astype('float32'); idx.add(v)
q = np.random.rand(1, d).astype('float32')
D, I = idx.search(q, 5)
print(f'Search OK, top ID: {I[0][0]}')
"
# Expected: Search OK, top ID: <int>
Common failures
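- Dimension mismatch between query and index. Use the same embedding model for indexing and querying, and verify that the query dimension equals index.d.
- Search returns -1 as a label. FAISS found fewer than k neighbors, for example because the index is empty or holds fewer than k vectors; check index.ntotal.
- Distances are unexpectedly large. IndexFlatL2 returns squared L2 distances. Normalize both indexed and query vectors so that the ranking matches cosine similarity; for unit vectors the squared distance lies between 0 and 4.
- Slow queries on large indexes. Switch to an IVF or HNSW index for datasets exceeding 50,000 vectors.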
- Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
Related guides