RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Vector Database Internals

01. Why Build a Vector DB?

Chapter 1 of 18 · 15 min
KEY INSIGHT

Vector databases exist because exact nearest neighbor search over high-dimensional vectors becomes impractically slow at scale, and approximate methods trade recall for speed in ways that matter enormously in production.

The core problem: given 10 million embeddings (from images, text, audio, or any deep learning model), find the k closest vectors to a query. A naive approach compares the query against every vector, which is O(n·d) work for n vectors of dimension d. At 10M vectors of 768 float32 dimensions each, that is 7.68 billion per-dimension distance terms (10M × 768) per query, computed over roughly 30 GB of raw vector data. That is almost never acceptable for an interactive query.

Approximate Nearest Neighbor (ANN) indexes solve this by accepting that you don't need the *exact* answer, just a *good enough* answer that's fast. The trade-off is parameterized: you control how much accuracy you sacrifice for speed.

Three core techniques power modern vector databases:

**Inverted File Index (IVF)** partitions your vector space into clusters. At query time, you find the clusters whose centroids are closest to your query and search only those. The parameter `nprobe` controls how many clusters you check: a higher `nprobe` means higher recall and slower queries.

**Hierarchical Navigable Small World (HNSW)** builds a multi-layer graph structure. Upper layers are sparse and let you jump toward your target region quickly. Lower layers are dense and refine the search. Think of it as a highway system for vectors.

**Product Quantization (PQ)** compresses vectors by splitting them into subvectors and clustering each subspace independently, then storing only the ID of the nearest centroid for each subvector. This reduces memory footprint by 10-100x and lets distances be approximated directly on the compressed codes, on CPU or GPU.
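The `nprobe` trade-off described for IVF can be made concrete with a toy, dependency-free sketch. Everything in it is an assumption for illustration: the helper names (`kmeans`, `build_inverted_lists`, `ivf_search`), the tiny random dataset, and the cluster count are hypothetical, and this is not how FAISS or any production engine implements IVF internally. It only shows the idea of scanning the `nprobe` nearest clusters and measuring recall against a brute-force answer.

```python
import random


def sq_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_l2(vec, centroids[c]))


def kmeans(vectors, n_clusters, n_iters, rng):
    """Plain Lloyd's k-means (toy version); returns the centroids."""
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(n_iters):
        buckets = [[] for _ in range(n_clusters)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_inverted_lists(vectors, centroids):
    """Map each cluster id to the ids of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(i)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    by_distance = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))
    candidates = [i for c in by_distance[:nprobe] for i in lists[c]]
    top_k = sorted(candidates, key=lambda i: sq_l2(query, vectors[i]))[:k]
    return top_k, len(candidates)


def exact_search(query, vectors, k):
    """Brute-force ground truth: rank every vector."""
    return sorted(range(len(vectors)), key=lambda i: sq_l2(query, vectors[i]))[:k]


rng = random.Random(0)
dim, n_vectors, n_clusters, k = 8, 2000, 16, 10
vectors = [[rng.random() for _ in range(dim)] for _ in range(n_vectors)]
centroids = kmeans(vectors, n_clusters, n_iters=5, rng=rng)
lists = build_inverted_lists(vectors, centroids)

query = [rng.random() for _ in range(dim)]
truth = set(exact_search(query, vectors, k))

recall_by_nprobe = {}
scanned_by_nprobe = {}
for nprobe in (1, 4, 16):
    found, scanned = ivf_search(query, vectors, centroids, lists, k, nprobe)
    recall_by_nprobe[nprobe] = len(truth & set(found)) / k
    scanned_by_nprobe[nprobe] = scanned
    print(f"nprobe={nprobe:2d}  scanned={scanned:4d}  recall@{k}={recall_by_nprobe[nprobe]:.2f}")
```

With `nprobe` equal to the number of clusters, the search degenerates to brute force and recall reaches 1.0. Smaller values scan fewer vectors and may miss neighbors that sit just across a cluster boundary, which is the recall-for-speed trade the parameter exposes.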

You're probably here because someone told you to use a vector database, and you want to understand what actually happens when you call `add` or `search`. Smart move. The gap between "it works" and "I understand why it's slow" is where production incidents live.

EXERCISE

Install FAISS or usearch and index 100k random vectors. Run a search and note the latency. Then index 1M vectors and note how latency changes. You won't understand why yet—but you'll have a baseline.

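# Minimal FAISS baseline: brute-force (exact) search with timing
import time

import faiss
import numpy as np

rng = np.random.default_rng(0)  # seeded so runs are repeatable

# Create 100k 128-dim vectors (pretend these are embeddings)
vectors = rng.random((100_000, 128), dtype=np.float32)

# Build a brute-force index to establish a baseline
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

# Query, timing only the search call to record baseline latency
query = rng.random((1, 128), dtype=np.float32)
k = 10
start = time.perf_counter()
distances, indices = index.search(query, k)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Top 10 indices: {indices[0]}")
print(f"Top 10 distances: {distances[0]}")
print(f"Search latency: {elapsed_ms:.2f} ms")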
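For intuition about what the brute-force baseline above is doing, here is a rough pure-Python sketch of a flat L2 search together with the per-query work estimate from the Key Insight. The function names `flat_l2_search` and `brute_force_ops` are hypothetical helpers, not FAISS APIs, and the dataset is deliberately tiny. Like `IndexFlatL2`, the sketch reports squared L2 distances.

```python
import heapq
import random


def flat_l2_search(vectors, query, k):
    """Exact k-NN: score every stored vector, keep the k smallest distances."""
    scored = (
        (sum((x - q) ** 2 for x, q in zip(vec, query)), idx)
        for idx, vec in enumerate(vectors)
    )
    top = heapq.nsmallest(k, scored)  # sorted ascending by (distance, index)
    return [idx for _, idx in top], [dist for dist, _ in top]


def brute_force_ops(n_vectors, dim):
    """Per-dimension distance terms a single query touches in a full scan."""
    return n_vectors * dim


rng = random.Random(0)
vectors = [[rng.random() for _ in range(16)] for _ in range(1000)]
query = [rng.random() for _ in range(16)]

indices, distances = flat_l2_search(vectors, query, k=5)
print(f"Top 5 indices: {indices}")
print(f"Top 5 squared distances: {distances}")

# Work per query grows linearly with the number of stored vectors.
for n in (100_000, 1_000_000, 10_000_000):
    print(f"n={n:>10,}  d=768  terms per query={brute_force_ops(n, 768):,}")
```

Because every query touches every vector, going from 100k to 1M vectors multiplies the work by ten, which is the scaling the exercise asks you to observe in your latency numbers.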