RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
Glossary / Large language models / Vector Database
Large language models

Vector Database

A vector database stores and retrieves data as high-dimensional vectors (embeddings) rather than rows or documents. In local AI, it enables semantic search: instead of matching keywords, it finds items whose embeddings are closest to a query embedding, using approximate nearest neighbor (ANN) algorithms. Operators encounter vector databases when building RAG (Retrieval-Augmented Generation) pipelines—they index document chunks as vectors, then retrieve relevant chunks for a language model to answer questions. Popular choices include Chroma, FAISS, and Qdrant, all runnable on local hardware.

Deeper dive

Vector databases are designed for similarity search on embeddings—numerical lists (e.g., 768 or 1536 dimensions) that capture semantic meaning. They index vectors using ANN methods like HNSW (Hierarchical Navigable Small World) or IVF (Inverted File Index) to trade a small accuracy loss for massive speed gains over brute-force search. For local operators, the key constraint is memory: storing millions of vectors at 1536 dimensions each can consume gigabytes of RAM. Most vector databases support on-disk storage with memory-mapping to reduce RAM pressure. They integrate with embedding models (e.g., all-MiniLM-L6-v2 or nomic-embed-text-v1.5) that run locally via ONNX or llama.cpp. In a typical RAG workflow, documents are split into chunks, each chunk is embedded, and the embedding is stored in the vector DB. At query time, the query is embedded, the DB returns the top-k nearest chunks, and those chunks are fed into the LLM as context.

Practical example

A local RAG app indexes 10,000 PDF pages. Each page is embedded into a 384-dimensional vector using all-MiniLM-L6-v2 (~0.1 GB RAM for the model). The vector database (Chroma) stores these 10,000 vectors in a SQLite-backed index, consuming ~15 MB on disk. Querying for "budget forecast" returns the top-5 nearest pages in under 50 ms on a CPU, even without GPU acceleration.

Workflow example

In Ollama, you can run ollama pull nomic-embed-text to get an embedding model, then use a Python script with Chroma: chromadb.Client().create_collection("docs") and collection.add(embeddings=..., documents=...). At query time, embed the question with the same model, call collection.query(query_embeddings=[...], n_results=5), and pass the returned documents to ollama run llama3.1 as context. The whole pipeline runs locally with no cloud dependency.

Reviewed by Fredoline Eruo. See our editorial policy.

Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →