RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Vector Stores and Embeddings

07. Metadata Filtering

Chapter 7 of 18 · 20 min
KEY INSIGHT

Pre-filter documents by metadata before similarity search to scope results to the relevant subset.

ChromaDB's `where` argument restricts a query to documents whose metadata matches the given criteria. The filter narrows the candidate set, and similarity ranking then runs only over the documents that remain.

```python
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./chroma_db")

# Chroma expects an embedding-function wrapper, not a raw SentenceTransformer model
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.get_or_create_collection(
    name="knowledge_base",
    embedding_function=embedding_fn,
)

# Add documents with various metadata
collection.add(
    documents=[
        "How to install Python 3.11 on Ubuntu",
        "Python installation guide for Windows",
        "Docker container setup tutorial",
        "Kubernetes deployment best practices",
        "React component lifecycle explained",
        "Building REST APIs with FastAPI",
    ],
    ids=["p1", "p2", "d1", "k1", "r1", "f1"],
    metadatas=[
        {"category": "python", "difficulty": "beginner", "rating": 4.5},
        {"category": "python", "difficulty": "beginner", "rating": 4.2},
        {"category": "devops", "difficulty": "intermediate", "rating": 4.8},
        {"category": "devops", "difficulty": "advanced", "rating": 4.6},
        {"category": "frontend", "difficulty": "intermediate", "rating": 4.3},
        {"category": "backend", "difficulty": "intermediate", "rating": 4.7},
    ],
)

# Filter by single metadata field
results = collection.query(
    query_texts=["containers and deployment"],
    n_results=3,
    where={"category": "devops"},  # Only search devops documents
)

print("DevOps results:")
for doc in results["documents"][0]:
    print(f"  - {doc}")
```

Output:

```
DevOps results:
  - Docker container setup tutorial
  - Kubernetes deployment best practices
```

Only two documents come back even though `n_results=3`, because only two carry `category: devops`.

The comparison operators `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, and `$lte` each apply to a single field. The ordering operators (`$gt`, `$gte`, `$lt`, `$lte`) work on numeric values, so they suit a field like `rating` rather than a string like `difficulty`. To combine several conditions, wrap them in `$and` or `$or`; a `where` dict with more than one top-level key is rejected.

```python
# Filter by category AND rating (multiple conditions go inside $and)
results = collection.query(
    query_texts=["programming tutorials"],
    n_results=3,
    where={
        "$and": [
            {"category": {"$eq": "python"}},
            {"rating": {"$gte": 4.5}},  # rating >= 4.5
        ]
    },
)

# Filter with OR logic using $or
results = collection.query(
    query_texts=["tutorials"],
    n_results=5,
    where={
        "$or": [
            {"category": {"$eq": "python"}},
            {"category": {"$eq": "frontend"}},
        ]
    },
)
```

Metadata filtering is effective but has limits. ChromaDB resolves the set of matching documents before it runs the vector search, so a filter that matches a very large share of a big collection adds memory and latency overhead. For large-scale filtering (millions of documents), consider segmenting the data into separate collections per category.
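To make the filter-then-rank order concrete, here is a library-free sketch. It does not reflect ChromaDB's internals; the records, the 2-D vectors, and the `prefiltered_search` helper are all made up for illustration.

```python
import math

# Toy records: (id, embedding, metadata). The vectors are invented, not real embeddings.
RECORDS = [
    ("d1", [0.90, 0.10], {"category": "devops"}),
    ("k1", [0.80, 0.30], {"category": "devops"}),
    ("p1", [0.85, 0.20], {"category": "python"}),
    ("r1", [0.10, 0.90], {"category": "frontend"}),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prefiltered_search(query_vec, category, n_results):
    """Hypothetical helper: filter by metadata first, then rank what is left."""
    # Step 1: the metadata filter shrinks the candidate set.
    candidates = [r for r in RECORDS if r[2]["category"] == category]
    # Step 2: similarity ranking runs only over the surviving candidates.
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [r[0] for r in ranked[:n_results]]


hits = prefiltered_search([1.0, 0.0], "devops", n_results=3)
print(hits)
```

In this toy data `p1` is closer to the query than `k1`, yet it never appears in the results because the category filter removes it before ranking. The call also returns fewer than `n_results` items when the filtered subset is smaller than the requested count.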

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
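One way to capture that record is a short script like the sketch below. The helper names (`package_version`, `build_run_record`) and the choice of fields are assumptions for illustration; adjust the package list to whatever your run actually imports.

```python
import json
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_run_record(data_path, observed_output):
    """Hypothetical helper: collect the details the checkpoint asks you to record."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "chromadb": package_version("chromadb"),
        "sentence-transformers": package_version("sentence-transformers"),
        "data_path": data_path,
        "observed_output": observed_output,
    }


record = build_run_record(
    data_path="./chroma_db",
    observed_output=[
        "Docker container setup tutorial",
        "Kubernetes deployment best practices",
    ],
)
print(json.dumps(record, indent=2))
```

Saving the printed JSON next to the exercise keeps the package versions and data path tied to the output you observed.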

EXERCISE

Create a collection with 15+ documents spanning at least 3 categories. Write queries that filter on a single metadata field and on a compound condition, then verify that every result matches your filter criteria.
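For the verification step, a small checker can re-apply the filter to each returned metadata dict. The `matches` helper below is a hypothetical, simplified re-implementation of the operators covered in this chapter (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$and`, `$or`). It is not part of ChromaDB, and it treats a missing field as a non-match, which is an assumption.

```python
import operator

# Comparison operators from this chapter mapped to Python equivalents.
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(where, meta):
    """Hypothetical helper: True if `meta` satisfies a Chroma-style `where` filter."""
    for key, cond in where.items():
        if key == "$and":
            if not all(matches(sub, meta) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(sub, meta) for sub in cond):
                return False
        elif key not in meta:
            return False  # assumption: a missing field never matches
        elif isinstance(cond, dict):
            # Operator form, e.g. {"rating": {"$gte": 4.5}}
            if not all(OPS[op](meta[key], operand) for op, operand in cond.items()):
                return False
        elif meta[key] != cond:
            # Shorthand form, e.g. {"category": "devops"}
            return False
    return True


where = {"$and": [{"category": {"$eq": "python"}}, {"rating": {"$gte": 4.5}}]}

# Stand-in for the metadata list a query would return.
returned = [{"category": "python", "difficulty": "beginner", "rating": 4.5}]
rejected = {"category": "python", "difficulty": "beginner", "rating": 4.2}

all_match = all(matches(where, m) for m in returned)
print(all_match, matches(where, rejected))
```

With a real query, the list to check would typically be `results["metadatas"][0]`, assuming metadatas are included in the query response.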
