How to Implement Vector Search with Metadata Filtering

Intermediate · 20 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Vector database with filtering support, indexed documents

What this does

Vector search retrieves semantically similar results from a collection of embeddings. Metadata filtering narrows results to a specific subset, for example documents authored by a particular user or published within a date range. Most vector stores (PostgreSQL with pgvector, Qdrant, Weaviate) support combining semantic similarity with structured filters.

Steps

Step 1 — Ensure documents have indexed metadata fields.

Before loading documents, assign each a set of metadata key-value pairs. Common fields include author, created_at, category, tags, and document_id. Verify these fields are indexed in the vector database. In PostgreSQL with pgvector, add typed columns such as TIMESTAMPTZ for created_at and VARCHAR for author alongside the vector column, and create an index on each column you plan to filter on.
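As a sketch of what this step could look like in PostgreSQL with pgvector, the statements below create a table with metadata columns next to the vector column and index both. The table name documents, the column names, the index names, and the 768-dimension vector are assumptions for illustration; match the dimension to your embedding model. apply_schema is a hypothetical helper that accepts any DB-API cursor, and the HNSW index assumes a pgvector version that supports it.

```python
# Hypothetical schema for illustration: the table name, columns, and the
# 768-dimension vector are assumptions, not values from this guide.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS documents (
        document_id BIGSERIAL PRIMARY KEY,
        author      VARCHAR(255) NOT NULL,
        category    VARCHAR(64),
        created_at  TIMESTAMPTZ NOT NULL,
        content     TEXT NOT NULL,
        embedding   vector(768)
    )
    """,
    # B-tree indexes on the fields used in filters.
    "CREATE INDEX IF NOT EXISTS documents_author_idx ON documents (author)",
    "CREATE INDEX IF NOT EXISTS documents_created_at_idx ON documents (created_at)",
    # Approximate nearest-neighbor index for cosine distance.
    "CREATE INDEX IF NOT EXISTS documents_embedding_idx "
    "ON documents USING hnsw (embedding vector_cosine_ops)",
]


def apply_schema(cursor):
    """Run each statement on a DB-API cursor (for example, one from psycopg)."""
    for statement in SCHEMA_STATEMENTS:
        cursor.execute(statement)
```

Keeping the statements in a list makes it easy to print them for review before running them against a real database.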

Step 2 — Generate and store embeddings.

Use an embedding model to convert each document's text into a dense vector. Store the vector in the database alongside its metadata. For pgvector, use a standard INSERT and pass the vector as a bracketed literal such as '[0.1, 0.2, 0.3]'; for Qdrant, use the upsert API. Confirm the document count in the database matches the source corpus.
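A minimal sketch of the pgvector side of this step, assuming a table named documents with author, created_at, content, and embedding columns (all hypothetical names). It formats an embedding as pgvector's bracketed text input and builds a parameterized INSERT using %s placeholders in the style of psycopg; the embedding itself is expected to come from whatever model you use.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_insert(author, created_at, content, embedding):
    """Return (sql, params) for a parameterized INSERT.

    The documents table and its columns are assumptions for illustration.
    """
    sql = (
        "INSERT INTO documents (author, created_at, content, embedding) "
        "VALUES (%s, %s, %s, %s::vector)"
    )
    params = (author, created_at, content, to_vector_literal(embedding))
    return sql, params


# Query to compare against the size of the source corpus after loading.
COUNT_SQL = "SELECT count(*) FROM documents"

sql, params = build_insert("jane_doe", "2024-03-05T12:30:00+00:00", "hello", [1, 2.5])
print(params[3])  # [1.0,2.5]
```

Passing values as parameters rather than formatting them into the SQL string avoids quoting problems and SQL injection.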

Step 3 — Construct a filtered query.

Build a query that includes two components: a vector (query_vector) for semantic similarity and a metadata filter object. For example, in Qdrant a filter might specify must: [{ key: "author", match: { value: "jane_doe" }}]. In pgvector, put the metadata conditions in a WHERE clause in the same SELECT that orders rows by distance to the query vector.
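The two query shapes can be sketched as plain data, without any client library. The first function builds a request body of the kind accepted by Qdrant's REST search endpoint (POST /collections/{name}/points/search); the second builds a parameterized pgvector query using the cosine distance operator <=>. The function names, the documents table, and its columns are assumptions for illustration.

```python
def build_qdrant_search_body(query_vector, author, limit=10):
    """Request body combining a query vector with an author filter."""
    return {
        "vector": list(query_vector),
        "filter": {
            "must": [{"key": "author", "match": {"value": author}}]
        },
        "limit": limit,
        "with_payload": True,
    }


def build_pgvector_query(query_vector, author, limit=10):
    """Return (sql, params) for a filtered cosine-distance search.

    The vector is passed twice: once for the returned distance and once
    for the ORDER BY expression.
    """
    literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"
    sql = (
        "SELECT document_id, author, embedding <=> %s::vector AS distance "
        "FROM documents "
        "WHERE author = %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (literal, author, literal, limit)


body = build_qdrant_search_body([0.1, 0.2], "jane_doe", limit=5)
sql, params = build_pgvector_query([0.1, 0.2], "jane_doe", limit=5)
```

Building the filter as data first makes it easy to log the exact query that was sent when results look wrong.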

Step 4 — Combine filter and vector search in a single query.

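How the filter and the similarity search interact depends on the engine, so check it rather than assume it. Qdrant takes the filter and the vector in one request and applies the filter during the search. In pgvector, a plain WHERE clause in the same SELECT as the ORDER BY on distance is enough; no subquery or lateral join is required. With an exact (sequential) scan, PostgreSQL ranks only the rows that pass the filter. With an approximate index (HNSW or IVFFlat), the filter is applied after the index scan, so a selective filter can return fewer rows than the LIMIT asks for; pgvector 0.8.0 and later can compensate with iterative index scans. Run EXPLAIN ANALYZE on the query and verify that the plan uses the indexes you expect.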
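The difference between filtering before and after the similarity search can be illustrated with a small in-memory sketch. This is not how any particular database implements it; the documents, vectors, and function names are made up for the example.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_prefilter(docs, query, author, k):
    """Filter first, then rank only the matching documents."""
    candidates = [d for d in docs if d["author"] == author]
    candidates.sort(key=lambda d: cosine_similarity(d["vector"], query), reverse=True)
    return candidates[:k]


def search_postfilter(docs, query, author, k):
    """Rank the whole corpus, keep the top k, then filter."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(d["vector"], query), reverse=True)
    return [d for d in ranked[:k] if d["author"] == author]


docs = [
    {"id": 1, "author": "jane_doe", "vector": [0.2, 0.9]},
    {"id": 2, "author": "john_roe", "vector": [1.0, 0.0]},
    {"id": 3, "author": "john_roe", "vector": [0.9, 0.1]},
    {"id": 4, "author": "jane_doe", "vector": [0.1, 1.0]},
]
query = [1.0, 0.0]

pre = search_prefilter(docs, query, "jane_doe", k=2)
post = search_postfilter(docs, query, "jane_doe", k=2)
print([d["id"] for d in pre])   # [1, 4]: both jane_doe documents
print([d["id"] for d in post])  # []: the top 2 overall belong to john_roe
```

The post-filter variant returns nothing even though two matching documents exist, which is the behavior to watch for when a filter is selective.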

Step 5 — Tune the recall/precision tradeoff.

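If the filter matches very few documents, the top results may include weakly related documents simply because there is nothing better to return. If it matches many, the query does more work and slows down. Adjust limit (how many results to return) and score_threshold (the minimum similarity score to accept, where the database supports it) to balance speed and result quality. Target recall above 90%, measured by spot-checking results against known relevant documents.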
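Recall at K for a spot check can be computed with a few lines of Python. The sketch below assumes you have a hand-labeled set of relevant document IDs for each test query; the function names and sample IDs are illustrative.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of known relevant documents found in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def apply_score_threshold(results, score_threshold):
    """Drop (doc_id, score) pairs whose similarity is below the threshold."""
    return [(doc_id, score) for doc_id, score in results if score >= score_threshold]


retrieved = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(recall_at_k(retrieved, relevant, k=3))  # 2 of 3 relevant found

scored = [("a", 0.91), ("b", 0.55), ("c", 0.78)]
print(apply_score_threshold(scored, 0.7))  # [('a', 0.91), ('c', 0.78)]
```

Averaging recall_at_k over a handful of labeled queries gives a number to compare against the 90% target as you change limit or the threshold.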

Step 6 — Log and measure query performance.

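Track query latency, filter selectivity (number of documents matching the filter vs. total), and the top-K recall rate. A filter that removes 99% of the corpus should show a measurable latency drop compared with the unfiltered query, provided the database applies the filter before or during the similarity search.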
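A small sketch of the two measurements that do not depend on the database client: filter selectivity and wall-clock latency. The helper names are hypothetical, and timed wraps any callable, so it can wrap whatever function sends your query.

```python
import time


def filter_selectivity(matching_count, total_count):
    """Share of the corpus that passes the filter (0.01 means 1%)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return matching_count / total_count


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Stand-in workload; replace sorted with the function that runs your query.
result, elapsed_ms = timed(sorted, [3, 1, 2])
print(f"selectivity={filter_selectivity(10, 1000):.2%} latency={elapsed_ms:.3f} ms")
```

Logging selectivity next to latency makes it possible to tell a slow query caused by a broad filter from one caused by a missing index.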

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.

  • Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.

  • Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.

  • Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

Verification

  • Query with author="jane_doe" and no query vector. Confirm only documents authored by jane_doe are returned.
  • Query with the same author filter plus a semantic vector. Confirm results are semantically relevant and all belong to jane_doe.
  • Query with a filter that matches nothing (e.g., a non-existent author). Confirm zero results and the query returns in under 200 ms.
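The three checks above can be scripted so they are easy to rerun. The sketch below assumes a hypothetical adapter, search(author, query_vector), that wraps your database client and returns a list of dicts with an "author" key; semantic relevance in the second check still needs a manual look.

```python
import time


def run_verification(search, known_author="jane_doe", query_vector=None, max_ms=200.0):
    """Run the three checks against a search(author, query_vector) callable."""
    # Check 1: filter only, no query vector.
    filtered = search(known_author, None)
    assert filtered, "expected at least one document for the known author"
    assert all(d["author"] == known_author for d in filtered)

    # Check 2: filter plus query vector; every result must still match.
    combined = search(known_author, query_vector)
    assert all(d["author"] == known_author for d in combined)

    # Check 3: a filter that matches nothing returns no rows, quickly.
    start = time.perf_counter()
    empty = search("no_such_author", query_vector)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    assert empty == []
    assert elapsed_ms < max_ms
    return elapsed_ms
```

To use it, pass a function that runs the real query, for example run_verification(my_search, query_vector=embedding), where my_search and embedding are your own.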

Common failures

  • Filter applied after vector search: Some implementations compute top-K across the entire corpus first and apply the filter to the result set. The query can then return fewer than K results, or none, even when matching documents exist. Verify how the database orders the filter and the similarity search.
  • Missing index on metadata column: Without an index, the database performs a full scan on the metadata column. Always index commonly filtered fields.
  • Type mismatches in filter values: If created_at is stored as a string but the filter compares against a timestamp, the query can fail or silently return zero results, depending on the database. Verify type consistency between storage and query.
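One way to guard against the type mismatch described above is to normalize created_at values to a single type before both indexing and querying. The sketch below is one possible approach; the function name is hypothetical, and treating naive timestamps as UTC is an assumption you should adjust to your data.

```python
from datetime import datetime, timezone


def normalize_created_at(value):
    """Return a timezone-aware UTC datetime from a datetime or ISO 8601 string."""
    if isinstance(value, str):
        # Older Python versions do not parse a trailing "Z", so rewrite it.
        value = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if not isinstance(value, datetime):
        raise TypeError(f"unsupported created_at type: {type(value).__name__}")
    if value.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        value = value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


cutoff = normalize_created_at("2024-01-01T00:00:00Z")
stored = normalize_created_at(datetime(2024, 3, 5, 12, 30))
print(stored >= cutoff)  # True
```

Running the same function on stored values and on filter values keeps the comparison between like types.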

Related guides

  • How to Set Up Multi-Modal RAG with Images — extends filtering to multi-modal embeddings
  • How to Set Up Model Fallback Chains (Local to Cloud) — provides a reliable embedding endpoint for indexing