What this does

Vector search retrieves the items in a collection of embeddings that are semantically closest to a query. Metadata filtering narrows those results to a specific subset, for example documents authored by a particular user or published within a date range. Most vector stores, including pgvector (a PostgreSQL extension), Qdrant, and Weaviate, support combining semantic similarity with structured filters.

Steps

Step 1 — Ensure documents have indexed metadata fields.

Before loading documents, assign each a set of metadata key-value pairs. Common fields include author, created_at, category, tags, and document_id. Verify that the fields you plan to filter on are indexed in the vector database. In pgvector, add TIMESTAMPTZ and VARCHAR columns alongside the vector column and create B-tree indexes on the columns used in filters. In Qdrant, store the fields in the point payload and create a payload index for each filtered field.
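A minimal sketch of the pgvector schema described above, held as SQL strings in Python so they can be passed to any PostgreSQL driver. The table name documents, the index names, the column sizes, and the 384-dimension vector are assumptions for illustration; match the dimension to your embedding model.

```python
# Hypothetical schema: table name, index names, column sizes, and the
# 384-dimension vector are illustrative assumptions.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS vector",
    """
    CREATE TABLE IF NOT EXISTS documents (
        id          BIGSERIAL PRIMARY KEY,
        document_id VARCHAR(64) UNIQUE NOT NULL,
        author      VARCHAR(255) NOT NULL,
        category    VARCHAR(64),
        tags        TEXT[],
        created_at  TIMESTAMPTZ NOT NULL,
        body        TEXT NOT NULL,
        embedding   vector(384)
    )
    """,
    # B-tree indexes on the metadata columns used in filters.
    "CREATE INDEX IF NOT EXISTS documents_author_idx ON documents (author)",
    "CREATE INDEX IF NOT EXISTS documents_created_at_idx ON documents (created_at)",
    # Approximate nearest-neighbor index for cosine distance.
    "CREATE INDEX IF NOT EXISTS documents_embedding_idx "
    "ON documents USING hnsw (embedding vector_cosine_ops)",
]


def apply_schema(cursor) -> None:
    """Run each statement on a DB-API cursor (for example, one from psycopg)."""
    for statement in SCHEMA_STATEMENTS:
        cursor.execute(statement)
```

Keeping the statements in a list makes it easy to review exactly which metadata columns are indexed before any data is loaded.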

Step 2 — Generate and store embeddings.

Use an embedding model to convert each document's text into a dense vector. Store the vector in the database alongside its metadata. For pgvector, use a regular INSERT and pass the vector as a bracketed literal such as '[0.1, 0.2, 0.3]'; for Qdrant, use the upsert API and attach the metadata as the point's payload. Confirm the document count in the database matches the source corpus.
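The storage step for pgvector might look like the sketch below. It only prepares the statement and parameters; the documents table, its columns, and the helper names are assumptions, and the vector would come from whatever embedding model you use.

```python
from typing import Sequence

# Assumed table and columns; %s placeholders follow the psycopg style.
INSERT_SQL = (
    "INSERT INTO documents (document_id, author, created_at, body, embedding) "
    "VALUES (%s, %s, %s, %s, %s::vector)"
)


def to_pgvector_literal(vector: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def build_insert_params(doc: dict, vector: Sequence[float]) -> tuple:
    """Pair a document's metadata with its embedding for INSERT_SQL."""
    return (
        doc["document_id"],
        doc["author"],
        doc["created_at"],
        doc["body"],
        to_pgvector_literal(vector),
    )


def counts_match(source_docs: Sequence[dict], stored_count: int) -> bool:
    """Check that the database row count equals the source corpus size."""
    return len(source_docs) == stored_count
```

After loading, compare the result of SELECT count(*) FROM documents with the corpus size using counts_match.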

Step 3 — Construct a filtered query.

Build a query that includes two components: a vector (query_vector) for semantic similarity and a metadata filter object. For example, in Qdrant a filter might specify must: [{ key: "author", match: { value: "jane_doe" }}]. In pgvector, add a WHERE clause on the metadata columns to the same SELECT that orders rows by vector distance.
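As a rough illustration, the two query components can be assembled as plain Python data. The filter mirrors the Qdrant example above; the created_ts payload field, the request keys in build_request, and the documents table in the SQL are assumptions, so check them against your client version and schema.

```python
from typing import Optional, Sequence


def build_qdrant_filter(author: str, created_after_ts: Optional[float] = None) -> dict:
    """Return a Qdrant-style filter: every condition in `must` has to match."""
    must = [{"key": "author", "match": {"value": author}}]
    if created_after_ts is not None:
        # Assumes a numeric payload field `created_ts` holding Unix seconds.
        must.append({"key": "created_ts", "range": {"gte": created_after_ts}})
    return {"must": must}


def build_request(query_vector: Sequence[float], author: str, limit: int = 10) -> dict:
    """Combine the vector and the filter; key names are an assumed request shape."""
    return {
        "vector": list(query_vector),
        "filter": build_qdrant_filter(author),
        "limit": limit,
    }


# pgvector equivalent: filter and similarity ordering in one SELECT.
# <=> is pgvector's cosine distance operator; the table is hypothetical.
PGVECTOR_FILTERED_QUERY = """
SELECT document_id, author,
       embedding <=> %(query_vector)s::vector AS distance
FROM documents
WHERE author = %(author)s
  AND created_at >= %(created_after)s
ORDER BY embedding <=> %(query_vector)s::vector
LIMIT %(limit)s
"""
```

Building the filter in one function keeps field names and value types in a single place, which helps avoid mismatches between storage and query.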

Step 4 — Combine filter and vector search in a single query.

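Qdrant and Weaviate apply the filter as part of the index search, so similarity is computed only for matching documents. In pgvector, put the WHERE clause and the ORDER BY on vector distance in one SELECT; no subquery or lateral join is required. With an exact (sequential) scan, the filter is applied before ranking. With an approximate index (HNSW or IVFFlat), the filter is applied to the candidates the index scan returns, so a selective filter can yield fewer rows than LIMIT. Raising hnsw.ef_search, or enabling iterative index scans in pgvector 0.8.0 and later, mitigates this. Test the query with and without the approximate index, and run EXPLAIN ANALYZE to verify which indexes the plan uses.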
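The filter-first behavior can be illustrated with a small in-memory sketch. This is not how any particular database implements it; the document layout and function names are hypothetical, and a real system would use an index rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(docs: List[dict], query_vector: Sequence[float],
                    metadata_filter: Dict[str, object], limit: int = 3) -> List[dict]:
    """Apply the metadata filter first, then rank only the survivors."""
    candidates = [
        d for d in docs
        if all(d["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(d["vector"], query_vector),
        reverse=True,
    )
    return ranked[:limit]


docs = [
    {"id": "a", "metadata": {"author": "jane_doe"}, "vector": [1.0, 0.0]},
    {"id": "b", "metadata": {"author": "jane_doe"}, "vector": [0.0, 1.0]},
    {"id": "c", "metadata": {"author": "john_roe"}, "vector": [1.0, 0.1]},
]
hits = filtered_search(docs, [1.0, 0.0], {"author": "jane_doe"}, limit=2)
```

Document c is close to the query but never scored, because it fails the filter before any similarity is computed.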

Step 5 — Tune the recall/precision tradeoff.

If the filter matches very few documents, the top results may include weak matches simply because nothing more relevant passed the filter. If it matches many, the query slows. Adjust the limit and score_threshold parameters to balance speed and result quality. Target a recall above 90%, estimated by spot-checking results against known relevant documents.
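One way to make the spot check concrete is to compute recall at K against a hand-labeled set of relevant documents. The sketch below assumes results arrive as (id, score) pairs sorted by descending score; the function names are illustrative.

```python
from typing import Iterable, List, Sequence, Tuple


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Iterable[str], k: int) -> float:
    """Fraction of known relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def apply_score_threshold(scored: List[Tuple[str, float]],
                          score_threshold: float, limit: int) -> List[Tuple[str, float]]:
    """Drop results below the threshold, then cap the list at `limit`."""
    kept = [(doc_id, score) for doc_id, score in scored if score >= score_threshold]
    return kept[:limit]


scored = [("d1", 0.92), ("d4", 0.81), ("d2", 0.77), ("d9", 0.40)]
kept = apply_score_threshold(scored, score_threshold=0.5, limit=3)
recall = recall_at_k([doc_id for doc_id, _ in kept], {"d1", "d2", "d3"}, k=3)
```

Here two of the three labeled documents are retrieved, so recall is 2/3, below the 90% target; raising limit or lowering score_threshold would be the next thing to try.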

Step 6 — Log and measure query performance.

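Track query latency, filter selectivity (number of documents matching the filter vs. total), and the top-K recall rate. A filter that removes 99% of the corpus should show a measurable latency drop when it runs before the similarity search; if it does not, inspect the query plan and the metadata indexes.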
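A small helper along these lines can capture latency and selectivity per query. The QueryMetrics name and its fields are assumptions for this sketch; in practice the values would be sent to whatever logging or metrics system you already use.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class QueryMetrics:
    latency_ms: float
    matched: int   # documents passing the metadata filter
    total: int     # documents in the whole collection

    @property
    def selectivity(self) -> float:
        """Share of the corpus that passes the filter (0.0 to 1.0)."""
        return self.matched / self.total if self.total else 0.0


def timed(run_query: Callable[[], Any]) -> Tuple[Any, float]:
    """Run a query callable and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = run_query()
    return result, (time.perf_counter() - start) * 1000.0


# Stand-in query; replace the lambda with the real client call.
result, latency_ms = timed(lambda: ["d1", "d2"])
metrics = QueryMetrics(latency_ms=latency_ms, matched=100, total=10_000)
```

A selectivity of 0.01 corresponds to a filter that removes 99% of the corpus, which is the case worth comparing against an unfiltered baseline.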

Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.
Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.
Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

Verification

Query with author="jane_doe" and no query vector. Confirm only documents authored by jane_doe are returned.
Query with the same author filter plus a semantic vector. Confirm results are semantically relevant and all belong to jane_doe.
Query with a filter that matches nothing (e.g., a non-existent author). Confirm zero results and that the query returns in under 200 ms.
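The three checks above can be scripted so they are repeatable. In this sketch, search is a hypothetical in-memory stand-in for the real client call, and DOCS and expected_top_id are demo values; swap in your own search function and known-relevant document.

```python
import time

DOCS = [
    {"id": "a", "author": "jane_doe", "vector": [0.9, 0.1]},
    {"id": "b", "author": "jane_doe", "vector": [0.1, 0.9]},
    {"id": "c", "author": "john_roe", "vector": [1.0, 0.0]},
]


def search(author, query_vector=None, limit=10):
    """Hypothetical stand-in: filter by author, then rank by dot product."""
    hits = [d for d in DOCS if d["author"] == author]
    if query_vector is not None:
        hits.sort(
            key=lambda d: sum(x * y for x, y in zip(d["vector"], query_vector)),
            reverse=True,
        )
    return hits[:limit]


def run_verification(search_fn, expected_top_id="a"):
    """Return a pass/fail flag for each of the three verification checks."""
    results = {}

    # Check 1: filter only, no query vector.
    only_filter = search_fn("jane_doe")
    results["filter_only"] = bool(only_filter) and all(
        d["author"] == "jane_doe" for d in only_filter
    )

    # Check 2: filter plus query vector; the top hit should be the known match.
    combined = search_fn("jane_doe", query_vector=[1.0, 0.0])
    results["filter_plus_vector"] = (
        bool(combined)
        and all(d["author"] == "jane_doe" for d in combined)
        and combined[0]["id"] == expected_top_id
    )

    # Check 3: non-matching filter returns nothing, within 200 ms.
    start = time.perf_counter()
    empty = search_fn("no_such_author", query_vector=[1.0, 0.0])
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    results["no_match"] = empty == [] and elapsed_ms < 200.0
    return results


checks = run_verification(search)
```

Any False value in checks points to the verification step that needs attention.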

Common failures

Filter applied after vector search: Some implementations compute the top-K across the entire corpus first and then apply the filter to that result set. The query can then return fewer than K results, or none, even when matching documents exist. Verify that the database applies the filter before or during the similarity search.
Missing index on metadata column: Without an index, the database performs a full scan on the metadata column. Always index commonly filtered fields.
Type mismatches in filter values: If created_at is stored as a string but the filter compares against a timestamp, the query may raise an error or, in schemaless stores, silently return zero results. Verify type consistency between storage and query.
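The first failure mode is easy to reproduce in memory. The sketch below contrasts post-filtering with pre-filtering on a toy corpus; the data and function names are illustrative, and dot product stands in for whatever similarity metric the database uses.

```python
from typing import List, Sequence

DOCS = [
    {"id": "x1", "author": "john_roe", "vector": [1.0, 0.0]},
    {"id": "x2", "author": "john_roe", "vector": [0.9, 0.1]},
    {"id": "j1", "author": "jane_doe", "vector": [0.5, 0.5]},
    {"id": "j2", "author": "jane_doe", "vector": [0.2, 0.8]},
]


def score(vector: Sequence[float], query: Sequence[float]) -> float:
    """Dot product as a simple similarity score."""
    return sum(x * y for x, y in zip(vector, query))


def post_filter_search(docs: List[dict], query: Sequence[float], author: str, k: int) -> List[str]:
    """Problematic pattern: take the global top k, then filter."""
    top_k = sorted(docs, key=lambda d: score(d["vector"], query), reverse=True)[:k]
    return [d["id"] for d in top_k if d["author"] == author]


def pre_filter_search(docs: List[dict], query: Sequence[float], author: str, k: int) -> List[str]:
    """Preferred pattern: filter first, then take the top k of the subset."""
    subset = [d for d in docs if d["author"] == author]
    ranked = sorted(subset, key=lambda d: score(d["vector"], query), reverse=True)
    return [d["id"] for d in ranked[:k]]


post = post_filter_search(DOCS, [1.0, 0.0], "jane_doe", k=2)
pre = pre_filter_search(DOCS, [1.0, 0.0], "jane_doe", k=2)
```

With k=2, post-filtering returns nothing because both global top hits belong to another author, while pre-filtering returns both of jane_doe's documents.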

Related guides

How to Set Up Multi-Modal RAG with Images — extends filtering to multi-modal embeddings
How to Set Up Model Fallback Chains (Local to Cloud) — provides a reliable embedding endpoint for indexing