HOW-TO · INF
How to run embedding models for semantic search
PREREQUISITES
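Ollama installed and running, an embedding model pulled (for example, ollama pull all-minilm), and Python with the requests and numpy packages for the scripted steps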
What this does
Embedding models convert text into dense vector representations. These vectors enable semantic search, clustering, and retrieval-augmented generation (RAG) by measuring cosine similarity between queries and documents.
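As a rough illustration of the cosine similarity measure mentioned above, the following sketch computes it with only the standard library. The function name cosine_similarity and the toy three-dimensional vectors are assumptions made for this example; real embeddings from a model have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical values, not real model output).
query = [1.0, 0.0, 1.0]
close_doc = [2.0, 0.0, 2.0]  # same direction as the query, different length
far_doc = [0.0, 1.0, 0.0]    # orthogonal to the query

print(cosine_similarity(query, close_doc))  # close to 1: same direction
print(cosine_similarity(query, far_doc))    # 0: no shared direction
```

The score depends only on direction, not on vector length, which is why a long document and a short query can still match closely.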
Steps
Generate an embedding for a text query.
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "all-minilm", "prompt": "What is machine learning?"}' \
  | jq '.embedding | length'
Expected: Prints the vector length, 384 for all-minilm or 768 for nomic-embed-text.
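The same request can be sketched in Python using only the standard library. The helper names build_embedding_request, parse_embedding, and fetch_embedding are hypothetical, and the sketch assumes the response body has the shape {"embedding": [...]}, as the jq filter in the curl step implies.

```python
import json
import urllib.request

# Endpoint taken from the curl step; adjust if Ollama listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/embeddings"

def build_embedding_request(text, model="all-minilm", url=OLLAMA_URL):
    """Build (but do not send) a POST request for one embedding."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_embedding(body):
    """Extract the vector from a response body shaped like {"embedding": [...]}."""
    data = json.loads(body)
    vector = data.get("embedding")
    if not isinstance(vector, list) or not vector:
        raise ValueError(f"no embedding in response: {data}")
    return vector

def fetch_embedding(text, model="all-minilm"):
    """Send the request to a running Ollama server and return the vector."""
    request = build_embedding_request(text, model)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_embedding(response.read())

# Usage, with Ollama running locally:
#   vector = fetch_embedding("What is machine learning?")
#   print(len(vector))
```

Separating request construction from parsing keeps both pieces testable without a running server.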
Embed multiple documents and compute similarity scores.
import requests
import numpy as np

def embed(text):
    # Request one embedding from the local Ollama server
    r = requests.post("http://localhost:11434/api/embeddings",
                      json={"model": "all-minilm", "prompt": text})
    r.raise_for_status()  # fail loudly if the model is missing or the server is down
    return np.array(r.json()["embedding"])

docs = ["Python is a programming language",
        "Cats are mammals",
        "Machine learning uses algorithms"]
doc_vecs = [embed(d) for d in docs]
query_vec = embed("What language should I learn for AI?")

# Cosine similarity between the query and each document
scores = [np.dot(query_vec, dv) / (np.linalg.norm(query_vec) * np.linalg.norm(dv))
          for dv in doc_vecs]
best = docs[np.argmax(scores)]
print(f"Best match: {best} (score: {max(scores):.3f})")
Store embeddings in a vector database. Use ChromaDB for local persistence:
pip install chromadb
import chromadb

# PersistentClient writes to disk; chromadb.Client() keeps data in memory only.
# The path below is an example location, not a required one.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("my_docs")
collection.add(embeddings=[embed(d).tolist() for d in docs],
               documents=docs,
               ids=[f"doc{i}" for i in range(len(docs))])

# Look up the single nearest document to the query vector
results = collection.query(query_embeddings=[query_vec.tolist()], n_results=1)
print(results["documents"][0])
Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
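One possible way to capture that evidence is a small JSON record. The sketch below is an assumption about what such a record could look like; the field names, the build_run_record helper, and the placeholder output string are all hypothetical.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata

def package_version(name):
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

def build_run_record(command, model, output, packages):
    """Collect the details needed to reproduce a run into one dictionary."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "python_version": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
        "model": model,
        "observed_output": output,
    }

record = build_run_record(
    command="python semantic_search.py",
    model="all-minilm",
    output="<paste the observed output here>",  # placeholder, not a real result
    packages=["numpy", "requests", "chromadb"],
)

# Print the record; redirect it to a file to keep it alongside the script.
print(json.dumps(record, indent=2))
```

Running it as python record_run.py >> run_log.json (file names are examples) would append each record to a log.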
Verification
python semantic_search.py
# Expected output resembles: "Best match: Machine learning uses algorithms (score: 0.87)"
# The score is printed to three decimals, and its exact value depends on the model
# The most semantically relevant document is returned first
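To inspect more than the single best match during verification, the documents can be ranked by score. This is a minimal sketch: rank_documents is a hypothetical helper, and the scores below are made-up illustration values, not output from any model.

```python
def rank_documents(docs, scores, top_k=3):
    """Pair each document with its score and return the top_k, best first."""
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

docs = ["Python is a programming language",
        "Cats are mammals",
        "Machine learning uses algorithms"]
scores = [0.41, 0.05, 0.62]  # hypothetical similarity scores for illustration

for doc, score in rank_documents(docs, scores, top_k=2):
    print(f"{score:.3f}  {doc}")
```

Seeing the runner-up and its score makes it easier to judge whether the top result won by a clear margin.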
Common failures
- Model not found: Verify that the embedding model is pulled by running ollama list. Not all models support the /api/embeddings endpoint.
- Dimension mismatch: Ensure all vectors come from the same model. Cosine similarity requires same-dimensional vectors.
- Poor retrieval quality: Some embedding models are better for specific domains. See the comparison guide to select the right one.
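The dimension-mismatch failure above can be caught before any similarity is computed. The check below is a sketch under the assumption that vectors are plain Python lists; check_dimensions is a hypothetical helper, and the zero-filled vectors only stand in for real embeddings of length 384 and 768.

```python
def check_dimensions(vectors, expected_dim=None):
    """Return the shared dimension, or raise if any vector differs."""
    if not vectors:
        raise ValueError("no vectors to check")
    dim = expected_dim if expected_dim is not None else len(vectors[0])
    for index, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(
                f"vector {index} has dimension {len(vec)}, expected {dim}"
            )
    return dim

# Stand-in vectors: two from one model, then a mix of two models.
same_model = [[0.0] * 384, [0.0] * 384]
mixed_models = [[0.0] * 384, [0.0] * 768]

print(check_dimensions(same_model, expected_dim=384))
try:
    check_dimensions(mixed_models)
except ValueError as err:
    print(f"caught: {err}")
```

Running such a check when documents are indexed surfaces a model change early, rather than as a confusing error at query time.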
Related guides