RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
How to set up agent memory with vector databases

Intermediate · 25 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

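A vector database (the steps below use ChromaDB; Faiss is an alternative), an agent framework, and a running Ollama instance to serve the embedding model.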

What this does

Backing an agent with a vector database gives it semantic memory that persists across sessions. Instead of relying solely on the conversation window, which is limited by the model's context length, the agent stores past interactions, facts, and user preferences as vector embeddings in a database. When a new query arrives, the agent embeds it, retrieves the stored memories whose embeddings are closest to it, and adds them to the prompt. This lets the agent recall previous conversations, maintain user-specific knowledge, and build a knowledge base that grows over time.
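To make the retrieval idea concrete, here is a minimal sketch of similarity search in plain Python. The three-dimensional vectors are hand-made stand-ins for real embeddings (an assumption for illustration; nomic-embed-text produces 768 dimensions), and the names cosine_similarity, top_k, and MEMORIES are hypothetical. A vector database does the same ranking with an index instead of a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], memories: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k memories most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine_similarity(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors (assumption): a real setup would get these from the embedding model.
MEMORIES = [
    ("User prefers Rust for systems work", [0.9, 0.1, 0.0]),
    ("User's cat is named Miso", [0.0, 0.2, 0.9]),
    ("User is learning async Rust", [0.8, 0.3, 0.1]),
]

# A query vector pointing in the "Rust" direction ranks the two Rust memories first.
print(top_k([1.0, 0.2, 0.0], MEMORIES, k=2))
```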

Steps

  1. Install dependencies: pip install chromadb langchain-ollama. If the embedding model is not yet available locally, pull it with ollama pull nomic-embed-text.
  2. Initialize the vector database client: import chromadb; client = chromadb.PersistentClient(path="./agent_memory").
  3. Create a collection: collection = client.get_or_create_collection(name="agent_memories", metadata={"hnsw:space": "cosine"}).
  4. Set up the embedding function: from langchain_ollama import OllamaEmbeddings; embeddings = OllamaEmbeddings(model="nomic-embed-text").
  5. Implement a memory manager class with two methods. It needs from uuid import uuid4 and from datetime import datetime. Store: def store_memory(self, session_id: str, content: str, metadata: dict): embedding = embeddings.embed_query(content); collection.add(documents=[content], embeddings=[embedding], metadatas=[{**metadata, "session_id": session_id}], ids=[f"{session_id}_{uuid4()}"]). The session_id must be written into the metadata, otherwise the where filter used for retrieval never matches. Retrieve: def retrieve_memories(self, session_id: str, query: str, k: int = 5): embedding = embeddings.embed_query(query); results = collection.query(query_embeddings=[embedding], n_results=k, where={"session_id": session_id}); return results["documents"][0]. If memories should carry over between conversations, use a stable identifier (for example a user ID) as session_id; a fresh ID per conversation keeps each conversation isolated.
  6. Integrate retrieval into the agent's processing pipeline. Before calling the LLM, retrieve relevant memories: past_context = memory_manager.retrieve_memories(session_id, user_query). Format them as a prefix to the system prompt: system_prompt = f"Past relevant context:\n{chr(10).join(past_context)}\n\nCurrent conversation:\n".
  7. After the agent generates a response, store the interaction: memory_manager.store_memory(session_id, f"User: {user_query}\nAssistant: {response}", {"timestamp": datetime.now().isoformat(), "type": "conversation"}).
  8. Store user facts and preferences separately with {"type": "user_fact"} metadata, and always include them in the prompt regardless of query similarity. Fetching them with a metadata-only lookup such as collection.get(where={"$and": [{"session_id": session_id}, {"type": "user_fact"}]}) avoids the similarity ranking entirely.
  9. Implement memory cleanup: periodically delete memories older than 90 days, or trim the collection when it exceeds 10,000 entries. ISO timestamps are stored as strings, so also store a numeric timestamp if you want to filter by age inside a where clause.
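The store and retrieve operations above can be sketched as a small class. This is one possible layout, not the only one: the collection and the embedding function are passed in, so the class works with the Chroma collection and OllamaEmbeddings object from the steps but can also be exercised with stand-ins. The created_at field and the build_system_prompt helper are assumptions added for illustration. The class also stamps timestamp itself, so callers would not need to pass one.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional
from uuid import uuid4


class MemoryManager:
    """Stores and retrieves per-session memories in a Chroma-style collection."""

    def __init__(self, collection: Any, embed: Callable[[str], list[float]]):
        self.collection = collection  # e.g. the collection from client.get_or_create_collection(...)
        self.embed = embed            # e.g. OllamaEmbeddings(model="nomic-embed-text").embed_query

    def store_memory(self, session_id: str, content: str, metadata: Optional[dict] = None) -> str:
        """Embed the content and add it to the collection; returns the new memory ID."""
        memory_id = f"{session_id}_{uuid4()}"
        now = datetime.now(timezone.utc)
        full_metadata = {
            "timestamp": now.isoformat(),
            "created_at": now.timestamp(),  # numeric copy (assumption) for age-based filtering
            **(metadata or {}),
            "session_id": session_id,       # written last so the retrieval filter can always match
        }
        self.collection.add(
            ids=[memory_id],
            documents=[content],
            embeddings=[self.embed(content)],
            metadatas=[full_metadata],
        )
        return memory_id

    def retrieve_memories(self, session_id: str, query: str, k: int = 5) -> list[str]:
        """Return up to k stored documents for this session, most similar first."""
        results = self.collection.query(
            query_embeddings=[self.embed(query)],
            n_results=k,
            where={"session_id": session_id},
        )
        return results["documents"][0]


def build_system_prompt(past_context: list[str]) -> str:
    """Format retrieved memories as a prefix to the system prompt."""
    memories = "\n".join(past_context)
    return f"Past relevant context:\n{memories}\n\nCurrent conversation:\n"


# Wiring it to the objects from the steps (needs chromadb, langchain-ollama, and a running Ollama):
# memory_manager = MemoryManager(collection, embeddings.embed_query)
```

Injecting the collection and embedder keeps the class independent of any one database or model, which also makes it straightforward to swap in a fake for unit tests.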

  • Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.

  • Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.

  • Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
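The cleanup step from the setup (a 90-day age limit and a 10,000-entry cap) can be sketched as follows. The sketch assumes each memory carries a numeric created_at metadata field holding a Unix timestamp; that field name is an assumption, and entries without it are left alone. select_ids_to_delete is a pure function, and cleanup is a thin hypothetical wrapper around collection.get and collection.delete.

```python
import time

SECONDS_PER_DAY = 86_400


def select_ids_to_delete(ids, metadatas, now, max_age_days=90, max_entries=10_000):
    """Pick IDs that are too old, plus the oldest survivors beyond the size cap."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    # Only consider entries that actually have a numeric creation time.
    dated = [
        (meta["created_at"], memory_id)
        for memory_id, meta in zip(ids, metadatas)
        if meta and isinstance(meta.get("created_at"), (int, float))
    ]
    expired = [memory_id for created, memory_id in dated if created < cutoff]
    # Newest first, so everything past index max_entries is the overflow.
    survivors = sorted((pair for pair in dated if pair[0] >= cutoff), reverse=True)
    overflow = [memory_id for _, memory_id in survivors[max_entries:]]
    return expired + overflow


def cleanup(collection, now=None):
    """Delete expired and overflow memories; returns how many were removed."""
    now = time.time() if now is None else now
    snapshot = collection.get(include=["metadatas"])
    doomed = select_ids_to_delete(snapshot["ids"], snapshot["metadatas"], now)
    if doomed:  # skip the call entirely when there is nothing to delete
        collection.delete(ids=doomed)
    return len(doomed)
```

Loading every ID and metadata record is acceptable at the 10,000-entry scale the guide targets; a much larger collection would need paging or a where filter on created_at instead.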

Verification

  • Send a message containing a specific fact, for example "My favorite programming language is Rust." Then start a new conversation, ask "What is my favorite programming language?", and verify the agent retrieves and references the stored fact. Because retrieval filters on session_id, the second conversation must use the same session_id as the first; otherwise the filter excludes the stored fact.
  • Check the vector database directly: collection.count() should return a positive integer.
  • Inspect a sample with collection.peek(limit=5) and verify that the stored documents contain the correct content and metadata.
  • Test retrieval relevance: store 10 diverse memories, then query on one specific topic. The top results should be semantically related to that topic.
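The direct database checks can be scripted. The helper below is a sketch built on collection.count() and collection.peek(), and it returns a list of problems rather than raising. The function name check_collection and the default required_keys are assumptions; adjust the keys to whatever metadata your store operation actually writes.

```python
def check_collection(collection, required_keys=("session_id", "type"), sample_size=5):
    """Return a list of problems found in the collection; an empty list means all checks passed."""
    if collection.count() <= 0:
        return ["collection is empty: nothing was stored"]

    problems = []
    sample = collection.peek(limit=sample_size)
    for memory_id, document, metadata in zip(sample["ids"], sample["documents"], sample["metadatas"]):
        if not document:
            problems.append(f"{memory_id}: empty document")
        missing = [key for key in required_keys if key not in (metadata or {})]
        if missing:
            problems.append(f"{memory_id}: missing metadata {missing}")
    return problems


# Usage against the collection from the setup steps:
# for problem in check_collection(collection):
#     print(problem)
```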

Common failures

  • Embedding dimension mismatch - The embedding model produces vectors of a fixed dimension (768 for nomic-embed-text), and the collection's dimension is set by the first add. If you later switch embedding models, write to a new collection or re-embed the existing documents.
  • Stale memory pollution - Old or incorrect memories degrade retrieval quality. Implement recency weighting or a maximum-age filter in the retrieval query.
  • Empty retrieval results - Check that the session_id in the where filter matches a session_id actually stored in the memory's metadata. To search global memories, temporarily remove the filter.
  • Unbounded storage growth - Implement TTL-based cleanup, for example a cron job that runs collection.delete(ids=expired_ids).
  • Embedding latency slows down conversations - Store memories after the response is sent rather than before it. Use asyncio.create_task() for non-blocking writes; because embed_query and collection.add are blocking calls, run them through asyncio.to_thread() so they do not stall the event loop.

  • Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
  • Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.

Related guides

  • build-langgraph-agent-scratch
  • build-rag-evaluation-pipeline
  • build-code-generation-agent-local-models