11. Memory: Vector Store

Chapter 11 of 18 · 15 min

Vector store memory retrieves relevant past context using semantic similarity. Unlike summary memory, which compresses the whole conversation, vector memory fetches only the exchanges related to the current query. This is the same principle as retrieval-augmented generation (RAG), applied to conversation history.
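To make the retrieval idea concrete, here is a minimal, library-free sketch. It is an illustration only: the word-count `embed` function is a toy stand-in for a real embedding model (it matches shared words, not meaning), and the names `embed`, `cosine`, and `retrieve` are hypothetical helpers, not part of LangChain.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, memories, k=1):
    """Return the k stored memories most similar to the query."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

memories = [
    "User prefers JSON output for API responses",
    "The pipeline failed with an OOM error; batch size reduced from 32 to 8",
]
top = retrieve("How should I format my API responses?", memories, k=1)
print(top[0])
```

Only the first memory shares words with the query, so it ranks first. A real embedding model plays the same role as `embed` here but places texts with similar meaning close together even when they share no words.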

LangChain's VectorStoreRetrieverMemory indexes your conversation history into a vector database and retrieves relevant snippets at query time.

from langchain.memory import VectorStoreRetrieverMemory
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url="http://localhost:11434")
vectorstore = Chroma(embedding_function=embeddings, persist_directory="./memory_db")

# The memory wraps a retriever; k (memories fetched per query) is set via search_kwargs.
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
memory = VectorStoreRetrieverMemory(retriever=retriever)

memory.save_context(
    {"input": "User mentioned they prefer JSON output"},
    {"output": "Noted. I'll format responses as JSON."}
)
memory.save_context(
    {"input": "The pipeline failed with OOM error"},
    {"output": "Reduced batch size from 32 to 8."}
)

# Query for relevant memories
relevant = memory.load_memory_variables(
    {"input": "How should I format my API responses?"}
)
print(relevant["history"])

The k value controls how many memories are retrieved per query. Values from 2 to 5 work well for most use cases; higher values enlarge the prompt and raise inference cost.
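The trade-off can be seen in a small, library-free sketch. The scoring function below is a hypothetical word-overlap measure, not an embedding model, and character count is used as a rough proxy for prompt size; both are assumptions made for illustration.

```python
def overlap_score(query, memory):
    """Hypothetical relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(memory.lower().split()))

def build_context(query, memories, k):
    """Join the k highest-scoring memories into one context string."""
    ranked = sorted(memories, key=lambda m: overlap_score(query, m), reverse=True)
    return "\n".join(ranked[:k])

memories = [
    "user prefers json output",
    "pipeline failed with oom error",
    "batch size reduced from 32 to 8",
    "user asked about json schema validation",
    "deployment target is a single gpu node",
]
query = "which json output format does the user want"

# Measure how much retrieved context each k would add to the prompt.
sizes = {k: len(build_context(query, memories, k)) for k in (1, 3, 5)}
for k, size in sizes.items():
    print(f"k={k}: {size} characters of retrieved context")
```

With k=1 only the best match enters the prompt. At k=5 every stored memory is included, relevant or not, which is the extra cost the paragraph above warns about.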

A common failure: assuming that retrieved memories reach the model automatically. They do not. The prompt template needs a slot for them (here, {history}), and the memory must either be attached to the chain or loaded and passed in explicitly on each turn.

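from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_ollama import OllamaLLM

# Assumption: any locally pulled Ollama model works here; "llama3" is a placeholder.
llm = OllamaLLM(model="llama3", base_url="http://localhost:11434")

template = PromptTemplate.from_template("""
Previous context:
{history}

Current question: {input}
""")

# Passing memory= lets the chain fill {history} with retrieved memories
# and save each new turn back to the vector store.
chain = LLMChain(llm=llm, prompt=template, memory=memory)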
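The data flow behind a memory-backed chain can be sketched without any library: retrieve, inject into the prompt, call the model, then save the new turn. Everything in this sketch is hypothetical scaffolding: `ToyMemory` matches on shared words instead of embeddings, and `fake_llm` is a placeholder for a real model call.

```python
TEMPLATE = "Previous context:\n{history}\n\nCurrent question: {input}\n"

class ToyMemory:
    """Hypothetical stand-in for a vector store memory; matches on shared words."""

    def __init__(self):
        self.turns = []

    def save_context(self, user_input, output):
        """Store one exchange as a single retrievable text."""
        self.turns.append(f"input: {user_input}\noutput: {output}")

    def load(self, query, k=1):
        """Return the k stored turns sharing the most words with the query."""
        words = set(query.lower().split())
        ranked = sorted(
            self.turns,
            key=lambda t: len(words & set(t.lower().split())),
            reverse=True,
        )
        return "\n".join(ranked[:k])

def fake_llm(prompt):
    """Placeholder for a real model call; reports the prompt length."""
    return f"(model reply to a {len(prompt)}-character prompt)"

def run_turn(memory, user_input):
    history = memory.load(user_input)        # 1. retrieve relevant past turns
    prompt = TEMPLATE.format(history=history, input=user_input)  # 2. inject them
    reply = fake_llm(prompt)                 # 3. call the model
    memory.save_context(user_input, reply)   # 4. store the new turn
    return prompt, reply

memory = ToyMemory()
memory.save_context("I prefer JSON output", "Noted. I'll format responses as JSON.")
prompt, reply = run_turn(memory, "Which output format do I prefer?")
print(prompt)
```

If step 2 is skipped, the model never sees the retrieved text, no matter how good the retrieval was. If step 4 is skipped, the current turn is never available to later queries.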

Vector store memory shines when users revisit topics across sessions. The retrieval step finds relevant past exchanges even when the wording differs.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

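Create two memories with different topics, then query the vector store with a phrase matching only one. With k=1, verify that retrieval returns only the relevant memory. With a larger k, both memories come back because only two are stored, so check instead that the relevant one is ranked first.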