HOW-TO · RAG

How to Use ChromaDB with LangChain for Vector Storage

Intermediate · 15 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

ChromaDB, LangChain, and the langchain-ollama package installed; the mxbai-embed-large and llama3.2 models pulled with ollama pull
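
To confirm the prerequisites before starting, a small check like the one below can report which packages are present. This is a minimal sketch using only the standard library; the package list is an assumption based on the imports this guide relies on (langchain-chroma is included on the assumption that you use the dedicated Chroma integration package), so adjust it to your setup.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package names; edit this list to match your environment.
PACKAGES = ["langchain", "langchain-ollama", "langchain-chroma", "chromadb"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED points at a package to install with pip before continuing.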

What this does

LangChain wraps ChromaDB in its Chroma vector store integration. This guide shows how to initialize the vector store, configure an Ollama-backed embedding model, load documents, run a similarity search, and put a RetrievalQA chain on top, all within a LangChain pipeline.

Steps

  1. Create the ChromaDB vector store with an Ollama embedding function.

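    from langchain_ollama import OllamaEmbeddings
    from langchain_chroma import Chroma  # pip install langchain-chroma
    
    # Embeddings are computed by the local Ollama server.
    embed_model = OllamaEmbeddings(model="mxbai-embed-large")
    
    vectorstore = Chroma(
        collection_name="langchain_docs",
        embedding_function=embed_model,
        persist_directory="./langchain_chroma"  # collection data is stored on disk here
    )
    # _collection is a private attribute, used here only as a quick sanity check.
    print("Vector store created:", vectorstore._collection.name)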
    
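  2. Add documents to the vector store. Splitting text into chunks improves retrieval granularity. The three sample sentences are each shorter than chunk_size, so each one stays a single chunk.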

    from langchain_text_splitters import RecursiveCharacterTextSplitter
    
    docs = [
        "ChromaDB is an open-source vector database.",
        "LangChain connects LLMs with external data sources.",
        "Ollama runs large language models locally."
    ]
    
    # Chunks hold at most 100 characters and share up to 20 characters of overlap.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
    split_docs = text_splitter.create_documents(docs)
    
    # Each chunk is embedded through Ollama and written to the collection.
    vectorstore.add_documents(split_docs)
    print("Documents added:", vectorstore._collection.count())
    
  3. Perform similarity search.

    results = vectorstore.similarity_search("What is ChromaDB?", k=2)
    for doc in results:
        print("-", doc.page_content)
    
  4. Build a RAG chain with a RetrievalQA chain.

    from langchain_ollama import ChatOllama
    from langchain.chains import RetrievalQA
    
    llm = ChatOllama(model="llama3.2")
    qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
    answer = qa_chain.invoke({"query": "How does ChromaDB integrate with LangChain?"})
    print(answer["result"])
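
To make steps 3 and 4 less opaque, here is a conceptual sketch of what retrieval followed by prompt "stuffing" does. It is not how Chroma or RetrievalQA are implemented: toy_embed is a hypothetical word-count stand-in for a real embedding model, the ranking is brute force rather than an approximate index, and cosine similarity is used only for illustration (the distance metric Chroma applies depends on how the collection is configured). The prompt template is likewise made up.

```python
import math
import re
from collections import Counter

DOCS = [
    "ChromaDB is an open-source vector database.",
    "LangChain connects LLMs with external data sources.",
    "Ollama runs large language models locally.",
]


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, docs, k=2):
    """Rank every document against the query and keep the k most similar."""
    q = toy_embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]


def stuff_prompt(question, context_docs):
    """Concatenate the retrieved documents into a single prompt for the LLM."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


hits = top_k("What is ChromaDB?", DOCS, k=2)
prompt = stuff_prompt("What is ChromaDB?", hits)
print(prompt)
```

In the real pipeline, the embedding model supplies the vectors, Chroma performs the nearest-neighbor lookup, and the chain sends the assembled prompt to the LLM.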
    

Verification

python3 -c "
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings
embed = OllamaEmbeddings(model='mxbai-embed-large')
vs = Chroma(embedding_function=embed, persist_directory='/tmp/lc_test', collection_name='t')
vs.add_texts(['hello world'])
print('Search result:', vs.similarity_search('hello', k=1)[0].page_content)
"
# Expected: Search result: hello world

Common failures

  • Ollama server not running. LangChain's OllamaEmbeddings sends HTTP requests to localhost:11434. Start Ollama with ollama serve before running the script.
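  • Embedding model mismatch. A Chroma collection stores vectors of one fixed dimension. Adding to or querying an existing collection with a different embedding model than the one that populated it (for example, reusing ./langchain_chroma after switching away from mxbai-embed-large) causes an embedding dimension mismatch. Use the same model name everywhere, or start a new collection.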
  • Missing langchain-ollama package. The integration is in a separate package. Install it explicitly with pip install langchain-ollama.
  • Persist directory locked. Opening the same persist directory from two processes simultaneously causes a lock error. Always use a single writer process.
  • Wrong import path. LangChain moved Ollama integrations to langchain_ollama. The old langchain.embeddings path may not include Ollama. Use the explicit langchain_ollama import.
  • Version mismatch. The installed package or runtime differs from the command shown. Check the version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
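
For the first failure above, a quick reachability check can save a confusing stack trace. The sketch below uses only the standard library and assumes Ollama listens on its default address, http://localhost:11434; the function name ollama_is_up and the timeout value are illustrative choices, not part of any library.

```python
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False


print("Ollama reachable:", ollama_is_up())
```

If this prints False, start the server with ollama serve and rerun the check before debugging the LangChain code itself.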
