HOW-TO · RAG

How to Enable ChromaDB Persistence for Production

Intermediate · 15 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

ChromaDB installed (version 0.4 or later, which provides PersistentClient), persistent storage available

What this does

In-memory ChromaDB instances lose all data when the process exits. For production deployments, this guide explains how to configure disk-based persistence, shut down safely, manage backups, and fix the permission issues that commonly arise in server environments.

Steps

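  1. Create a persistent client pointing to a dedicated directory, using an absolute path. If your user cannot write under /opt, run the permission commands in step 5 first.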

    import chromadb, os
    
    storage_path = "/opt/chromadb/data"
    os.makedirs(storage_path, exist_ok=True)
    
    client = chromadb.PersistentClient(path=storage_path)
    print("Persistence enabled at:", storage_path)
    
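  2. Add data, then run the script a second time. The count should still be 1, confirming that the data survived a process restart.

    col = client.get_or_create_collection("production_kb")
    # upsert (rather than add) keeps reruns idempotent: an existing id is
    # updated instead of being treated as a duplicate.
    col.upsert(
        ids=["prod-1"],
        documents=["Production RAG pipeline ready for traffic."],
        metadatas=[{"env": "prod", "version": "1.0"}]
    )
    print("Documents persisted:", col.count())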
    
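  3. Wrap the client in a singleton so each process reuses one PersistentClient instead of opening several.

    import threading
    import chromadb
    
    _client = None
    _client_lock = threading.Lock()
    
    def get_chroma_client():
        # The lock stops two threads from each creating a client on first use.
        global _client
        with _client_lock:
            if _client is None:
                _client = chromadb.PersistentClient(path="/opt/chromadb/data")
        return _client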
    
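  4. Test a graceful shutdown: start the script, stop it with Ctrl+C or kill -15 <pid>, then restart it. The data should remain.

    # Graceful shutdown: log a message when the process exits.
    # PersistentClient persists writes automatically; no explicit flush is needed.
    import atexit
    import signal
    import sys
    
    def shutdown_hook():
        print("ChromaDB client shutting down gracefully.")
    
    atexit.register(shutdown_hook)
    
    # Python skips atexit handlers on an unhandled SIGTERM, so turn
    # SIGTERM into a normal exit that runs them.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))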
    
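  5. Configure file permissions for the storage directory. Run these commands before step 1 if /opt is not writable by your user. If the application runs under a dedicated service account, chown the directory to that account instead of $(whoami).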

    sudo mkdir -p /opt/chromadb/data
    sudo chown -R $(whoami):$(id -gn) /opt/chromadb/data
    chmod 755 /opt/chromadb/data
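
The introduction lists backups among the production concerns. A minimal backup sketch, assuming the writer process is stopped while it runs (copying a live SQLite file can capture a half-written state). backup_chroma is a hypothetical helper, not part of chromadb, and /opt/chromadb/backups below is only an example location.

```python
import os
import tarfile
import time


def backup_chroma(storage_path, backup_dir):
    """Archive a ChromaDB persistence directory into a timestamped .tar.gz.

    Hypothetical helper: assumes no process is writing to storage_path
    while it runs, because a live SQLite file may be copied mid-write.
    """
    storage_path = os.path.abspath(storage_path)
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = os.path.join(backup_dir, f"chromadb-{stamp}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores entries as "<dirname>/..." instead of the full path
        tar.add(storage_path, arcname=os.path.basename(storage_path))
    return archive_path
```

Call it from a cron job or systemd timer after stopping the writer, for example backup_chroma("/opt/chromadb/data", "/opt/chromadb/backups"). To restore, stop the writer and extract the archive inside /opt/chromadb so its data directory replaces the old one.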
    

Verification

# Process 1: write a document, then exit
python3 -c "
import chromadb
c = chromadb.PersistentClient(path='/tmp/persistence_test')
col = c.get_or_create_collection('persist')
col.upsert(ids=['x'], documents=['survives restart'])
"
# Process 2: a fresh interpreter reads the data back from disk, then cleans up
python3 -c "
import chromadb
c2 = chromadb.PersistentClient(path='/tmp/persistence_test')
col2 = c2.get_collection('persist')
print('After restart:', col2.count(), 'docs')
c2.delete_collection('persist')
"
# Expected: After restart: 1 docs
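
Beyond the round-trip test, a preflight check can catch the path, permission, and disk-space problems listed under Common failures before the service starts. This is a sketch: check_storage is a hypothetical helper, not a chromadb API. It assumes the chromadb 0.4+ on-disk layout, where PersistentClient keeps its metadata in a file named chroma.sqlite3 inside the storage directory, and reuses the 80% disk alert threshold from that list.

```python
import os
import shutil


def check_storage(path, max_used_fraction=0.8):
    """Return a list of problems found with a ChromaDB storage directory.

    Hypothetical helper; assumes PersistentClient (chromadb 0.4+) stores
    its metadata in a file named chroma.sqlite3 inside the directory.
    """
    problems = []
    if not os.path.isabs(path):
        problems.append(f"path is not absolute: {path}")
    if not os.path.isdir(path):
        problems.append(f"directory does not exist: {path}")
        return problems  # the remaining checks need the directory
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"directory is not writable by this user: {path}")
    if not os.path.exists(os.path.join(path, "chroma.sqlite3")):
        problems.append("no chroma.sqlite3 found; has PersistentClient written here yet?")
    usage = shutil.disk_usage(path)
    if usage.total and usage.used / usage.total >= max_used_fraction:
        problems.append(f"disk is {usage.used / usage.total:.0%} full")
    return problems
```

Run it at startup, for example check_storage("/opt/chromadb/data"), and refuse to start if the returned list is not empty. The chroma.sqlite3 check only makes sense once PersistentClient has written to the directory at least once.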

Common failures

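  • Data loss on unclean exit. PersistentClient writes changes to disk as they happen, but killing the process with SIGKILL (kill -9) can interrupt a write in progress and leave the on-disk state inconsistent. Stop the process with SIGTERM (kill -15 or pkill -15) instead. Note that Python runs atexit handlers on SIGTERM only if a signal handler turns it into a normal exit.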
  • Storage path owned by root. If the directory is owned by root, a non-root process cannot write, causing silent failures or permission errors at query time. Fix with chown as shown above.
  • Disk full. When disk space is exhausted, ChromaDB cannot flush writes. Monitor with df -h /opt/chromadb and alert at 80% usage.
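  • Concurrent writes from multiple processes. Two processes writing to the same persistent directory cause lock contention. Use a single writer process and route reads through it, or switch to client-server mode (chroma run --path /opt/chromadb/data on the server, with applications connecting through chromadb.HttpClient).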
  • Path not absolute. Relative paths like ./data work in development but resolve differently depending on the working directory in production. Always use absolute paths.
  • Version mismatch. The installed chromadb package differs from the version this guide assumes; PersistentClient requires chromadb 0.4 or later. Check with pip show chromadb, then rerun the verification command above.
  • Local environment drift. A different service, virtual environment, model, or path is in use than you expect. Print the active interpreter (which python3) and the storage path passed to PersistentClient before changing the guide steps.
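
For the single-writer rule above, one way to stop a second writer from starting by accident is an advisory file lock. A minimal sketch, assuming Linux or macOS (fcntl is Unix-only) and that every writer uses the same lock file. SingleWriterLock and the lock path are hypothetical, not part of chromadb.

```python
import fcntl
import os


class SingleWriterLock:
    """Advisory lock so only one process writes to a ChromaDB directory.

    Hypothetical helper (not part of chromadb). It only protects against
    processes that also take this same lock before writing.
    """

    def __init__(self, lock_path):
        self.lock_path = lock_path
        self._fd = None

    def acquire(self):
        """Take the lock without blocking; return False if another holder exists."""
        fd = os.open(self.lock_path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            return False
        self._fd = fd
        return True

    def release(self):
        """Drop the lock and close the file descriptor."""
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None
```

At startup, the writer service would call SingleWriterLock("/opt/chromadb/writer.lock").acquire() and exit if it returns False. Keeping the lock file outside the data directory leaves ChromaDB's own files untouched, and because the kernel releases a flock lock when the process exits, even after SIGKILL, a crashed writer does not leave a stale lock behind.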
