The DocumentQASystem in Chapter 18 is suitable for production use at moderate scale. For collections that reach billions of documents, migrate to FAISS with IVF indexes or to a dedicated vector database such as Qdrant or Weaviate running as a service.
Key files to keep:
# Your index directory (ChromaDB persists here)
./qa_index/
# Your embedding model cache (sentence-transformers)
~/.cache/huggingface/
# Backup before any destructive operations
./backup_YYYYMMDD_HHMMSS/
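One way to produce the timestamped backup directory listed above is a small helper built on the standard library. This is a minimal sketch, not part of the DocumentQASystem itself: the function name `backup_index` and its parameters are hypothetical, and it assumes the index lives in a single directory that can be copied while no process is writing to it.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_index(index_dir: str = "./qa_index", backup_root: str = ".") -> Path:
    """Copy index_dir to <backup_root>/backup_YYYYMMDD_HHMMSS and return the new path.

    Hypothetical helper for illustration; the names are not from the chapter's code.
    """
    source = Path(index_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Index directory not found: {source}")

    # Timestamp in the same YYYYMMDD_HHMMSS pattern as the listing above.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"backup_{stamp}"

    # copytree raises FileExistsError if target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

Calling `backup_index()` before a delete or re-index gives you a directory you can copy back over `./qa_index/` if something goes wrong.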
EXERCISE
Extend the DocumentQASystem with:
Document deletion support (delete_document(doc_id))
Update support (update_document(doc_id, new_text, new_metadata))
A bulk_search method that accepts multiple queries and returns the results for each one
Persistence of query history with timestamps
Run queries, verify the results, and demonstrate that all features work together as a cohesive system.
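As a possible starting point for the query-history item, the sketch below keeps an append-only log in JSON Lines format using only the standard library. The class name `QueryHistory`, the default file path, and the recorded fields are all assumptions made for illustration. How you call it from DocumentQASystem is left to you as part of the exercise.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class QueryHistory:
    """Append-only query log, one JSON record per line (hypothetical helper)."""

    def __init__(self, path: str = "./query_history.jsonl"):
        self.path = Path(path)
        # Create the parent directory if needed so the first write succeeds.
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, query: str, n_results: int) -> dict:
        """Append one query with a UTC timestamp and return the stored entry."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "query": query,
            "n_results": n_results,
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def load(self) -> list[dict]:
        """Return all recorded entries, oldest first."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
```

Appending one line per query means a crash loses at most the entry being written, and the file stays readable with any text tool. A search method could call `record(...)` after each query and expose `load()` for inspection.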
Summary
You now have a working semantic search system:
Embeddings convert text to fixed-length vectors that capture meaning (384 dimensions with the model used here)
ChromaDB stores vectors with metadata and supports filtering
FAISS provides faster search for very large datasets
LangChain offers abstractions for swapping backends
Batch processing handles thousands of documents efficiently