04. Event Queue with Kafka
Kafka is a common backbone for production RAG ingestion pipelines. It provides durability (messages survive broker restarts), replay (consumers can re-read old messages within the retention window), and fan-out (multiple consumer groups can each process the same event independently).
Design topics around document lifecycle stages: documents.parsed, chunks.embedded, vectors.indexed, documents.failed. Each topic represents a state transition.
# Kafka topic creation with 7-day retention (retention.ms=604800000) for replay
kafka-topics.sh --create \
  --topic documents.parsed \
  --bootstrap-server kafka:9092 \
  --partitions 12 \
  --replication-factor 3 \
  --config retention.ms=604800000 \
  --config cleanup.policy=delete
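In application code, it can help to keep the stage-to-topic mapping in one place so that every worker publishes its state transition consistently. The sketch below is illustrative only: the `Stage` enum and the `output_topic` helper are hypothetical names, and only the topic names come from the text above.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical pipeline stages; each one publishes a state transition."""
    PARSE = "parse"
    EMBED = "embed"
    INDEX = "index"


# Topic a stage publishes to when it succeeds (topic names from the text above).
SUCCESS_TOPIC = {
    Stage.PARSE: "documents.parsed",
    Stage.EMBED: "chunks.embedded",
    Stage.INDEX: "vectors.indexed",
}

# Any stage that gives up on a document reports it here.
FAILURE_TOPIC = "documents.failed"


def output_topic(stage: Stage, succeeded: bool) -> str:
    """Return the topic that records the outcome of `stage`."""
    return SUCCESS_TOPIC[stage] if succeeded else FAILURE_TOPIC
```

A parser worker would publish to `output_topic(Stage.PARSE, True)` after a successful parse and to `output_topic(Stage.PARSE, False)` when it gives up.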
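Partitioning strategy determines throughput and ordering guarantees, because Kafka preserves order only within a partition. Key by document ID for in-order processing of the same document. Key by tenant for multi-tenant isolation. Route by priority for differentiated SLAs (legal documents processed before marketing materials); Kafka has no built-in message priority, so this typically means dedicated partitions or separate topics per priority level.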
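The effect of the key choice can be sketched without a broker. The helpers below are hypothetical, and the hash is a stand-in: real clients use their own partitioner (the Java client hashes keys with murmur2), so the partition numbers here will not match a real cluster. The property being illustrated is only that equal keys always map to the same partition.

```python
import zlib


def message_key(doc_id: str, tenant_id: str, strategy: str = "document") -> bytes:
    """Choose the record key; records with equal keys share a partition."""
    if strategy == "document":
        return doc_id.encode("utf-8")      # per-document ordering
    if strategy == "tenant":
        return tenant_id.encode("utf-8")   # all of a tenant's records together
    raise ValueError(f"unknown strategy: {strategy}")


def illustrative_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition with CRC32 (illustration, not Kafka's hash)."""
    return zlib.crc32(key) % num_partitions


# Two events for the same document land on the same partition,
# so a single consumer sees them in the order they were produced.
p1 = illustrative_partition(message_key("doc-42", "acme"), 12)
p2 = illustrative_partition(message_key("doc-42", "acme"), 12)
```

Keying by tenant makes a large tenant a hot partition, which is the imbalance trade-off to weigh against isolation.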
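Consumer groups enable horizontal scaling. Add consumers to a group and Kafka automatically redistributes partitions among them, up to one consumer per partition. But beware: each partition is assigned to exactly one consumer in the group, so a slow consumer stalls every partition it owns, and if it takes longer than max.poll.interval.ms between polls, Kafka evicts it from the group and triggers a rebalance.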
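# Slow consumer problem
class EmbeddingConsumer:
    def consume(self, messages):
        for msg in messages:
            # One blocking call per message stalls every partition this consumer owns
            result = self.embed_single_chunk(msg.value)
            # If the GPU is contended, this takes 5+ seconds per chunk.
            # Once a batch outlasts max.poll.interval.ms, Kafka evicts the
            # consumer from the group and triggers a rebalance.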
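One common mitigation is to embed chunks in batches so that the GPU cost per poll is bounded and amortized. The following is a minimal sketch, not a full consumer: `embed_batch` is a hypothetical callable standing in for whatever embedding model you run, and the batch size is an arbitrary assumption.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items, preserving order."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def embed_all(
    chunks: Iterable[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed chunks in order, one model call per batch instead of per chunk."""
    vectors: List[List[float]] = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

Batching alone does not remove the timeout risk. The work done between two polls still has to finish within max.poll.interval.ms, so cap how many records each poll hands you (max.poll.records in the Java client) or raise the interval to match your worst-case GPU latency.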
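Offset management is tricky. Auto-commit commits offsets on a timer (auto.commit.interval.ms, 5 seconds by default) rather than when processing completes, so offsets can be committed for messages that are still in flight, for example after being handed to a worker thread. A crash at that point loses those messages. Manual offset management (commit after successful processing) provides at-least-once delivery but requires idempotent processing, because a crash between processing and commit causes the same messages to be redelivered.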
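The commit-after-processing pattern can be sketched independently of any client library. In the sketch below, the record shape, the `IdempotentProcessor` class, and the injected `commit` callable are all assumptions made for illustration. With a real client you would disable auto-commit and have `commit` call the client's own commit method. The in-memory `done` set stands in for a durable store, such as an upsert keyed by chunk ID.

```python
from typing import Callable, Set, Tuple

# Hypothetical record shape: (topic, partition, offset, chunk_id)
Record = Tuple[str, int, int, str]


class IdempotentProcessor:
    """Process first, commit second; skip work that was already done."""

    def __init__(
        self,
        handler: Callable[[str], None],
        commit: Callable[[str, int, int], None],
    ) -> None:
        self.handler = handler       # side-effecting work, e.g. upsert a vector
        self.commit = commit         # called with (topic, partition, next_offset)
        self.done: Set[str] = set()  # assumption: durable in a real system

    def process(self, record: Record) -> None:
        topic, partition, offset, chunk_id = record
        if chunk_id not in self.done:
            # If this raises, nothing is committed and the record is redelivered.
            self.handler(chunk_id)
            self.done.add(chunk_id)
        # A committed offset names the next record to read, hence offset + 1.
        self.commit(topic, partition, offset + 1)
```

If the process crashes after `handler` returns but before `commit` runs, the record is delivered again on restart. The `done` check turns that redelivery into a no-op, which is what makes at-least-once delivery safe.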
Dead letter queues catch poison messages. When a document fails parsing 3 times, send it to documents.dlq for manual review and move on. Otherwise, a corrupted PDF that is retried indefinitely blocks every message behind it in its partition and can stall your ingestion pipeline.
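The retry-then-park logic fits in a few lines. This is a sketch under stated assumptions: `handler` and `send_to_dlq` are hypothetical callables (the latter would produce to documents.dlq in a real pipeline), and retries happen immediately with no backoff.

```python
from typing import Any, Callable, Optional


def process_with_dlq(
    message: Any,
    handler: Callable[[Any], None],
    send_to_dlq: Callable[[Any, str], None],
    max_attempts: int = 3,
) -> bool:
    """Try `handler` up to `max_attempts` times, then park the message.

    Returns True if the message was processed, False if it went to the DLQ.
    Either way the caller can advance past the message instead of blocking.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            handler(message)
            return True
        except Exception as exc:  # broad on purpose: any failure counts as an attempt
            last_error = exc
    # Attach the final error so a reviewer can see why the message was parked.
    send_to_dlq(message, f"{type(last_error).__name__}: {last_error}")
    return False
```

Because the function always returns, the consumer can commit the offset and continue with the next message whether or not parsing succeeded.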
Schema evolution needs governance. If you change the ChunkEvent schema to add a language field without a compatibility plan, consumers still on the old schema can fail to deserialize new messages. Use Avro or Protobuf and restrict changes to compatible ones, such as adding a field with a default value.
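The compatibility rule can be illustrated with plain JSON and a dataclass as a stand-in for Avro or Protobuf. Only the `ChunkEvent` name and the `language` field come from the text. The `chunk_id` and `text` fields, the `"und"` (undetermined) default, and the `decode_chunk_event` helper are assumptions made for this sketch.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ChunkEvent:
    chunk_id: str           # hypothetical field
    text: str               # hypothetical field
    language: str = "und"   # new field; the default keeps old messages readable


def decode_chunk_event(raw: bytes) -> ChunkEvent:
    """Decode a JSON-encoded event, tolerating both older and newer writers."""
    data = json.loads(raw)
    known = {f.name for f in fields(ChunkEvent)}
    # Dropping unknown keys lets this reader survive fields added later;
    # the dataclass default fills in fields that older writers never sent.
    return ChunkEvent(**{k: v for k, v in data.items() if k in known})


old_message = b'{"chunk_id": "c1", "text": "hello"}'
new_message = b'{"chunk_id": "c2", "text": "bonjour", "language": "fr"}'
old_event = decode_chunk_event(old_message)
new_event = decode_chunk_event(new_message)
```

A schema-aware format gives you the same two behaviors (defaults for missing fields, ignoring unknown ones) with the rules checked against the schema instead of hand-written in each consumer.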
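Failure modes include consumer lag accumulation (ingestion outpaces processing), duplicate processing (a consumer restarts and re-reads messages whose offsets were not committed), partition imbalance (hot partitions receive 10x more messages), and broker disk exhaustion (retention is driven by retention.ms and retention.bytes, not by consumer progress, so a high-volume topic can fill the disk before old segments are deleted).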
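Lag and imbalance are both easy to compute once you have per-partition offsets. The sketch below assumes you have already fetched the log-end offsets and the group's committed offsets into plain dictionaries keyed by partition number. The function names and the 10x threshold (taken from the example above) are illustrative.

```python
from statistics import median
from typing import Dict, List


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: records written but not yet committed by the group.

    Assumption: a partition with no committed offset counts from 0. A real
    consumer would start wherever its auto.offset.reset policy says.
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def hot_partitions(lag: Dict[int, int], factor: float = 10.0) -> List[int]:
    """Partitions whose lag exceeds `factor` times the median lag.

    Raises statistics.StatisticsError if `lag` is empty.
    """
    typical = median(lag.values())
    return sorted(p for p, n in lag.items() if n > factor * typical)


lag = consumer_lag({0: 100, 1: 250, 2: 40}, {0: 90, 1: 50, 2: 28})
total_lag = sum(lag.values())
```

Total lag that keeps growing means ingestion is outpacing processing. A single partition flagged by `hot_partitions` points at key skew rather than a shortage of consumers.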
Design a Kafka consumer for the chunks.embedded topic that processes 10,000 embeddings per minute while maintaining ordering per-document and handling GPU failures gracefully.