17. LangChain Callbacks
Callbacks intercept events during chain execution—useful for logging, monitoring, timing, and debugging. LangChain's callback system fires events at predefined points: chain start/end, LLM start/end, retrieval events, and errors.
Create a custom callback handler by subclassing BaseCallbackHandler and overriding only the event methods you need.
import time
from datetime import datetime

from langchain_core.callbacks import BaseCallbackHandler

class TimingCallback(BaseCallbackHandler):
    """Prints timing information for LLM calls and basic chain events."""

    def __init__(self):
        self.start_time = None

    def on_llm_start(self, serialized, prompts, **kwargs):
        # Record when the LLM call begins.
        self.start_time = time.time()
        print(f"LLM started at {datetime.now()}")

    def on_llm_end(self, response, **kwargs):
        # Guard against on_llm_end firing without a recorded start.
        if self.start_time is not None:
            elapsed = time.time() - self.start_time
            print(f"LLM finished in {elapsed:.2f}s")

    def on_chain_start(self, serialized, inputs, **kwargs):
        print(f"Chain started with {len(inputs)} inputs")

    def on_chain_end(self, outputs, **kwargs):
        # Outputs are a dict for RetrievalQA; other runnables may return plain values.
        if isinstance(outputs, dict):
            print(f"Chain output keys: {list(outputs.keys())}")

callback = TimingCallback()
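TimingCallback stores a single start_time, so overlapping or nested LLM calls would overwrite each other's timestamps. One possible refinement, sketched here under assumptions, keys each start time by the run_id keyword argument that LangChain passes to callback methods and uses on_llm_error to clean up after failed calls. The class name PerRunTimingCallback and the manual driver at the bottom are illustrative and not part of LangChain. The try/except import is only there so the sketch can be run without LangChain installed.

```python
import time
from uuid import uuid4

try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    # Fallback so this sketch can be exercised without LangChain installed.
    # In a real project, import BaseCallbackHandler directly.
    BaseCallbackHandler = object


class PerRunTimingCallback(BaseCallbackHandler):
    """Times each LLM call separately, keyed by its run_id (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._starts = {}    # run_id -> start timestamp of an in-flight call
        self.durations = []  # elapsed seconds for calls that completed
        self.failures = 0    # calls that ended in on_llm_error

    def on_llm_start(self, serialized, prompts, *, run_id, **kwargs):
        self._starts[run_id] = time.perf_counter()

    def on_llm_end(self, response, *, run_id, **kwargs):
        start = self._starts.pop(run_id, None)
        if start is not None:
            self.durations.append(time.perf_counter() - start)

    def on_llm_error(self, error, *, run_id, **kwargs):
        # Drop the pending entry so failed calls do not accumulate.
        self._starts.pop(run_id, None)
        self.failures += 1


# Drive the handler by hand to show the bookkeeping; in practice LangChain
# calls these methods itself.
handler = PerRunTimingCallback()
first, second = uuid4(), uuid4()
handler.on_llm_start({}, ["prompt A"], run_id=first)
handler.on_llm_start({}, ["prompt B"], run_id=second)
handler.on_llm_end(None, run_id=second)
handler.on_llm_error(RuntimeError("simulated failure"), run_id=first)
print(len(handler.durations), handler.failures)  # prints: 1 1
```

Using time.perf_counter rather than time.time is a design choice for this sketch: it is monotonic, so elapsed values are not affected by system clock adjustments.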
Attach callbacks at chain creation or invocation time.
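from langchain.chains import RetrievalQA

# At chain creation: the handler is scoped to this chain object
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    callbacks=[callback],  # Attach here
)

# At invocation: the handler applies to this call and is passed down to child runs
result = qa_chain.invoke({"query": "..."}, config={"callbacks": [callback]})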
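For token counting, inspect the LLM output in on_llm_end. The contents of llm_output are provider-specific: the token_usage key used here is what OpenAI-style models report, and other providers may leave it empty or use different keys.
class TokenCounterCallback(BaseCallbackHandler):
    def on_llm_end(self, response, **kwargs):
        # llm_output can be None, so check before reading from it.
        if hasattr(response, "llm_output") and response.llm_output:
            token_usage = response.llm_output.get("token_usage", {})
            print(f"Tokens used: {token_usage}")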
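Printing the usage for each call is enough for debugging, but a running total is often more useful. The following is a minimal sketch of one way to accumulate counts across calls. It assumes the OpenAI-style keys prompt_tokens and completion_tokens inside token_usage; the class name TokenTotalsCallback and the SimpleNamespace stand-in responses are hypothetical and exist only so the example runs on its own.

```python
from types import SimpleNamespace

try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    # Fallback so this sketch can be exercised without LangChain installed.
    BaseCallbackHandler = object


class TokenTotalsCallback(BaseCallbackHandler):
    """Accumulates token usage across LLM calls (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.calls_without_usage = 0  # calls where no usage data was reported

    def on_llm_end(self, response, **kwargs):
        llm_output = getattr(response, "llm_output", None) or {}
        usage = llm_output.get("token_usage")
        if not usage:
            self.calls_without_usage += 1
            return
        # Assumed key names; adjust them for providers that report usage differently.
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


# Stand-in responses that mimic only the llm_output attribute.
counter = TokenTotalsCallback()
counter.on_llm_end(
    SimpleNamespace(
        llm_output={"token_usage": {"prompt_tokens": 12, "completion_tokens": 30}}
    )
)
counter.on_llm_end(SimpleNamespace(llm_output=None))
print(counter.total_tokens, counter.calls_without_usage)  # prints: 42 1
```

Counting the calls that report no usage makes it visible when a provider does not populate llm_output, rather than silently showing a total of zero.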
LangChain also provides built-in handlers: StdOutCallbackHandler for verbose output, FileCallbackHandler for file logging, and LangChainTracer for LangSmith integration.
from langchain_core.callbacks import StdOutCallbackHandler

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    callbacks=[StdOutCallbackHandler()],  # Verbose output
)
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Implement a callback that tracks retrieval latency separately from LLM latency for a RAG chain. Print the ratio for 5 different queries.
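As a possible starting point for this exercise, the sketch below separates the two timings by listening to the retriever hooks (on_retriever_start and on_retriever_end) alongside the LLM hooks. It is a skeleton under assumptions, not a full solution: the class name LatencySplitCallback is hypothetical, the sleeps stand in for real retrieval and generation work, and running it against five real queries is left to the reader.

```python
import time
from uuid import uuid4

try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    # Fallback so this sketch can be exercised without LangChain installed.
    BaseCallbackHandler = object


class LatencySplitCallback(BaseCallbackHandler):
    """Accumulates retrieval and LLM time separately (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._starts = {}  # run_id -> start timestamp of an in-flight run
        self.retrieval_seconds = 0.0
        self.llm_seconds = 0.0

    def _begin(self, run_id):
        self._starts[run_id] = time.perf_counter()

    def _elapsed(self, run_id):
        start = self._starts.pop(run_id, None)
        return 0.0 if start is None else time.perf_counter() - start

    def on_retriever_start(self, serialized, query, *, run_id, **kwargs):
        self._begin(run_id)

    def on_retriever_end(self, documents, *, run_id, **kwargs):
        self.retrieval_seconds += self._elapsed(run_id)

    def on_llm_start(self, serialized, prompts, *, run_id, **kwargs):
        self._begin(run_id)

    def on_llm_end(self, response, *, run_id, **kwargs):
        self.llm_seconds += self._elapsed(run_id)

    def ratio(self):
        """Retrieval time divided by LLM time, or None if no LLM time was recorded."""
        if not self.llm_seconds:
            return None
        return self.retrieval_seconds / self.llm_seconds


# Simulated run: the sleeps stand in for real retrieval and generation.
tracker = LatencySplitCallback()
retrieval_run, llm_run = uuid4(), uuid4()
tracker.on_retriever_start({}, "example query", run_id=retrieval_run)
time.sleep(0.01)
tracker.on_retriever_end([], run_id=retrieval_run)
tracker.on_llm_start({}, ["example prompt"], run_id=llm_run)
time.sleep(0.02)
tracker.on_llm_end(None, run_id=llm_run)
print(f"retrieval/LLM ratio: {tracker.ratio():.2f}")
```

To get one ratio per query rather than a cumulative figure, create a fresh LatencySplitCallback for each query or reset its totals between calls.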