RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
LangChain for Local AI

17. LangChain Callbacks

Chapter 17 of 18 · 20 min
KEY INSIGHT

Callbacks intercept chain execution events without modifying chain logic, which makes them well suited to production observability and debugging.

A callback handler receives events while a chain runs, which is useful for logging, monitoring, timing, and debugging. LangChain's callback system fires these events at predefined points: chain start and end, LLM start and end, retrieval events, and errors.
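As a rough illustration of those event points, the sketch below records the name of each hook as it fires. The class name EventLogCallback is hypothetical, and the try/except stub exists only so the snippet runs where LangChain is not installed. The hooks are called by hand here to simulate a run; in real use LangChain calls them.

```python
# Fallback stub so the sketch also runs where LangChain is not installed.
try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    class BaseCallbackHandler:  # hypothetical stand-in, not the real class
        pass


class EventLogCallback(BaseCallbackHandler):
    """Record the name of each callback event in the order it fires."""

    def __init__(self):
        self.events = []

    def on_chain_start(self, serialized, inputs, **kwargs):
        self.events.append("chain_start")

    def on_chain_end(self, outputs, **kwargs):
        self.events.append("chain_end")

    def on_chain_error(self, error, **kwargs):
        self.events.append(f"chain_error:{type(error).__name__}")

    def on_llm_start(self, serialized, prompts, **kwargs):
        self.events.append("llm_start")

    def on_llm_end(self, response, **kwargs):
        self.events.append("llm_end")

    def on_llm_error(self, error, **kwargs):
        self.events.append(f"llm_error:{type(error).__name__}")

    def on_retriever_start(self, serialized, query, **kwargs):
        self.events.append("retriever_start")

    def on_retriever_end(self, documents, **kwargs):
        self.events.append("retriever_end")


# Simulate one RAG-style run by calling the hooks directly.
demo = EventLogCallback()
demo.on_chain_start({}, {"query": "..."})
demo.on_retriever_start({}, "...")
demo.on_retriever_end([])
demo.on_llm_start({}, ["..."])
demo.on_llm_end(None)
demo.on_chain_end({"result": "..."})
print(demo.events)
```

Keeping the handler free of side effects other than appending to a list makes the event order easy to inspect after a run.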

To create a custom callback handler, subclass BaseCallbackHandler and override the event methods you need.

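import time
from datetime import datetime

from langchain_core.callbacks import BaseCallbackHandler


class TimingCallback(BaseCallbackHandler):
    """Print wall-clock timing for LLM calls and basic chain input/output info."""

    def __init__(self):
        self.start_time = None

    def on_llm_start(self, serialized, prompts, **kwargs):
        # perf_counter is monotonic, so it is safe for measuring durations
        self.start_time = time.perf_counter()
        print(f"LLM started at {datetime.now()}")

    def on_llm_end(self, response, **kwargs):
        # Guard against an end event arriving without a recorded start
        if self.start_time is None:
            return
        elapsed = time.perf_counter() - self.start_time
        print(f"LLM finished in {elapsed:.2f}s")

    def on_chain_start(self, serialized, inputs, **kwargs):
        # inputs is usually a dict, but may be a plain value
        count = len(inputs) if isinstance(inputs, dict) else 1
        print(f"Chain started with {count} inputs")

    def on_chain_end(self, outputs, **kwargs):
        # outputs may be a plain string rather than a dict
        keys = list(outputs.keys()) if isinstance(outputs, dict) else type(outputs).__name__
        print(f"Chain output keys: {keys}")


callback = TimingCallback()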

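Attach callbacks at chain creation or at invocation time. Callbacks passed to the constructor are scoped to that object only and are not inherited by its children, so nested LLM and retriever events are not reported to them. Callbacks passed at invocation apply to that single call and propagate to every nested component.

from langchain.chains import RetrievalQA

# llm and retriever are assumed to be defined already

# At chain creation: fires for this chain object only, not for nested components
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    callbacks=[callback],
)

# At invocation: passed through config, propagates to the nested LLM and retriever
result = qa_chain.invoke({"query": "..."}, config={"callbacks": [callback]})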

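For token counting, inspect the LLM result in on_llm_end. The llm_output field is provider-specific and may be None or lack a token_usage key, so guard the lookup as shown.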

class TokenCounterCallback(BaseCallbackHandler):
    def on_llm_end(self, response, **kwargs):
        if hasattr(response, "llm_output") and response.llm_output:
            token_usage = response.llm_output.get("token_usage", {})
            print(f"Tokens used: {token_usage}")
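Printing per-call usage is often less useful than a running total. The sketch below accumulates counts across calls. It assumes OpenAI-style keys (prompt_tokens, completion_tokens) inside llm_output["token_usage"] and falls back to the usage_metadata dict that chat messages may carry. Both layouts are assumptions about what your backend reports, and TokenTotalsCallback is a hypothetical name.

```python
from types import SimpleNamespace

# Fallback stub so the sketch also runs where LangChain is not installed.
try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    class BaseCallbackHandler:  # hypothetical stand-in, not the real class
        pass


class TokenTotalsCallback(BaseCallbackHandler):
    """Accumulate token counts across every LLM call the handler sees."""

    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.calls_without_usage = 0  # calls where no usage data was found

    def on_llm_end(self, response, **kwargs):
        usage = self._from_llm_output(response) or self._from_messages(response)
        if usage is None:
            self.calls_without_usage += 1
            return
        prompt, completion = usage
        self.prompt_tokens += prompt
        self.completion_tokens += completion

    @staticmethod
    def _from_llm_output(response):
        # Assumed layout: llm_output["token_usage"] with OpenAI-style keys.
        llm_output = getattr(response, "llm_output", None) or {}
        token_usage = llm_output.get("token_usage") or {}
        if not token_usage:
            return None
        return (token_usage.get("prompt_tokens", 0),
                token_usage.get("completion_tokens", 0))

    @staticmethod
    def _from_messages(response):
        # Assumed layout: generation.message.usage_metadata on chat outputs.
        found = False
        prompt = completion = 0
        for candidates in getattr(response, "generations", None) or []:
            for generation in candidates:
                message = getattr(generation, "message", None)
                usage = getattr(message, "usage_metadata", None)
                if usage:
                    found = True
                    prompt += usage.get("input_tokens", 0)
                    completion += usage.get("output_tokens", 0)
        return (prompt, completion) if found else None


# Simulated result object; a real run passes an LLMResult instead.
fake = SimpleNamespace(
    llm_output={"token_usage": {"prompt_tokens": 10, "completion_tokens": 5}},
    generations=[],
)
totals = TokenTotalsCallback()
totals.on_llm_end(fake)
print(totals.prompt_tokens, totals.completion_tokens)
```

Tracking calls_without_usage separately makes it visible when a local backend reports no token data at all, instead of silently showing zero.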

LangChain also provides built-in handlers: StdOutCallbackHandler for verbose output, FileCallbackHandler for file logging, and LangChainTracer for LangSmith integration.

from langchain_core.callbacks import StdOutCallbackHandler

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    callbacks=[StdOutCallbackHandler()]  # Verbose output
)

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
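One way to capture the package version and runtime for that record is a small standard-library helper like the sketch below. The function name environment_record and the package names passed to it are illustrative; list whatever your exercise actually imports.

```python
import json
import platform
import sys
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, OS, and package versions for a reproducibility note."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Example package names; replace with the distributions your exercise uses.
print(json.dumps(environment_record(["langchain", "langchain-core"]), indent=2))
```

Paste the printed JSON next to the observed output so the run can be compared against later attempts.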

EXERCISE

Implement a callback that tracks retrieval latency separately from LLM latency for a RAG chain. Print the ratio for 5 different queries.
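A possible starting point, not a full solution: the sketch below keys start times by the run_id keyword that LangChain passes to each hook, so overlapping retriever and LLM runs do not overwrite each other. LatencySplitCallback and its ratio and reset helpers are hypothetical names, and the stub exists only so the snippet runs without LangChain installed.

```python
import time
from collections import defaultdict

# Fallback stub so the sketch also runs where LangChain is not installed.
try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    class BaseCallbackHandler:  # hypothetical stand-in, not the real class
        pass


class LatencySplitCallback(BaseCallbackHandler):
    """Accumulate retriever and LLM wall-clock time separately, keyed by run_id."""

    def __init__(self):
        self._started = {}                # run_id -> start timestamp
        self.totals = defaultdict(float)  # "retriever" / "llm" -> seconds

    def _begin(self, run_id):
        self._started[run_id] = time.perf_counter()

    def _finish(self, kind, run_id):
        # Ignore end events that have no matching start.
        start = self._started.pop(run_id, None)
        if start is not None:
            self.totals[kind] += time.perf_counter() - start

    def on_retriever_start(self, serialized, query, *, run_id=None, **kwargs):
        self._begin(run_id)

    def on_retriever_end(self, documents, *, run_id=None, **kwargs):
        self._finish("retriever", run_id)

    def on_llm_start(self, serialized, prompts, *, run_id=None, **kwargs):
        self._begin(run_id)

    def on_llm_end(self, response, *, run_id=None, **kwargs):
        self._finish("llm", run_id)

    def ratio(self):
        """Return retriever time divided by LLM time, or None if no LLM time was recorded."""
        llm = self.totals["llm"]
        return self.totals["retriever"] / llm if llm > 0 else None

    def reset(self):
        """Clear all state so the handler can be reused for the next query."""
        self._started.clear()
        self.totals.clear()
```

To use it, pass the handler at invocation time so nested retriever and LLM events reach it, for example qa_chain.invoke({"query": q}, config={"callbacks": [handler]}), then print handler.ratio() and call handler.reset() before the next query.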
