How to log conversational context windows for AI agent debugging
Agent with conversation loop, logging system
What this does
This guide captures the full conversational context window — the accumulated messages, tool outputs, and system instructions sent to the model — at each step of an AI agent's execution loop. Operators can replay a specific conversation step with the exact context the model received, enabling root-cause analysis of incorrect decisions, hallucinated responses, or unexpected refusals.
Steps
1. Identify the context window variable in the agent code. Typically this is a list passed into the model call:

    messages = [{"role": "system", "content": system_prompt}] + conversation_history + tool_results

2. Before each model call, snapshot the entire message list into a structured log:

    logger.info("context_window_sent", extra={
        "correlation_id": ctx.correlation_id,
        "step": step_number,
        "message_count": len(messages),
        # content can be missing or None (for example on tool-call messages)
        "total_chars": sum(len(m.get("content") or "") for m in messages),
        "roles": [m["role"] for m in messages],
        "messages": messages,
    })

3. Configure the JSON logger so that messages is emitted as its own field. python-json-logger writes keys passed through extra as top-level JSON fields. Its reserved_attrs option lists the record attributes that are left out of the output, so make sure messages is not among them.

4. Enable context window truncation in the log to control size. Snapshot only the last N messages or limit total characters:

    MAX_LOG_CHARS = 5000
    truncated = []
    chars = 0
    for m in reversed(messages):  # walk from newest to oldest
        size = len(m.get("content") or "")
        if chars + size > MAX_LOG_CHARS:  # stop before the budget is exceeded
            break
        truncated.insert(0, m)  # keep chronological order
        chars += size
    logger.info("context_window_sent", extra={"messages": truncated})

5. Write a replay helper script that reads a logged context window and replays it:

    import json
    import sys

    # model is the same client object the agent uses; import it from your agent code.
    log_entry = json.loads(sys.stdin.read())
    response = model.chat(log_entry["messages"])
    print(f"Original output: {log_entry.get('output', 'N/A')}")
    print(f"Replay output: {response}")

Save as replay_context.py and use with:

    jq -c 'select(.message == "context_window_sent" and .step == 3)' agent.log | python replay_context.py

The filter must match exactly one entry, because the script parses stdin as a single JSON object. The replay reproduces the original call only if the logged entry holds the untruncated window.

6. Add context diff logging to show what changed between steps:

    prev_msgs = previous_context.get("messages", [])
    # a range object is not JSON-serializable, so convert it to a list
    new_indices = list(range(len(prev_msgs), len(messages)))
    logger.info("context_diff", extra={"added_messages": new_indices, "added_count": len(new_indices)})
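The snapshot, truncation, and diff steps can be combined into one small sketch. It uses only the standard library so that it runs without python-json-logger. JsonLineFormatter, truncate_messages, log_context_window, and the logger name agent.context are illustrative names assumed here, not part of any agent framework.

```python
import io
import json
import logging

# Attributes every LogRecord carries; anything else arrived through `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class JsonLineFormatter(logging.Formatter):
    """Hypothetical stdlib stand-in for a JSON formatter: one object per line."""

    def format(self, record):
        entry = {"message": record.getMessage(), "level": record.levelname}
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                entry[key] = value
        return json.dumps(entry, default=str)


def truncate_messages(messages, max_chars=5000):
    """Keep the most recent messages whose combined content fits max_chars."""
    kept, chars = [], 0
    for m in reversed(messages):
        size = len(m.get("content") or "")
        if chars + size > max_chars:
            break
        kept.append(m)
        chars += size
    kept.reverse()  # restore chronological order
    return kept


def log_context_window(logger, correlation_id, step, messages,
                       prev_count=0, max_chars=5000):
    """Log a truncated snapshot and the indices added since the last step."""
    logged = truncate_messages(messages, max_chars)
    logger.info("context_window_sent", extra={
        "correlation_id": correlation_id,
        "step": step,
        "message_count": len(messages),
        "logged_count": len(logged),
        "messages": logged,
    })
    added = list(range(prev_count, len(messages)))
    logger.info("context_diff", extra={
        "correlation_id": correlation_id,
        "step": step,
        "added_messages": added,
        "added_count": len(added),
    })
    return len(messages)  # pass back in as prev_count on the next step


# Demo: log two steps into an in-memory stream instead of a file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonLineFormatter())
logger = logging.getLogger("agent.context")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

messages = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "What is the weather in Oslo?"},
]
count = log_context_window(logger, "demo-1", 1, messages)
messages.append({"role": "tool", "content": "Oslo: 4 C, light rain"})
log_context_window(logger, "demo-1", 2, messages, prev_count=count)
print(stream.getvalue())
```

Logging both message_count and logged_count makes it visible in the log when a snapshot was truncated, which matters when deciding whether a replay can be trusted.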
Verification
cat agent.log | jq -c 'select(.message == "context_window_sent") | {step: .step, msg_count: .message_count}' | head -3
Expected output: three JSON lines showing step numbers and corresponding message counts.
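Where jq is not available, the same check can be approximated in Python. This is a minimal sketch, assuming the log holds one JSON object per line with the message, step, and message_count fields from the snapshot step. summarize_context_logs is a hypothetical helper name.

```python
import json


def summarize_context_logs(lines, limit=3):
    """Return (step, message_count) for the first `limit` snapshot entries."""
    summary = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack traces
        if not isinstance(entry, dict):
            continue
        if entry.get("message") == "context_window_sent":
            summary.append((entry.get("step"), entry.get("message_count")))
            if len(summary) == limit:
                break
    return summary


# Sample lines standing in for the contents of agent.log.
sample = [
    '{"message": "context_window_sent", "step": 1, "message_count": 2}',
    '{"message": "context_diff", "step": 1, "added_count": 2}',
    'Traceback (most recent call last):',
    '{"message": "context_window_sent", "step": 2, "message_count": 4}',
]
print(summarize_context_logs(sample))
```

For a real log file, pass the open file object, since iterating over it yields lines: summarize_context_logs(open("agent.log")).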
Common failures
- Log lines are truncated: the entire messages array may exceed the logger's max line length. Use a dedicated log file appender with no line-length limit, or enable the truncation logic from Step 4.
- Massive disk usage: each step logs the full context window, which itself grows with every turn, so total log volume grows roughly quadratically with turn count. Apply aggressive truncation and set a daily log rotation policy: logrotate with maxsize 500M and rotate 7.
- Sensitive data in logs: user messages may contain PII or secrets. Redact before logging using a pattern filter: re.sub(r'\b\d{16}\b', '[REDACTED]', content). This pattern only catches unbroken 16-digit runs such as card numbers, so add patterns for any other sensitive formats your users are likely to send.
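Redaction is easiest to apply to a copy of the message list just before it is logged, so the context sent to the model stays untouched. A minimal sketch under that assumption follows. REDACTION_PATTERNS, redact_text, and redact_messages are hypothetical names, and the email pattern is an illustrative addition rather than a complete PII filter.

```python
import re

REDACTION_PATTERNS = [
    re.compile(r"\b\d{16}\b"),                # unbroken 16-digit runs
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # simple email shape (assumption)
]


def redact_text(text):
    """Replace every match of every pattern with a fixed placeholder."""
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def redact_messages(messages):
    """Return redacted copies; the original message dicts are not modified."""
    redacted = []
    for m in messages:
        copy = dict(m)
        if isinstance(copy.get("content"), str):
            copy["content"] = redact_text(copy["content"])
        redacted.append(copy)
    return redacted


original = [
    {"role": "user", "content": "My card is 1234567812345678, mail bob@example.com"},
    {"role": "assistant", "content": None},
]
safe = redact_messages(original)
print(safe[0]["content"])
```

Pass the result of redact_messages(messages) as the messages field of the log call. Note that a replay from a redacted log no longer reproduces the exact input the model saw.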