What this does

This guide shows how to capture the full conversational context window (the accumulated messages, tool outputs, and system instructions sent to the model) at each step of an AI agent's execution loop. With these snapshots, operators can replay a specific conversation step with the exact context the model received, which supports root-cause analysis of incorrect decisions, hallucinated responses, or unexpected refusals.

Steps

Identify the context window variable in the agent code. Typically this is a list passed into the model call:
```
messages = [{"role": "system", "content": system_prompt}] + conversation_history + tool_results
```

Before each model call, snapshot the entire message list into a structured log:

logger.info("context_window_sent", extra={
    "correlation_id": ctx.correlation_id,
    "step": step_number,
    "message_count": len(messages),
    # content can be missing or None (for example on tool-call messages)
    "total_chars": sum(len(m.get("content") or "") for m in messages),
    "roles": [m["role"] for m in messages],
    "messages": messages,
})

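Configure the JSON logger so that fields passed via extra, including messages, are emitted as top-level JSON keys. python-json-logger's JsonFormatter does this by default. Its reserved_attrs option lists the LogRecord attributes to leave out of the output, so do not add messages to that list or the array will be dropped.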
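To make the expected log shape concrete, here is a minimal sketch that uses only the standard library instead of python-json-logger. The names ContextJsonFormatter and build_logger are assumptions made for this example, not part of any library. It copies every field passed via extra to a top-level JSON key, so messages is written as a real JSON array on a single line.

```python
import json
import logging

# Attributes every LogRecord carries; anything else arrived via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None)))
_STANDARD_ATTRS |= {"message", "asctime"}


class ContextJsonFormatter(logging.Formatter):
    """Hypothetical stand-in for a JSON formatter: one JSON object per line."""

    def format(self, record):
        entry = {"message": record.getMessage(), "level": record.levelname}
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                entry[key] = value  # fields passed via `extra`
        # default=str keeps logging alive if a message holds a non-JSON type.
        return json.dumps(entry, default=str)


def build_logger(stream):
    """Attach the formatter to a dedicated logger writing to `stream`."""
    logger = logging.getLogger("agent.context")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ContextJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Passing an open file object as stream gives one JSON object per line, which is the shape the jq commands in this guide expect.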

Enable context window truncation in the log to control size. Snapshot only the last N messages or limit total characters:

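MAX_LOG_CHARS = 5000
truncated = []
chars = 0
# Walk from newest to oldest so the most recent messages are kept.
for m in reversed(messages):
    size = len(m.get("content") or "")
    # Check before adding so the budget is not overshot; always keep the newest message.
    if truncated and chars + size > MAX_LOG_CHARS:
        break
    truncated.insert(0, m)
    chars += size
logger.info("context_window_sent", extra={
    "correlation_id": ctx.correlation_id,
    "step": step_number,
    "truncated": len(truncated) < len(messages),
    "messages": truncated,
})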
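The snippet above covers the character limit. The other option mentioned, keeping only the last N messages, could look roughly like the sketch below. The function name last_n_messages is hypothetical, and keeping every system message regardless of N is a design assumption made here so that a replay still sees the system prompt.

```python
def last_n_messages(messages, n):
    """Keep all system messages plus the n most recent non-system messages.

    Original ordering is preserved. Retaining system messages is an
    assumption of this sketch, not a requirement of the guide.
    """
    non_system = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    keep = set(non_system[-n:]) if n > 0 else set()
    return [
        m for i, m in enumerate(messages)
        if m.get("role") == "system" or i in keep
    ]
```

The result can be passed as the messages field of the log call in place of the full list.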

Write a replay helper script that reads a logged context window and replays it:

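import json
import sys

# Assumption: `model` is the same client object the agent uses.
# `agent` is a placeholder module name; adjust the import to your codebase.
from agent import model

# Expects exactly one JSON log entry on stdin.
log_entry = json.loads(sys.stdin.read())
response = model.chat(log_entry["messages"])
# 'output' is present only if the agent also logged the model's response.
print(f"Original output: {log_entry.get('output', 'N/A')}")
print(f"Replay output: {response}")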

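Save this as replay_context.py and pipe a single log entry into it: jq -c 'select(.message == "context_window_sent" and .step == 3)' agent.log | head -n 1 | python replay_context.py. If the entry was truncated before logging, the replay sees only the truncated context, not the exact context the model received.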
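If jq is not available, the entry can also be selected in Python. The following is a sketch under the assumption that the log holds one JSON object per line with the message, step, and correlation_id fields used in this guide; find_context_entry is a hypothetical helper name.

```python
import json


def find_context_entry(lines, step, correlation_id=None):
    """Return the first context_window_sent entry for `step`, or None."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack traces
        if not isinstance(entry, dict):
            continue
        if entry.get("message") != "context_window_sent":
            continue
        if entry.get("step") != step:
            continue
        if correlation_id is not None and entry.get("correlation_id") != correlation_id:
            continue
        return entry
    return None
```

Filtering on correlation_id as well as step matters when the log file mixes several conversations, because step numbers repeat across them.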

Add context diff logging to show what changed between steps:

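# previous_context is the logged payload from the prior step (an empty dict on the first step).
prev_msgs = previous_context.get("messages", [])
# Assumes the history is append-only, so every index past the old length is new.
# A range object is not JSON-serializable, so convert it to a list.
new_indices = list(range(len(prev_msgs), len(messages)))
logger.info("context_diff", extra={"added_messages": new_indices, "added_count": len(new_indices)})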
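The length-based diff assumes messages are only ever appended. If the agent also rewrites history (for example by summarizing or dropping old turns), a prefix comparison is one way to notice that. The sketch below is an illustration under that assumption; context_diff and its field names are hypothetical.

```python
def context_diff(prev_msgs, messages):
    """Describe how `messages` differs from the previous step's list.

    Counts the shared prefix, then treats everything after it as
    removed (from the old list) or added (in the new list).
    """
    common = 0
    for old, new in zip(prev_msgs, messages):
        if old != new:
            break
        common += 1
    return {
        "common_prefix": common,
        "removed_count": len(prev_msgs) - common,
        "added_indices": list(range(common, len(messages))),
        "added_count": len(messages) - common,
    }
```

A non-zero removed_count signals that earlier context changed between steps, which is often the interesting case when debugging.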

Verification

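jq -c 'select(.message == "context_window_sent") | {step: .step, msg_count: .message_count}' agent.log | head -3

Expected output: three JSON lines showing step numbers and corresponding message counts. The -c flag is required here, because without it jq pretty-prints each object across several lines and head -3 would cut the first object short.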

Common failures

Log lines are truncated: the entire messages array may exceed the logger's max line length. Use a dedicated log file appender with no line-length limit, or enable the truncation logic described under Steps.
Massive disk usage: every step logs the whole window again, so total log volume grows roughly quadratically with turn count even though each window grows only linearly. Apply aggressive truncation and set a daily log rotation policy, for example logrotate with daily, maxsize 500M, and rotate 7.
Sensitive data in logs: user messages may contain PII or secrets. Redact before logging with pattern filters, for example re.sub(r'\b\d{16}\b', '[REDACTED]', content) for unformatted 16-digit card numbers. One pattern will not catch every kind of secret, so add patterns for the data your agent actually handles.
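The redaction step could be applied to the whole message list along these lines. This is a sketch, not a complete PII filter: redact_messages and REDACTION_PATTERNS are hypothetical names, and the email pattern is a rough illustration added for this example.

```python
import copy
import re

# Illustrative patterns only; extend for the data your agent actually handles.
REDACTION_PATTERNS = [
    re.compile(r"\b\d{16}\b"),                    # unformatted 16-digit numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email addresses (rough)
]


def redact_messages(messages):
    """Return a redacted deep copy; the list sent to the model is untouched."""
    redacted = copy.deepcopy(messages)
    for m in redacted:
        content = m.get("content")
        if isinstance(content, str):
            for pattern in REDACTION_PATTERNS:
                content = pattern.sub("[REDACTED]", content)
            m["content"] = content
    return redacted
```

Log redact_messages(messages) rather than messages. Note the trade-off: a replay from a redacted entry no longer sends the model exactly the text it originally received.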

How to log conversational context windows for AI agent debugging
