How to monitor agent token usage and cost
What this does
Monitoring agent token usage and cost provides visibility into the operational expenses of running AI agents. The monitoring system tracks input tokens, output tokens, and cumulative costs per agent session, per task type, and across time periods. This data enables budget enforcement, cost optimization decisions, and chargeback accounting for multi-team AI deployments.
Steps
Create a TokenTracker class. It captures token counts from each API response. In the agent's model call wrapper, read response.usage.prompt_tokens and response.usage.completion_tokens (the field names used by OpenAI-style responses; other providers may name them differently).
Store each record. Save a UTC timestamp, session ID, task type, and model name alongside the token counts.
Calculate cost. Use a pricing lookup table keyed by model, with prices expressed in USD per million tokens: cost = (prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE) / 1_000_000.
Persist the records. Write them to a SQLite database with table token_usage (id, session_id, task_type, model, prompt_tokens, completion_tokens, cost, timestamp).
Set a budget warning threshold. Log a warning when the daily cost exceeds $5: if daily_total > 5.0: logger.warning(f"Daily budget exceeded: ${daily_total:.2f}").
Create a reporting function. It queries the database for today's totals per task type: SELECT task_type, SUM(cost) AS total_cost, SUM(prompt_tokens) AS total_prompt, SUM(completion_tokens) AS total_completion FROM token_usage WHERE date(timestamp) = date('now') GROUP BY task_type. Note that date('now') is evaluated in UTC, so the stored timestamps must be UTC as well.
Optionally, expose metrics. Serve them from a /metrics endpoint using the Prometheus client library.
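The tracker described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the PRICING values and the "example-model" name are made-up placeholders, the method names (record, record_response, daily_total, report_today) are hypothetical, and record_response assumes an OpenAI-style response object that exposes usage.prompt_tokens and usage.completion_tokens.

```python
import logging
import sqlite3
from datetime import datetime, timezone

logger = logging.getLogger("token_tracker")

# Placeholder prices in USD per 1M tokens (assumption); use your provider's rates.
PRICING = {"example-model": {"input": 1.00, "output": 3.00}}
DAILY_BUDGET_USD = 5.0


class TokenTracker:
    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS token_usage (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   session_id TEXT, task_type TEXT, model TEXT,
                   prompt_tokens INTEGER, completion_tokens INTEGER,
                   cost REAL, timestamp TEXT)"""
        )

    def record(self, session_id, task_type, model, prompt_tokens, completion_tokens):
        """Store one usage row and return its cost in USD."""
        price = PRICING[model]
        cost = (prompt_tokens * price["input"]
                + completion_tokens * price["output"]) / 1_000_000
        # UTC timestamp in a format SQLite's date() function understands.
        ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO token_usage (session_id, task_type, model, "
                "prompt_tokens, completion_tokens, cost, timestamp) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (session_id, task_type, model,
                 prompt_tokens, completion_tokens, cost, ts),
            )
        daily_total = self.daily_total()
        if daily_total > DAILY_BUDGET_USD:
            logger.warning("Daily budget exceeded: $%.2f", daily_total)
        return cost

    def record_response(self, response, session_id, task_type, model):
        """Assumes an OpenAI-style response with a .usage attribute."""
        usage = response.usage
        return self.record(session_id, task_type, model,
                           usage.prompt_tokens, usage.completion_tokens)

    def daily_total(self):
        """Total cost of all rows recorded today (UTC)."""
        row = self.conn.execute(
            "SELECT COALESCE(SUM(cost), 0) FROM token_usage "
            "WHERE date(timestamp) = date('now')"
        ).fetchone()
        return row[0]

    def report_today(self):
        """Per-task-type totals for today (UTC)."""
        return self.conn.execute(
            "SELECT task_type, SUM(cost) AS total_cost, "
            "SUM(prompt_tokens) AS total_prompt, "
            "SUM(completion_tokens) AS total_completion "
            "FROM token_usage WHERE date(timestamp) = date('now') "
            "GROUP BY task_type"
        ).fetchall()


# Usage with made-up numbers: 1200 prompt tokens and 300 completion tokens.
tracker = TokenTracker()
tracker.record("session-1", "summarize", "example-model", 1200, 300)
for row in tracker.report_today():
    print(row)
```

Checking the budget inside record keeps the warning close to the spend that triggered it; a deployment with many writes per second might prefer to run that check on a timer instead.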
Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.
Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.
Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.
Verification
Run the agent through 3 to 5 tasks and query the token_usage table: each task should produce at least one row with non-zero prompt and completion tokens. Verify the cost calculation by hand: pick one row, compute (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000, and compare the result with the stored cost. Trigger the budget warning by temporarily setting a low threshold and running a task. Finally, check that the totals returned by the reporting function equal the sums computed directly from the raw rows for each task type.
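The hand checks above can also be scripted. The sketch below is one possible approach under stated assumptions: the helper names (find_cost_mismatches, report_matches_raw_rows), the PRICING values, and the sample rows are all invented for illustration, and the date filter is omitted so the example does not depend on the current day. Row 3 carries a deliberately wrong cost to show what a failed check looks like.

```python
import math
import sqlite3

# Placeholder prices in USD per 1M tokens (assumption).
PRICING = {"example-model": {"input": 1.00, "output": 3.00}}


def find_cost_mismatches(conn, pricing, tol=1e-9):
    """Return ids of rows with zero token counts or a stored cost that
    differs from the recomputed cost."""
    bad = []
    rows = conn.execute(
        "SELECT id, model, prompt_tokens, completion_tokens, cost "
        "FROM token_usage ORDER BY id"
    )
    for row_id, model, prompt, completion, stored in rows:
        price = pricing[model]
        expected = (prompt * price["input"]
                    + completion * price["output"]) / 1_000_000
        if (prompt <= 0 or completion <= 0
                or not math.isclose(stored, expected, abs_tol=tol)):
            bad.append(row_id)
    return bad


def report_matches_raw_rows(conn, tol=1e-9):
    """Compare the GROUP BY report with sums accumulated row by row."""
    report = {
        task: (cost, prompt, completion)
        for task, cost, prompt, completion in conn.execute(
            "SELECT task_type, SUM(cost), SUM(prompt_tokens), "
            "SUM(completion_tokens) FROM token_usage GROUP BY task_type"
        )
    }
    raw = {}
    for task, cost, prompt, completion in conn.execute(
        "SELECT task_type, cost, prompt_tokens, completion_tokens "
        "FROM token_usage"
    ):
        totals = raw.setdefault(task, [0.0, 0, 0])
        totals[0] += cost
        totals[1] += prompt
        totals[2] += completion
    return report.keys() == raw.keys() and all(
        math.isclose(report[k][0], raw[k][0], abs_tol=tol)
        and report[k][1:] == tuple(raw[k][1:])
        for k in raw
    )


# Made-up sample data standing in for a real token_usage table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE token_usage (id INTEGER PRIMARY KEY, session_id TEXT, "
    "task_type TEXT, model TEXT, prompt_tokens INTEGER, "
    "completion_tokens INTEGER, cost REAL, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO token_usage VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "s1", "summarize", "example-model", 1200, 300, 0.0021, "2024-01-01 10:00:00"),
        (2, "s1", "summarize", "example-model", 800, 200, 0.0014, "2024-01-01 10:05:00"),
        # Deliberately wrong cost: the formula gives 0.0008 for this row.
        (3, "s2", "search", "example-model", 500, 100, 0.0009, "2024-01-01 10:10:00"),
    ],
)
print(find_cost_mismatches(conn, PRICING))  # [3]
print(report_matches_raw_rows(conn))        # True
```

Costs are compared with a small tolerance rather than exact equality because they are stored as floating-point values.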
Common failures
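- Token counts missing from API responses - Verify that the API call uses stream=False, or that the streaming code collects the final usage metadata. With the OpenAI Chat Completions API, for example, streamed responses include usage only when stream_options={"include_usage": True} is passed.
- Cost calculation errors due to wrong pricing - Check the model prices in the lookup table against the provider's current pricing page, and confirm they are expressed per million tokens.
- Database errors under concurrent agent instances - Switch SQLite to WAL mode with PRAGMA journal_mode=WAL so readers are not blocked by a writer, and set a busy timeout, since SQLite still allows only one writer at a time. For heavier write loads, move to a client-server database with a connection pool.
- Timestamp timezone mismatches - Normalize all timestamps to UTC. Prefer datetime.now(timezone.utc) over datetime.utcnow(), which returns a naive datetime and is deprecated as of Python 3.12.
- Large token counts overflow integer fields - SQLite INTEGER columns already hold 64-bit values, so this mainly applies after migrating to another database; there, use BIGINT for token columns and DECIMAL(10,6) for cost.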
- Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
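Two of the fixes above, WAL mode and UTC timestamps, might look like this in practice. It is a small sketch under assumptions: open_usage_db and utc_timestamp are hypothetical helper names, the 5-second busy timeout is an arbitrary choice, and the temporary file path exists only for the demonstration.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timezone


def open_usage_db(path, timeout_s=5.0):
    """Open the database with a busy timeout and request WAL mode.

    Returns the connection and the journal mode SQLite reports back.
    """
    conn = sqlite3.connect(path, timeout=timeout_s)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


def utc_timestamp():
    """Timezone-aware UTC time, formatted so SQLite's date() can parse it."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# WAL needs a file-backed database; an in-memory one stays in "memory" mode.
db_path = os.path.join(tempfile.mkdtemp(), "usage.db")
conn, mode = open_usage_db(db_path)
print(mode)             # typically "wal" on a local filesystem
print(utc_timestamp())
```

The journal mode is persistent for a database file, so it only needs to be set once, but checking the returned value at startup is a cheap way to catch filesystems where WAL is unavailable.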
Related guides
- setup-agent-observability-opentelemetry
- debug-ai-agent-loops-infinite-reasoning
- implement-rate-limiting-ai-apis