RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated
HOW-TO · SUP

How to monitor agent token usage and cost

intermediate · 20 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

An AI agent that makes model API calls, and a place to store cost records (this guide uses SQLite)

What this does

Monitoring agent token usage and cost provides visibility into the operational expenses of running AI agents. The monitoring system tracks input tokens, output tokens, and cumulative costs per agent session, per task type, and across time periods. This data enables budget enforcement, cost optimization decisions, and chargeback accounting for multi-team AI deployments.

Steps

  • Create a TokenTracker class that captures token counts from each API response. In the agent's model-call wrapper, read response.usage.prompt_tokens and response.usage.completion_tokens.
  • Store each record with a timestamp, session ID, task type, and model name.
  • Calculate cost from a pricing lookup table whose prices are in USD per million tokens: cost = (prompt_tokens * INPUT_PRICE + completion_tokens * OUTPUT_PRICE) / 1_000_000.
  • Write records to a SQLite database with the table token_usage (id, session_id, task_type, model, prompt_tokens, completion_tokens, cost, timestamp).
  • Set a budget warning threshold. For example, log a warning when the daily cost exceeds $5: if daily_total > 5.0: logger.warning(f"Daily budget exceeded: ${daily_total:.2f}").
  • Create a reporting function that queries the database: SELECT task_type, SUM(cost) AS total_cost, SUM(prompt_tokens) AS total_prompt, SUM(completion_tokens) AS total_completion FROM token_usage WHERE date(timestamp) = date('now') GROUP BY task_type. SQLite evaluates date('now') in UTC, so store timestamps in UTC as well.
  • Optionally, expose the metrics on a /metrics endpoint with the Prometheus Python client library (prometheus_client).
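The steps above can be sketched in Python with only the standard library, assuming an OpenAI-compatible response object whose usage field carries prompt_tokens and completion_tokens. PRICING, DAILY_BUDGET_USD, compute_cost, the two model names, and the timestamp format are illustrative assumptions, not part of any library, and the prices are placeholders in USD per million tokens.

```python
import logging
import sqlite3
from datetime import datetime, timezone

logger = logging.getLogger("token_tracker")

# Hypothetical pricing table: USD per 1M tokens. Replace with your provider's current rates.
# Local Ollama models have no per-token bill; 0.0 is a placeholder for a notional rate.
PRICING = {
    "example-cloud-model": {"input": 3.00, "output": 15.00},
    "example-local-model": {"input": 0.0, "output": 0.0},
}

DAILY_BUDGET_USD = 5.0  # warning threshold (assumed value from the steps above)

SCHEMA = """
CREATE TABLE IF NOT EXISTS token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    task_type TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    cost REAL NOT NULL,
    timestamp TEXT NOT NULL  -- UTC, 'YYYY-MM-DD HH:MM:SS', so date(timestamp) parses it
)
"""


def compute_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD; raises KeyError for a model missing from PRICING."""
    price = PRICING[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000


class TokenTracker:
    def __init__(self, db_path: str = "token_usage.db") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(SCHEMA)
        self.conn.commit()

    def record(self, session_id: str, task_type: str, model: str,
               prompt_tokens: int, completion_tokens: int) -> float:
        """Store one model call, then warn if today's total exceeds the budget."""
        cost = compute_cost(model, prompt_tokens, completion_tokens)
        ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        self.conn.execute(
            "INSERT INTO token_usage (session_id, task_type, model, prompt_tokens,"
            " completion_tokens, cost, timestamp) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (session_id, task_type, model, prompt_tokens, completion_tokens, cost, ts),
        )
        self.conn.commit()
        daily_total = self.daily_total()
        if daily_total > DAILY_BUDGET_USD:
            logger.warning("Daily budget exceeded: $%.2f", daily_total)
        return cost

    def daily_total(self) -> float:
        # date('now') is UTC in SQLite, matching the stored timestamps.
        (total,) = self.conn.execute(
            "SELECT COALESCE(SUM(cost), 0.0) FROM token_usage"
            " WHERE date(timestamp) = date('now')"
        ).fetchone()
        return total

    def report(self) -> list:
        """Today's totals per task type: (task_type, total_cost, total_prompt, total_completion)."""
        return self.conn.execute(
            "SELECT task_type, SUM(cost) AS total_cost, SUM(prompt_tokens) AS total_prompt,"
            " SUM(completion_tokens) AS total_completion FROM token_usage"
            " WHERE date(timestamp) = date('now') GROUP BY task_type"
        ).fetchall()
```

Inside the model-call wrapper, a call might look like tracker.record(session_id, "summarize", model, response.usage.prompt_tokens, response.usage.completion_tokens). The Prometheus endpoint is left out so the sketch stays on the standard library.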

  • Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.

  • Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.

  • Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
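For the starting-state and evidence steps above, a small helper can capture the active binary, its version string, and the run details in one record. This is a sketch that assumes the runtime binary is ollama on PATH and that it accepts --version; the function names and the run_evidence.json file name are hypothetical.

```python
import json
import platform
import shutil
import subprocess
from datetime import datetime, timezone
from typing import Optional, Tuple


def binary_version(binary: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (absolute path, '--version' output) for a binary on PATH, or (None, None)."""
    path = shutil.which(binary)
    if path is None:
        return None, None
    try:
        result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return path, None
    # Some CLIs print warnings to stderr; keep whichever stream has content.
    return path, (result.stdout.strip() or result.stderr.strip() or None)


def collect_run_evidence(command: str, model: str, observed_output: str,
                         binary: str = "ollama") -> dict:
    """Bundle everything needed to reproduce a run into one dict."""
    path, version = binary_version(binary)
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "binary_path": path,
        "binary_version": version,
        "model": model,
        "observed_output": observed_output,
    }


def save_evidence(evidence: dict, path: str = "run_evidence.json") -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(evidence, f, indent=2)
```

Printing collect_run_evidence(...) before a change covers the starting-state check; calling save_evidence on the same dict after the run keeps the record.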

Verification

Run the agent through 3 to 5 tasks and query the token_usage table: each task should produce at least one row with non-zero prompt and completion tokens. Verify the cost calculation by hand: pick one row, compute (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000, and compare the result with the stored cost. Trigger the budget warning by temporarily setting a low threshold and running a task. Finally, check that the totals returned by the reporting function equal the sums computed over the raw rows.
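The verification checks can also be scripted against the token_usage table. This sketch assumes the schema from the steps and a PRICING table identical to the one used when recording (the model name and prices here are placeholders); it checks all rows, so add the same WHERE date(timestamp) = date('now') filter to both queries to mirror the daily report exactly.

```python
import math
import sqlite3

# Hypothetical pricing table (USD per 1M tokens); it must match the one used when recording.
PRICING = {"example-cloud-model": {"input": 3.00, "output": 15.00}}


def rows_with_zero_tokens(conn: sqlite3.Connection) -> list:
    """Ids of rows where either token count is zero (a healthy run has none)."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM token_usage WHERE prompt_tokens = 0 OR completion_tokens = 0 ORDER BY id")]


def rows_with_wrong_cost(conn: sqlite3.Connection) -> list:
    """Recompute each stored cost and return ids of rows that disagree."""
    bad = []
    for row_id, model, prompt, completion, stored in conn.execute(
            "SELECT id, model, prompt_tokens, completion_tokens, cost FROM token_usage ORDER BY id"):
        price = PRICING[model]
        expected = (prompt * price["input"] + completion * price["output"]) / 1_000_000
        if not math.isclose(expected, stored, rel_tol=1e-9, abs_tol=1e-12):
            bad.append(row_id)
    return bad


def aggregates_match_raw_rows(conn: sqlite3.Connection) -> bool:
    """Compare GROUP BY totals with sums accumulated in Python from the raw rows."""
    raw = {}
    for task, prompt, completion, cost in conn.execute(
            "SELECT task_type, prompt_tokens, completion_tokens, cost FROM token_usage"):
        p, c, total = raw.get(task, (0, 0, 0.0))
        raw[task] = (p + prompt, c + completion, total + cost)
    grouped = conn.execute(
        "SELECT task_type, SUM(cost), SUM(prompt_tokens), SUM(completion_tokens)"
        " FROM token_usage GROUP BY task_type").fetchall()
    if len(grouped) != len(raw):
        return False
    for task, total_cost, total_prompt, total_completion in grouped:
        p, c, total = raw[task]
        if (total_prompt, total_completion) != (p, c) or not math.isclose(total_cost, total, rel_tol=1e-9):
            return False
    return True
```

Float costs are compared with math.isclose rather than ==, because SQLite's SUM and Python's + may round differently in the last bits.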

Common failures

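  • Token counts missing from API responses - Call the API with stream=False, or, when streaming, make sure the client collects the usage metadata that arrives with the final chunk.
  • Cost calculation errors due to wrong pricing - Check every entry in the pricing lookup table against the provider's current pricing page, and confirm the prices are per million tokens.
  • "database is locked" errors with concurrent agent instances - Switch SQLite to WAL mode with PRAGMA journal_mode=WAL and give each connection a busy timeout (for example, sqlite3.connect(path, timeout=30)), or move to a client-server database with a connection pool.
  • Timestamp timezone mismatches - Normalize all timestamps to UTC with datetime.now(timezone.utc); datetime.utcnow() returns a naive datetime and is deprecated as of Python 3.12.
  • Large token counts overflowing integer fields - SQLite INTEGER columns already store 64-bit values; on other databases, use BIGINT for token columns and DECIMAL(10,6) for cost.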
  • Version mismatch - The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
  • Local environment drift - Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
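For the database-locking and timezone failures, the connection and timestamp setup can be sketched as follows. The 30-second timeout and the function names are assumptions; the PRAGMA and the timeout argument of sqlite3.connect are standard SQLite and Python features.

```python
import sqlite3
from datetime import datetime, timezone


def open_usage_db(path: str = "token_usage.db") -> tuple:
    """Open the usage database so several agent processes can share it."""
    # timeout: seconds to wait for a competing writer's lock before raising
    # sqlite3.OperationalError("database is locked").
    conn = sqlite3.connect(path, timeout=30)
    # WAL lets readers continue while one writer commits; the mode is stored in
    # the database file. Returns 'wal' on success (an in-memory database stays 'memory').
    (mode,) = conn.execute("PRAGMA journal_mode=WAL").fetchone()
    return conn, mode


def utc_timestamp() -> str:
    """Aware UTC 'now', formatted so SQLite's date() function can parse it."""
    # Replaces datetime.utcnow(), which is deprecated since Python 3.12.
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

Checking the returned mode once at startup catches the case where WAL could not be enabled, for example on a read-only path.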

Related guides

  • setup-agent-observability-opentelemetry
  • debug-ai-agent-loops-infinite-reasoning
  • implement-rate-limiting-ai-apis