RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
AI-Powered SaaS Products

08. Token Tracking

Chapter 8 of 24 · 15 min
KEY INSIGHT

Token tracking extends metering to include model-specific costs, enabling tiered pricing and accurate margin calculation across different AI providers.

Different AI models have different costs. GPT-4 tokens cost more than GPT-3.5 tokens, and Claude tokens are priced differently from OpenAI tokens. Token tracking captures this granularity so pricing can be model-aware.

Tokens are counted at two points: input tokens (from the prompt) and output tokens (from the response). Most AI providers return both counts in the response metadata.

```python
# Session, ApiKey and UsageRecord are defined elsewhere in the application;
# the __future__ import keeps their annotations from being evaluated here.
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ModelTier(Enum):
    STANDARD = "standard"
    PREMIUM = "premium"
    ENTERPRISE = "enterprise"


@dataclass
class ModelPricing:
    name: str
    provider: str
    input_cost_per_1k_kobo: int   # Kobo cost per 1,000 input tokens
    output_cost_per_1k_kobo: int  # Kobo cost per 1,000 output tokens
    tier: ModelTier


# Current pricing (simplified; verify against actual provider pricing)
MODEL_CATALOG = {
    "gpt-4o": ModelPricing(
        name="gpt-4o",
        provider="openai",
        input_cost_per_1k_kobo=350,    # ₦3.50 per 1K input
        output_cost_per_1k_kobo=1050,  # ₦10.50 per 1K output
        tier=ModelTier.PREMIUM,
    ),
    "gpt-4o-mini": ModelPricing(
        name="gpt-4o-mini",
        provider="openai",
        input_cost_per_1k_kobo=22,   # ₦0.22 per 1K input
        output_cost_per_1k_kobo=88,  # ₦0.88 per 1K output
        tier=ModelTier.STANDARD,
    ),
    "gpt-3.5-turbo": ModelPricing(
        name="gpt-3.5-turbo",
        provider="openai",
        input_cost_per_1k_kobo=11,   # ₦0.11 per 1K input
        output_cost_per_1k_kobo=33,  # ₦0.33 per 1K output
        tier=ModelTier.STANDARD,
    ),
}


class TokenTracker:
    def __init__(self, meter):
        # Usage meter that exposes record_usage(); it is not defined in this chapter.
        self.meter = meter

    def calculate_cost(
        self,
        model_name: str,
        input_tokens: int,
        output_tokens: int,
    ) -> int:
        """Calculate cost in kobo for a single request."""
        pricing = MODEL_CATALOG.get(model_name)
        if pricing is None:
            raise ValueError(f"Unknown model: {model_name}")

        # Prices are quoted per 1,000 tokens, so scale each count by 1/1000
        input_cost = (input_tokens / 1000) * pricing.input_cost_per_1k_kobo
        output_cost = (output_tokens / 1000) * pricing.output_cost_per_1k_kobo

        # Round to nearest kobo
        return round(input_cost + output_cost)

    def track_request(
        self,
        db: Session,
        api_key: ApiKey,
        model_name: str,
        input_tokens: int,
        output_tokens: int,
        response_id: str,
    ) -> UsageRecord:
        """Track a completed request and record costs."""
        cost_kobo = self.calculate_cost(model_name, input_tokens, output_tokens)

        # Update running totals
        api_key.total_usage_kobo += cost_kobo
        api_key.total_tokens += input_tokens + output_tokens

        return self.meter.record_usage(
            tenant_id=api_key.workspace.organization_id,
            workspace_id=api_key.workspace_id,
            api_key_id=api_key.id,
            model_name=model_name,
            request_tokens=input_tokens,
            response_tokens=output_tokens,
            cost_kobo=cost_kobo,
            request_hash=response_id,  # Use provider response ID
        )
```

A critical failure mode is a provider API change that affects token counting or pricing. If OpenAI changes its tokenizer or its prices, hardcoded values such as those in `MODEL_CATALOG` become stale. Build a pricing configuration table that can be updated without a code deployment.
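One way to sketch the pricing-table idea is to load prices from data instead of a module-level constant. The example below is a minimal illustration under stated assumptions, not the course's implementation: `PricingRow`, `load_pricing`, `cost_kobo`, and the JSON layout are hypothetical names, and the JSON string stands in for whatever store (a database table, a config service) would hold the rows in practice. It uses integer arithmetic with half-up rounding, which is a design choice of this sketch.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingRow:
    """One row of a hypothetical pricing table (field names are assumptions)."""
    model_name: str
    input_cost_per_1k_kobo: int
    output_cost_per_1k_kobo: int


def load_pricing(raw_json: str) -> dict[str, PricingRow]:
    """Parse pricing rows from JSON text; a real system might query a table instead."""
    rows = json.loads(raw_json)
    return {row["model_name"]: PricingRow(**row) for row in rows}


def cost_kobo(pricing: dict[str, PricingRow], model_name: str,
              input_tokens: int, output_tokens: int) -> int:
    """Cost of one request in kobo, rounded half up, using integers only."""
    row = pricing.get(model_name)
    if row is None:
        raise ValueError(f"Unknown model: {model_name}")
    # tokens * (kobo per 1,000 tokens) gives thousandths of a kobo
    thousandths = (input_tokens * row.input_cost_per_1k_kobo
                   + output_tokens * row.output_cost_per_1k_kobo)
    return (thousandths + 500) // 1000


# Stand-in for an editable pricing store; the values mirror the chapter's examples.
PRICING_JSON = """[
  {"model_name": "gpt-4o", "input_cost_per_1k_kobo": 350, "output_cost_per_1k_kobo": 1050},
  {"model_name": "gpt-4o-mini", "input_cost_per_1k_kobo": 22, "output_cost_per_1k_kobo": 88}
]"""

pricing = load_pricing(PRICING_JSON)
# 1,200 input tokens and 300 output tokens on gpt-4o:
# (1200 * 350 + 300 * 1050) / 1000 = 735 kobo, i.e. ₦7.35
print(cost_kobo(pricing, "gpt-4o", 1200, 300))
```

Because the prices live in data, changing a rate means editing a row and reloading it, with no redeploy. Note that very small requests can round down to 0 kobo (for example, 10 input tokens on `gpt-4o-mini`), so it may be worth accumulating fractional cost per billing period rather than per request.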

EXERCISE

Implement a rate limiting system that tracks tokens per minute (TPM) for a workspace. If a workspace exceeds their tier's TPM limit, return a 429 response. Include both synchronous checking and background quota monitoring.
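As a possible starting point for the synchronous half of the exercise, the sketch below keeps a sliding 60-second window of token usage per workspace in memory. Everything here is an assumption made for illustration: the `TpmLimiter` class, the `TPM_LIMITS` numbers, and the tier names as dictionary keys. It works only within a single process and does not cover the background quota monitoring part.

```python
import time
from collections import defaultdict, deque

# Hypothetical per-tier limits; real values would come from plan configuration.
TPM_LIMITS = {"standard": 10_000, "premium": 60_000}


class TpmLimiter:
    """In-memory sliding-window token counter per workspace (single process only)."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._events = defaultdict(deque)  # workspace_id -> deque of (timestamp, tokens)
        self._totals = defaultdict(int)    # workspace_id -> tokens inside the window

    def _evict(self, workspace_id: str, now: float) -> None:
        """Drop usage events that have aged out of the window."""
        events = self._events[workspace_id]
        while events and now - events[0][0] >= self.window_seconds:
            _, tokens = events.popleft()
            self._totals[workspace_id] -= tokens

    def check_and_record(self, workspace_id: str, tier: str, tokens: int) -> int:
        """Return 200 and record the usage if under the limit, otherwise 429."""
        now = self.clock()
        self._evict(workspace_id, now)
        if self._totals[workspace_id] + tokens > TPM_LIMITS[tier]:
            return 429
        self._events[workspace_id].append((now, tokens))
        self._totals[workspace_id] += tokens
        return 200


# Demonstration with a fake clock so the window can be advanced deterministically.
fake_now = [0.0]
limiter = TpmLimiter(clock=lambda: fake_now[0])
print(limiter.check_and_record("ws_1", "standard", 8_000))  # under the limit
print(limiter.check_and_record("ws_1", "standard", 3_000))  # would exceed 10,000
fake_now[0] = 61.0                                          # first event has expired
print(limiter.check_and_record("ws_1", "standard", 3_000))
```

In a multi-process deployment the counters would need to live in shared storage, and the returned status code would be mapped onto the web framework's response type.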
