07. Usage Metering

Chapter 7 of 24 · 15 min

KEY INSIGHT

Usage metering tracks resource consumption at the API call level, enabling accurate billing and preventing abuse before it impacts margins. AI API costs are measured in tokens. Every request to an AI model consumes tokens, which translates directly to costs from model providers plus your margin. Without accurate metering, billing becomes guesswork and margin erosion goes unnoticed. The metering system records each API call with sufficient granularity to reconstruct billing reports and detect anomalies. ```python from sqlalchemy import Column, String, Integer, DateTime, Text from datetime import datetime class UsageRecord(Base): __tablename__ = "usage_records" id = Column(String(36), primary_key=True) tenant_id = Column(String(36), nullable=False, index=True) workspace_id = Column(String(36), nullable=False, index=True) api_key_id = Column(String(36), nullable=False, index=True) # Request details model_name = Column(String(100), nullable=False) request_tokens = Column(Integer, nullable=False, default=0) response_tokens = Column(Integer, nullable=False, default=0) total_tokens = Column(Integer, nullable=False) # Cost tracking (in kobo for Naira) cost_kobo = Column(Integer, nullable=False) # Timestamps created_at = Column(DateTime, default=datetime.utcnow, index=True) # Request hash for deduplication request_hash = Column(String(64), nullable=False) class UsageMeter: def __init__(self, db: Session): self.db = db def record_usage( self, tenant_id: str, workspace_id: str, api_key_id: str, model_name: str, request_tokens: int, response_tokens: int, cost_kobo: int, request_hash: str ) -> UsageRecord: """Record usage for a single API call.""" # Check for duplicate (idempotency) existing = self.db.query(UsageRecord).filter_by( request_hash=request_hash ).first() if existing: return existing record = UsageRecord( id=str(uuid.uuid4()), tenant_id=tenant_id, workspace_id=workspace_id, api_key_id=api_key_id, model_name=model_name, request_tokens=request_tokens, response_tokens=response_tokens, total_tokens=request_tokens + response_tokens, cost_kobo=cost_kobo, request_hash=request_hash ) self.db.add(record) self.db.commit() return record ``` Deduplication matters. Network failures cause clients to retry requests. Without deduplication via request hashing, retry attempts get billed multiple times. The `request_hash` should be a hash of the unique request identifier sent by the client, or a hash of (timestamp + model + truncated prompt) for client-generated requests. ```python import hashlib import json def generate_request_hash( api_key_id: str, model_name: str, prompt: str, timestamp: datetime ) -> str: """Generate deterministic hash for deduplication.""" # Use truncated prompt to limit hash input size payload = { "key": api_key_id, "model": model_name, "prompt": prompt[:500], # First 500 chars "ts": timestamp.isoformat() } return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest() ```

EXERCISE

Design a usage aggregation system that recalculates daily/weekly/monthly totals from individual records. Include aggregation logic and explain when you'd use materialized views versus on-demand calculation.