How to implement usage-based billing for AI products
Usage metering system, payment integration (Stripe/Paystack)
What this does
Usage-based billing charges customers proportionally to actual resource consumption rather than flat subscription fees. This system meters token usage, compute minutes, and API calls, then reports consumption to Stripe or Paystack for invoicing, enabling flexible monetization of AI services.
Steps
Create a Python middleware class that intercepts every API request to the AI service. The middleware extracts user_id, model_id, tokens_in, tokens_out, and latency_ms from the request and response. Mount it before the route handler in a FastAPI application using app.middleware("http"). Write each record to a usage_events table with columns for id, user_id, timestamp, model_id, tokens_in, tokens_out, compute_minutes, and cost_units.
At the end of each billing cycle, run an aggregation query against usage_events grouping by user_id and model_id, summing token counts and compute minutes. Apply tier pricing: free tier up to 100k tokens, standard tier from 100k to 1M tokens at a per-unit rate, and premium tier above 1M tokens at a discounted rate. Store calculated invoice line items in an invoices table with a draft status.
Create a Stripe customer for each registered user during onboarding. Use stripe.UsageRecords.create() to report metered usage quantized to the nearest unit. Call this method at the end of each billing period, passing quantity equal to the total token count and timestamp set to the period's end. Sync the invoices table status to finalized once Stripe acknowledges the report with a 200 OK response. Define subscription tiers in a configuration dictionary mapping each plan ID to price per unit and free quota. In the metering middleware, enforce a spend cap by rejecting requests when the total cost exceeds the plan's limit, returning HTTP 402 Payment Required. Send an email notification when 80% of the quota is consumed using a background task with Celery.
Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
Confirm the local starting state. Print the active binary, package version, model name, or configuration path before changing the workflow.
Run the smallest complete path. Execute the minimum command or script that proves the guide works end to end on the local machine.
Compare against expected output. Check the final line, status code, generated artifact, or model response against the verification section before expanding the setup.
Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
Verification
Send 5 test API requests with known token counts. Query usage_events to confirm 5 records exist. Trigger the aggregation function and inspect the invoices table. Run stripe.UsageRecords.list() to confirm the reported quantity matches. Expected output: usage_events contains 5 rows with correct token values; invoices contains 1 draft invoice with total_amount equal to token count multiplied by price per unit; Stripe returns 200 OK with the correct quantity; the middleware returns HTTP 402 when the spend cap is exceeded.
Common failures
- Double-reporting usage: Calling the Stripe metering API twice for the same period causes duplicate charges. Use an idempotency key based on
user_idcombined with the period start timestamp. - Clock skew between aggregation and metering: Disagreeing start and end timestamps across services assigns usage to the wrong period. Standardize on UTC and use half-open intervals.
- Currency mismatch: Stripe expects amounts in the smallest unit (cents or kobo), while internal calculations use decimal values. Multiply by 100 before sending to Stripe to avoid off-by-100 errors.
- Stale plan information: Cached plan tiers cause the wrong price to be applied. Validate plan data from the database on every request rather than loading it once at startup.
- Missing webhook confirmation: Failing to listen for
invoice.paidwebhooks leaves invoices stuck indraftstatus. Register the webhook endpoint and update status on receipt of the event.
Related guides
- Build a cost tracking dashboard for AI usage — visualizes the billed usage data for engineering and finance teams.
- Build a local AI product with Nigerian naira pricing — applies usage-based billing with naira-denominated pricing for African markets.