14. Cost Analytics
Cost analytics turns raw usage data into financial information that teams can act on. Without systematic cost tracking, organizations discover budget overruns only at billing time, when it is too late to correct course. Real-time cost visibility enables dynamic routing decisions that reduce spend without degrading quality.
The fundamental cost model requires per-token pricing for cloud providers and per-kWh electricity pricing for local infrastructure. Cloud costs vary by model capability and context window size. Local costs vary by GPU model, utilization efficiency, and regional electricity rates. A unified cost model normalizes these variables into a comparable metric, such as cost per 1,000 tokens.
from __future__ import annotations  # defer evaluation of the type hints below

from dataclasses import dataclass
from decimal import Decimal

# ProviderPricing and GPUConfig are assumed to be defined elsewhere in the project.


@dataclass
class CostBreakdown:
    """Cost of a single inference request."""
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    compute_cost: Decimal
    transfer_cost: Decimal = Decimal("0")

    @property
    def total_cost(self) -> Decimal:
        return self.compute_cost + self.transfer_cost

    @property
    def cost_per_1k_tokens(self) -> Decimal:
        total_tokens = self.input_tokens + self.output_tokens
        if total_tokens == 0:
            # Avoid division by zero for empty requests.
            return Decimal("0")
        return (self.total_cost / total_tokens) * 1000


class CostAnalyzer:
    def __init__(self, pricing: dict[str, ProviderPricing],
                 gpu_config: GPUConfig):
        self.pricing = pricing
        self.gpu = gpu_config

    def calculate_cloud_cost(self, provider: str,
                             model: str,
                             input_tokens: int,
                             output_tokens: int) -> CostBreakdown:
        # Cloud cost is token count times the per-token rate, per direction.
        rates = self.pricing[provider].get_rates(model)
        input_cost = Decimal(input_tokens) * rates.input_per_token
        output_cost = Decimal(output_tokens) * rates.output_per_token
        return CostBreakdown(
            provider=provider, model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            compute_cost=input_cost + output_cost,
        )

    def calculate_local_cost(self, model: str,
                             input_tokens: int,
                             output_tokens: int,
                             inference_time_ms: int) -> CostBreakdown:
        # Local cost here counts only the electricity used during inference.
        # str() avoids binary float artifacts if the config stores floats.
        gpu_power_watts = Decimal(str(self.gpu.get_power_draw(model)))
        kwh_cost = Decimal(str(self.gpu.electricity_rate))
        hours = Decimal(inference_time_ms) / 3_600_000  # milliseconds to hours
        kwh_consumed = gpu_power_watts / 1000 * hours   # watts to kilowatts
        compute_cost = kwh_consumed * kwh_cost
        return CostBreakdown(
            provider="local", model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            compute_cost=compute_cost,
        )
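The listing relies on two collaborator types, ProviderPricing and GPUConfig, that this section does not define. The sketch below shows one shape they might take and works through the same arithmetic with made-up numbers. The class layouts, the ModelRates helper, the model names, and every price are assumptions for illustration, not real provider rates.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ModelRates:
    # Hypothetical helper: prices in dollars per single token.
    input_per_token: Decimal
    output_per_token: Decimal


@dataclass
class ProviderPricing:
    # Hypothetical layout: model name -> rates for one provider.
    rates: dict[str, ModelRates]

    def get_rates(self, model: str) -> ModelRates:
        return self.rates[model]


@dataclass
class GPUConfig:
    # Hypothetical layout: average power draw in watts per model, plus $/kWh.
    power_draw_watts: dict[str, int]
    electricity_rate: Decimal

    def get_power_draw(self, model: str) -> int:
        return self.power_draw_watts[model]


# Illustrative numbers only.
pricing = ProviderPricing(
    {"example-model": ModelRates(Decimal("0.000003"), Decimal("0.000015"))}
)
gpu = GPUConfig({"local-model": 300}, electricity_rate=Decimal("0.12"))

# Cloud request: 1,000 input tokens and 500 output tokens.
rates = pricing.get_rates("example-model")
cloud_cost = (Decimal(1000) * rates.input_per_token
              + Decimal(500) * rates.output_per_token)

# Local request: 300 W drawn for 2,000 ms at $0.12 per kWh.
hours = Decimal(2000) / 3_600_000
kwh_consumed = Decimal(gpu.get_power_draw("local-model")) / 1000 * hours
local_cost = kwh_consumed * gpu.electricity_rate

print(f"cloud: ${cloud_cost:.6f}  local: ${local_cost:.6f}")
```

With these inputs the cloud request costs $0.0105 and the local request about $0.00002. The gap is large because the local figure covers electricity only; hardware purchase and maintenance are outside this calculation.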
Cost attribution enables department-level or project-level budget tracking. Tags embedded in requests flow through to cost reports, creating accountability without requiring separate infrastructure. Monthly cost forecasting based on usage trends informs budget planning and identifies anomalies for investigation.
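Tag-based attribution and a simple forecast can be sketched as follows. This is a minimal illustration under stated assumptions: each request record is a dict with a "cost" and an optional "tags" mapping, the "project" tag name is hypothetical, the forecast is a plain average-based projection, and the anomaly rule (a day costing more than twice the median) is an arbitrary choice rather than a recommended threshold.

```python
from collections import defaultdict
from decimal import Decimal
from statistics import median


def attribute_costs(records: list[dict], tag: str = "project") -> dict[str, Decimal]:
    """Sum cost per tag value; requests without the tag go to 'untagged'."""
    totals: dict[str, Decimal] = defaultdict(Decimal)  # Decimal() is 0
    for rec in records:
        key = rec.get("tags", {}).get(tag, "untagged")
        totals[key] += rec["cost"]
    return dict(totals)


def forecast_month_total(daily_costs: list[Decimal], days_in_month: int) -> Decimal:
    """Project month-end spend from the average of the days observed so far."""
    if not daily_costs:
        return Decimal("0")
    average = sum(daily_costs, Decimal("0")) / len(daily_costs)
    return average * days_in_month


def flag_anomalies(daily_costs: list[Decimal],
                   factor: Decimal = Decimal("2")) -> list[int]:
    """Return indices of days whose cost exceeds factor times the median."""
    if not daily_costs:
        return []
    baseline = median(daily_costs)
    return [i for i, cost in enumerate(daily_costs) if cost > baseline * factor]


# Example usage with made-up records.
records = [
    {"cost": Decimal("1.50"), "tags": {"project": "search"}},
    {"cost": Decimal("0.25"), "tags": {"project": "chat"}},
    {"cost": Decimal("0.50"), "tags": {"project": "search"}},
    {"cost": Decimal("0.10")},  # no tags at all
]
by_project = attribute_costs(records)

daily = [Decimal(d) for d in ("10", "11", "9", "40", "10")]
projected = forecast_month_total(daily, days_in_month=30)
suspicious_days = flag_anomalies(daily)
```

Routing untagged requests into their own bucket keeps the totals reconcilable with the bill and makes missing tags visible in the report.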
Build a cost dashboard comparing local versus cloud inference costs over a 30-day period. Identify the break-even point where local infrastructure becomes cost-advantageous for your usage patterns.
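As a starting point for the break-even calculation, the sketch below assumes local infrastructure carries a fixed monthly cost (for example, amortized hardware) in addition to its per-token electricity cost. That fixed cost is an assumption introduced here; the cost model earlier in this section counts electricity only. The function name and all figures are hypothetical.

```python
from decimal import Decimal
from typing import Optional


def break_even_tokens(fixed_monthly_cost: Decimal,
                      cloud_per_1k: Decimal,
                      local_per_1k: Decimal) -> Optional[Decimal]:
    """Monthly token volume at which local and cloud spend are equal.

    Solves fixed + local_per_1k * V / 1000 == cloud_per_1k * V / 1000 for V.
    Returns None when local is not cheaper per token, since no volume
    would ever recover the fixed cost.
    """
    saving_per_1k = cloud_per_1k - local_per_1k
    if saving_per_1k <= 0:
        return None
    return fixed_monthly_cost / saving_per_1k * 1000


# Illustrative figures only: $500/month fixed, $0.01 vs $0.002 per 1k tokens.
volume = break_even_tokens(Decimal("500"), Decimal("0.01"), Decimal("0.002"))
print(f"break-even at {volume:,.0f} tokens per month")
```

Under these made-up figures, local inference becomes cheaper above 62.5 million tokens per month. The per-1k inputs could come from the cost_per_1k_tokens values collected over the 30-day period.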