Cloud provider:
  Claude Opus 4.7 ($15/75/M)
  Claude Sonnet 4.6 ($3/15/M)
  GPT-5 ($5/20/M)
  Gemini 2.5 Pro ($1.25/10/M)
  Claude Haiku 4 ($1/5/M)
  GPT-5 mini ($0.5/2/M)
  DeepSeek V3 (API) ($0.27/1.1/M)
  Llama 3.3 70B (Together) ($0.88/0.88/M)
  Llama 3.3 70B (Groq) ($0.59/0.79/M)
  Qwen 3 32B (Together) ($0.6/0.6/M)
  Qwen 2.5 Coder 32B (DeepInfra) ($0.18/0.18/M)
Horizon: 6 months, 1 year, 2 years, 3 years (typical amortization), 5 years
Your local hardware — pick hardware — NVIDIA GB200 NVL72 (13824 GB) AMD Instinct MI355X (288 GB) NVIDIA B300 (Blackwell Ultra) (288 GB) AMD Instinct MI350X (288 GB) AMD Instinct MI325X (256 GB) NVIDIA B200 (192 GB) AMD Instinct MI300X (192 GB) NVIDIA H100 NVL (188 GB) NVIDIA H200 NVL (PCIe) (141 GB) — $32000 NVIDIA H200 (141 GB) Intel Gaudi 3 (128 GB) AMD Instinct MI300A (APU) (128 GB) AMD Instinct MI250X (128 GB) Intel Gaudi 2 (96 GB) NVIDIA RTX PRO 6000 Blackwell (96 GB) — $8999 NVIDIA H20 (96GB) (96 GB) NVIDIA A100 80GB SXM (80 GB) NVIDIA H100 SXM (80 GB) NVIDIA H100 PCIe (80 GB) AMD Instinct MI210 (64 GB) NVIDIA RTX 4090 48GB (China-mod) (48 GB) — $2400 NVIDIA RTX 5000 PRO Blackwell 48GB (48 GB) — $5499 NVIDIA RTX A6000 (Ampere) (48 GB) — $3500 NVIDIA RTX 6000 Ada Generation (48 GB) — $6499 NVIDIA A40 (48 GB) NVIDIA L40S (48 GB) NVIDIA L40 (48 GB) NVIDIA A100 40GB (40 GB) NVIDIA GeForce RTX 5090 (32 GB) — $2499 NVIDIA RTX 5000 Ada Generation (32 GB) NVIDIA RTX PRO 4500 Blackwell (32 GB) NVIDIA GeForce RTX 3090 Ti (24 GB) — $1199 NVIDIA GeForce RTX 5090 Mobile (24 GB) NVIDIA L4 (24 GB) ASUS ROG Strix Scar 18 (RTX 5090 Mobile) (24 GB) AMD Radeon RX 7900 XTX (24 GB) — $899 NVIDIA GeForce RTX 4090 (24 GB) — $1899 NVIDIA RTX A5000 (24 GB) Razer Blade 16 (2025, RTX 5090 Mobile) (24 GB) NVIDIA GeForce RTX 3090 (24 GB) — $899 Intel Arc Pro B60 24GB (24 GB) NVIDIA RTX PRO 4000 Blackwell (24 GB) NVIDIA RTX 2080 Ti 22GB (China-mod) (22 GB) — $350 AMD Radeon RX 7900 XT (20 GB) — $729 NVIDIA GeForce RTX 5060 Ti 16GB (16 GB) — $459 NVIDIA GeForce RTX 4080 (16 GB) — $1099 AMD Radeon RX 7800 XT (16 GB) — $459 Intel Arc A770 16GB (16 GB) — $269 Lenovo Legion 5 Pro Gen 7 (RTX 3080 16GB) (16 GB) — $1499 NVIDIA GeForce RTX 4080 Super (16 GB) — $1099 NVIDIA GeForce RTX 3080 16GB (Mobile) (16 GB) AMD Radeon RX 9070 XT (16 GB) — $649 NVIDIA GeForce RTX 4060 Ti 16GB (16 GB) — $449 NVIDIA GeForce RTX 4090 Mobile (16 GB) NVIDIA GeForce RTX 5080 (16 GB) — $1199 AMD Radeon RX 6800 XT (16 
GB) — $450 AMD Radeon RX 6900 XT (16 GB) — $500 AMD Radeon RX 7900 GRE (16 GB) — $549 AMD Radeon RX 7600 XT (16 GB) — $309 AMD Radeon RX 9070 (16 GB) — $569 NVIDIA GeForce RTX 5070 Ti (16 GB) — $849 NVIDIA GeForce RTX 4070 Ti Super (16 GB) — $829 AMD Radeon RX 9060 XT (16 GB) — $449 AMD Radeon RX 6800 (16 GB) — $380 AMD Radeon RX 6950 XT (16 GB) — $580 NVIDIA GeForce RTX 4070 (12 GB) — $549 AMD Radeon RX 7700 XT (12 GB) — $379 NVIDIA GeForce RTX 3080 Ti (12 GB) — $480 Intel Arc B580 (12 GB) — $269 NVIDIA GeForce RTX 4070 Ti (12 GB) — $749 NVIDIA GeForce RTX 3060 12GB (12 GB) — $249 NVIDIA GeForce RTX 4070 Super (12 GB) — $619 NVIDIA GeForce RTX 5070 (12 GB) — $599 AMD Radeon RX 6700 XT (12 GB) — $280 NVIDIA GeForce RTX 3080 12GB (12 GB) — $449 AMD Radeon RX 6750 XT (12 GB) — $320 NVIDIA GeForce RTX 5070 Laptop GPU (12 GB) AMD Radeon RX 9070 GRE (12 GB) NVIDIA GeForce RTX 2080 Ti (11 GB) — $380 NVIDIA GeForce GTX 1080 Ti (11 GB) — $250 Intel Arc B570 (10 GB) NVIDIA GeForce RTX 3080 10GB (10 GB) — $379 NVIDIA GeForce RTX 4060 Ti 8GB (8 GB) — $369 NVIDIA GeForce RTX 5060 (8 GB) — $299 NVIDIA GeForce RTX 4060 (8 GB) — $279 Framework Laptop 16 (RX 7700S) (8 GB) NVIDIA GeForce RTX 2060 Super (8 GB) — $220 AMD Radeon RX 6600 XT (8 GB) — $200 AMD Radeon RX 6600 (8 GB) — $180 NVIDIA GeForce RTX 5060 Ti 8GB (8 GB) — $379 NVIDIA GeForce RTX 3070 (8 GB) — $269 NVIDIA GeForce GTX 1070 Ti (8 GB) — $160 NVIDIA GeForce GTX 1080 (8 GB) — $180 NVIDIA GeForce RTX 3050 (8 GB) — $200 NVIDIA GeForce RTX 2070 (8 GB) — $240 NVIDIA GeForce RTX 3070 Ti (8 GB) — $350 AMD Radeon RX 5500 XT 8GB (8 GB) — $110 NVIDIA GeForce RTX 2080 Super (8 GB) — $320 AMD Radeon RX 5700 XT (8 GB) — $200 AMD Radeon RX 580 8GB (8 GB) — $80 AMD Radeon RX 6650 XT (8 GB) — $230 NVIDIA GeForce GTX 1070 (8 GB) — $140 NVIDIA GeForce RTX 2070 Super (8 GB) — $280 NVIDIA GeForce RTX 3060 Ti (8 GB) — $280 NVIDIA GeForce RTX 5050 (8 GB) NVIDIA GeForce GTX 1660 Super (6 GB) — $150 NVIDIA GeForce RTX 2060 (6 GB) — $180 
NVIDIA GeForce GTX 1660 (6 GB) — $130 NVIDIA GeForce GTX 1660 Ti (6 GB) — $160 AMD Radeon RX 5600 XT (6 GB) — $140 NVIDIA GeForce GTX 1060 6GB (6 GB) — $110 NVIDIA GeForce GTX 1650 (4 GB) — $130 NVIDIA GeForce GTX 1050 Ti (4 GB) — $90 NVIDIA GeForce RTX 3050 Ti (Mobile) (4 GB) NVIDIA GeForce GTX 1650 Super (4 GB) — $140 AMD Radeon RX 570 (4 GB) — $60 NVIDIA GeForce GTX 1060 3GB (3 GB) — $70
Local model (Q4_K_M assumed) — pick model — all-MiniLM-L6-v2 (0.0B) DeepSeek V4 Pro (1.6T MoE) (1600.0B) FLUX.1 [dev] (12.0B) Qwen 3.5 235B-A17B (MoE) (397.0B) Qwen 3 235B-A22B (235.0B) BGE Large EN v1.5 (0.3B) DeepSeek R1 (671B reasoning) (671.0B) DeepSeek V4 Flash (284B MoE) (284.0B) Kokoro 82M (0.1B) Llama 3.1 8B Instruct (8.0B) Llama 4 Scout (109.0B) Nomic Embed Text v1.5 (0.1B) Qwen 3 0.6B (0.6B) Qwen 3 30B-A3B (30.0B) Llama 3.3 70B Instruct (70.0B) Qwen 2.5 Coder 32B Instruct (32.0B) all-mpnet-base-v2 (0.1B) BGE Reranker v2 M3 (0.6B) Gemma 4 31B Dense (31.0B) Qwen 3 32B (32.0B) XTTS v2 (0.5B) Qwen 3 8B (8.0B) Whisper Base (0.1B) Whisper Small (0.2B) DeepSeek R1 Distill Llama 70B (70.0B) GLM-5.2 (753.0B) Mistral Medium 3.5 (675B MoE) (675.0B) paraphrase-multilingual-MiniLM-L12-v2 (0.1B) Whisper Tiny (0.0B) DeepSeek R1 Distill Qwen 32B (32.0B) GLM-5 (200.0B) DeepSeek V3 (671B MoE) (671.0B) FLUX.1 [schnell] (12.0B) Gemma 4 26B MoE (26.0B) Jina Embeddings v3 (0.6B) Llama 3.2 3B Instruct (3.0B) Multilingual E5 Large Instruct (0.6B) mxbai-embed-large-v1 (0.3B) Qwen 3 1.7B (1.7B) Qwen 3 14B (14.0B) Gemma 3 270M (0.3B) Mistral Small 3 24B (24.0B) Nemotron 3 Nano (30B-A3B) (30.0B) Qwen 2.5 7B Instruct (7.0B) Qwen2-VL 2B Instruct (2.0B) DeepSeek R1 Distill Qwen 7B (7.0B) Phi-4 14B (14.0B) Qwen 2.5 14B Instruct (14.0B) Gemma 3 27B (27.0B) Hermes 3 Llama 3.1 8B (8.0B) Llama 3.1 70B Instruct (70.0B) Qwen 3.6 35B-A3B (MTP) (35.0B) Kimi K2.6 (1000.0B) MiniMax-M3 (428.0B) Mistral Nemo 12B Instruct (12.0B) Phi-4 Reasoning 14B (14.0B) Qwen 2.5 32B Instruct (32.0B) DeepSeek R1 Distill Qwen 14B (14.0B) Kimi K2.7-Code (1000.0B) Llama 3.1 Nemotron 70B Instruct (70.0B) QwQ 32B Preview (32.0B) SigLIP SO400M (patch14-384) (0.4B) Gemma 4 E4B (Effective 4B) (4.0B) Llama 3.2 11B Vision Instruct (11.0B) Distil-Whisper Large v3 (0.8B) Gemma 3 12B (12.0B) Jina Reranker v2 Base Multilingual (0.3B) Nemotron 3 Super (120B-A12B) (120.0B) Nemotron 3 Ultra (550B-A55B) (550.0B) Qwen 2.5 72B 
Instruct (72.0B) Qwen 3 4B (4.0B) SDXL Turbo (2.6B) SmolLM2 135M Instruct (0.1B) Snowflake Arctic Embed L v2.0 (0.6B) TinyLlama 1.1B Chat v1.0 (1.1B) OLMo 2 32B (32.0B) Phi-3.5 Mini Instruct (3.8B) DeepSeek Coder V2 Lite (16B) (16.0B) Gemma 2 2B Instruct (2.0B) Hermes 3 Llama 3.1 70B (70.0B) Llama 3.1 Nemotron Ultra 253B (253.0B) Llama 4 Maverick (400.0B) Pixtral 12B (12.0B) Qwen 3.6 27B (MTP) (27.0B) Trendyol LLM Asure 12B (11.8B) Turkish Gemma 9B T1 (9.0B) Codestral 22B (22.0B) Llama 3.1 Nemotron Nano 8B (8.0B) Llama 3.2 1B Instruct (1.0B) DeepSeek V2 Lite Chat (15.7B) E5 Mistral 7B Instruct (7.1B) Florence-2 Large (0.8B) Gemma 2 9B Instruct (9.0B) Gemma 3 4B (4.0B) GTE ModernBERT Base (0.1B) Mistral 7B Instruct v0.3 (7.0B) Mistral Large 2 (123B) (123.0B) Trendyol LLM 7B Chat v0.1 (7.0B) Turkish Llama 8B Instruct v0.1 (8.0B) Gemma 4 E2B (Effective 2B) (2.0B) Dolphin 3.0 Mistral 24B (24.0B) Omni 31B Turkish Reasoning (31.0B) Cosmos Llama 3 8B Turkish (8.0B) Gemma 4 Turkish 26B (4B active) (26.0B) Kumru 2B (2.4B) Llama 3.2 90B Vision Instruct (90.0B) Mixtral 8x7B Instruct (47.0B) Stable Diffusion 3.5 Medium (2.5B) VibeThinker-3B (3.0B) Command R+ 104B (104.0B) EXAONE 3.5 7.8B Instruct (7.8B) EXAONE Deep 7.8B (7.8B) GPT-NeoX 20B (20.0B) Mistral 7B Instruct v0.1 (7.0B) Mistral 7B Instruct v0.2 (7.0B) Mistral 7B Instruct v0.2 (7.0B) mxbai-rerank-large-v2 (1.5B) NVIDIA Nemotron Nano 9B v2 Japanese (9.0B) Phi-3.5 Vision (4.2B) Piper (0.0B) Ring-2.6-1T (1000.0B) Salamandra 7B Instruct (7.0B) TinyLlama 1.1B Chat v0.3 AWQ (1.1B) TinyLlama 1.1B Chat v0.3 GPTQ (1.1B) Turkcell LLM 7B v1 (7.4B) Turkish Mistral 7B Instruct v0.2 (7.0B) Command R 35B (35.0B) Mihenk LLM v2 35B (Turkish Financial) (35.0B) Mixtral 8x22B Instruct (141.0B) Parakeet TDT 0.6B v2 (0.6B) Qwen 3.5 2B Turkish SFT (2.0B) YTU Turkish Gemma 9B v0.1 (9.2B) Gemma 3 1B (1.0B) Kanarya 2B (2.0B) WizardLM-2 8x22B (141.0B) Yi 1.5 34B (34.0B) MedGemma 27B (27.0B) SmolLM2 360M Instruct (0.4B) CodeGemma 7B (7.0B) F5-TTS 
(0.3B) GOT-OCR 2.0 (0.6B) Kanarya 750M (0.8B) Trendyol LLM 7B Base v0.1 (7.0B) VBART Large (Turkish Summarization) (0.4B) ColPali v1.3 (3.0B) SmolVLM Instruct (2.3B) Turkish GPT-2 Large (0.7B) Command R7B (12-2024) (8.0B) ALIA 40b instruct 2601 (40.0B) Bielik 11B v2.3 Instruct (11.0B) Bielik 11B v2.3 Instruct (11.0B) EXAONE 3.5 2.4B Instruct (2.4B) EXAONE 3.5 32B Instruct (32.0B) EXAONE 3.5 32B Instruct AWQ (32.0B) Falcon 40B Instruct (40.0B) Hermes 4 70B FP8 (70.0B) K-EXAONE 236B A23B (236.0B) LLM-jp 4 8B Instruct (8.0B) LLM-jp 4 8B Thinking (8.0B) Merlyn Education Safety 12B AWQ (12.0B) Orpheus 3B 0.1 FT (3.0B) Sarvam 105B (105.0B) Sarvam 30B (30.0B) SOLAR 10.7B v1.0 (10.7B) OLMo 2 1B Instruct (1.0B) Mistral Turkish v2 (brooqs) (7.2B) Granite 3.1 2B Instruct (2.0B) Malhajar Mistral 7B Turkish (7.2B) RefinedNeuro RN TR R2 (8.0B) RefinedNeuro RN TR R1 (8.0B) Falcon 3 3B Instruct (3.0B) Bielik 11B v2.2 Instruct GGUF (11.0B) Bielik 11B v3.0 Instruct GGUF (11.0B) Bielik 7B Instruct v0.1 GGUF (7.0B) Bielik 7B v0.1 (7.0B) Bielik-11B v3.0 Instruct FP8 Dynamic (11.0B) EXAONE 4.0.1 32B (32.0B) GPT-2 Spanish (0.1B) GPT-2 Spanish Medium (0.4B) GPT-OSS Swallow 20B RL v0.1 (20.0B) gpt2-base-french (0.1B) Japanese StableLM Instruct Gamma 7B (7.0B) llm-jp 4 32B A3B Thinking (32.0B) mGPT 13B (13.0B) Mistral 7B OpenOrca GGUF (7.0B) Mixtral 8X7B Instruct v0.1 GPTQ (46.7B) OpenThaiGPT 7B 1.0.0 Chat (7.0B) PhoGPT 4B Chat (3.7B) Pollux Judge 32B (32.0B) Qwen3 Swallow 32B RL v0.2 (32.0B) Qwen3.5 9B Thai Law Base (8.9B) Saiga Llama3 8B GGUF (8.0B) Salamandra 2B (2.3B) Salamandra 2B Instruct (2.0B) Sarvam M (24.0B) Swallow 7B (7.0B) OpenELM 3B Instruct (3.0B) Dostoevsky Doesn't Write It GPT2 (0.2B) mGPT 1.3B Uzbek (1.3B) OpenThaiGPT 1.0.0 Beta 13B Chat (13.0B) PhoGPT 4B (3.7B) Salamandra 7B (7.0B) Gervásio 8B PTPT (8.0B) mGPT 1.3B Mongol (1.3B) OpenThaiGPT 1.5 7B Instruct (7.0B) Qwen3 0.6B Hindi Instruct v1 GGUF (0.6B) Sarvam 105B FP8 (105.0B) Typhoon S ThaiLLM 8B Instruct Research 
Preview (8.0B) Vikhr Qwen 2.5 0.5B Instruct (0.5B) Aya 23 35B (35.0B) Aya 23 8B (8.0B) Aya Expanse 32B (32.0B) Baichuan 4 13B (13.0B) BGE M3 (0.6B) CodeQwen 1.5 7B (7.0B) Codestral Mamba 7B (7.0B) Command R+ (Aug 2024) (104.0B) DBRX Base (132.0B) DBRX Instruct (132.0B) DeepSeek Coder V2 236B (236.0B) DeepSeek Coder V3 (33.0B) DeepSeek MoE 16B Base (16.0B) DeepSeek R1 Distill Llama 8B (8.0B) DeepSeek R1 Distill Mistral 24B (24.0B) DeepSeek R1 Distill Qwen 1.5B (1.5B) DeepSeek R1 Distill Qwen 3 32B (32.0B) DeepSeek V2.5 236B (236.0B) DeepSeek V3 Lite (16B MoE) (16.0B) DeepSeek V4 (745.0B) Devstral Small 2 24B (24.0B) Dolphin 3 Llama 3.3 70B (70.0B) Dolphin 3.0 Llama 3.2 3B (3.0B) EVA Llama 3.3 70B (70.0B) EXAONE 3.5 2.4B (2.4B) EXAONE 3.5 32B (32.0B) EXAONE 3.5 8B (7.8B) Falcon 3 10B (10.0B) Falcon 3 7B Instruct (7.0B) Falcon Mamba 7B (7.0B) GLM-4 9B (9.0B) GLM-4V 9B (13.9B) GLM-5 Pro (144.0B) Granite 3 MoE (3B active) (16.0B) Granite 3.0 2B Instruct (2.0B) Granite 3.0 8B Instruct (8.0B) Granite 3.2 8B (8.0B) Granite 3.3 8B (8.0B) Hermes 3 Llama 3.2 3B (3.0B) Hermes 4 Llama 3.3 70B (70.0B) Hunyuan Large 389B MoE (389.0B) InternLM 2.5 7B Chat (7.0B) InternLM 3 8B (8.0B) InternVL 2.5 26B (26.0B) InternVL 2.5 78B (78.0B) Jamba 1.5 Large (398.0B) Jamba 1.5 Mini (52.0B) Janus-Pro 7B (7.0B) Kimi K1.5 (200.0B) Llama 3.2 11B Vision (11.0B) Llama 3.2 90B Vision (90.0B) Llama 3.3 8B Instruct (8.0B) Llama 4 405B (405.0B) Llama 4 70B (70.0B) LLaVA 1.6 Mistral 7B (7.0B) LLaVA-OneVision 7B (7.0B) Magistral 32B (32.0B) MiniCPM 3 4B (4.0B) MiniCPM-V 2.6 8B (8.0B) MiniCPM-V 3 8B (8.0B) Ministral 3B Instruct (3.0B) Ministral 8B Instruct (8.0B) Mistral Medium 3 24B (dense) (24.0B) Mistral Saba 24B (24.0B) Mistral Small 3.2 24B (24.0B) Molmo 72B (72.0B) Molmo 7B-D (8.0B) Moondream 2 (1.9B) Nemotron 3 Nano 9B (9.0B) Nemotron 3 Super 49B (49.0B) Nemotron Mini 4B Instruct (4.0B) NV-Embed v2 (7.8B) OLMo 2 13B (13.0B) OpenBioLLM Llama 3 70B (70.0B) OpenCoder 8B (8.0B) PaliGemma 2 10B (10.0B) 
PaliGemma 2 3B (3.0B) Phi-4 Mini 4B (3.8B) Phi-4 Multimodal (14.0B) Phi-4 Reasoning Mini 4B (3.8B) Phind CodeLlama 34B v2 (34.0B) Qwen 2-VL 7B (7.0B) Qwen 2.5 0.5B Instruct (0.5B) Qwen 2.5 1.5B Instruct (1.5B) Qwen 2.5 3B Instruct (3.0B) Qwen 2.5 Coder 1.5B (1.5B) Qwen 2.5 Coder 14B Instruct (14.0B) Qwen 2.5 Coder 3B (3.0B) Qwen 2.5 Coder 7B Instruct (7.0B) Qwen 2.5 Math 72B (72.0B) Qwen 2.5 Math 7B (7.0B) Qwen 2.5-VL 3B (3.0B) Qwen 2.5-VL 72B (72.0B) Qwen 2.5-VL 7B (7.0B) Qwen 3 7B (7.0B) Qwen 3 Coder 32B (32.0B) Qwen 3 Embedding 8B (8.0B) RWKV 7 'Goose' 1.5B (1.5B) SmolLM 2 1.7B Instruct (1.7B) SmolLM 2 360M Instruct (0.4B) SmolLM 3 3B (3.0B) Stable LM 2 12B (12.0B) StarCoder 2 15B (15.0B) StarCoder 2 3B (3.0B) StarCoder 2 7B (7.0B) Step-3 (1000.0B) Tulu 3 70B (70.0B) Tulu 3 8B (8.0B) Whisper Large v3 (1.6B) Whisper Large v3 Turbo (0.8B) Yi Coder 9B (9.0B)
§ Verdict at 500,000 tokens/day
Pick local hardware and a model to see the crossover analysis.
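The page does not state how the crossover is computed, so the following is only a minimal sketch of one plausible break-even calculation, not the calculator's actual method. It assumes that a price such as "$3/15/M" means $3 per million input tokens and $15 per million output tokens, that daily traffic splits evenly between input and output, and that local running costs (electricity and so on) are supplied by the caller. The names `CloudPrice`, `cloud_cost_per_day`, `break_even_days`, `output_share`, and `local_daily_cost` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloudPrice:
    """API price in USD per million tokens (assumed reading of '$in/out/M')."""
    input_per_m: float
    output_per_m: float


def cloud_cost_per_day(price: CloudPrice, tokens_per_day: float,
                       output_share: float = 0.5) -> float:
    """Daily API cost; output_share is an assumed input/output split."""
    input_tokens = tokens_per_day * (1 - output_share)
    output_tokens = tokens_per_day * output_share
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


def break_even_days(hardware_cost: float, cloud_daily: float,
                    local_daily_cost: float = 0.0) -> Optional[float]:
    """Days until the hardware purchase is repaid by avoided API spend.

    Returns None when running locally saves nothing per day,
    in which case there is no crossover.
    """
    daily_saving = cloud_daily - local_daily_cost
    if daily_saving <= 0:
        return None
    return hardware_cost / daily_saving


# Figures taken from the lists above: Claude Sonnet 4.6 ($3/15/M),
# NVIDIA GeForce RTX 4090 ($1899), 500,000 tokens/day.
sonnet = CloudPrice(input_per_m=3.0, output_per_m=15.0)
daily = cloud_cost_per_day(sonnet, tokens_per_day=500_000)
days = break_even_days(hardware_cost=1899.0, cloud_daily=daily)

horizon_days = 3 * 365  # the "3 years (typical amortization)" horizon
pays_off = days is not None and days <= horizon_days
print(f"cloud: ${daily:.2f}/day, break-even after {days:.0f} days, "
      f"within horizon: {pays_off}")
```

Under these assumptions the API costs $4.50 per day, so a $1899 card is repaid after 422 days, well inside a three-year horizon. The sketch ignores whether the chosen model fits in the card's memory and whether local output quality matches the cloud model, both of which a real comparison would need to account for.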