Compare any two local AI models
Pick any two open-weight models and get a 10-dimension comparison matrix plus a hardware-fit table showing where each one runs across 8 common GPU tiers, from 12 GB consumer cards to the 192 GB Mac Studio M3 Ultra.
PICK ANY TWO MODELS
Model A: DeepSeek R1 (671B reasoning) ⇄ Model B: DeepSeek R1 Distill Qwen 32B
(Both pickers draw from the same catalogue of open-weight models, ranging from sub-1B embedding and speech models up to 1.6T-parameter MoEs.)
Your hardware (optional, biases the verdict): don't bias (default). Selectable tiers span consumer GPUs up to datacenter accelerators such as the NVIDIA GB200 NVL72.
Use case: Reasoning + math (other presets: Chat / daily driver · Coding agents · Agentic loops · RAG + long context · Creative writing · Multimodal / vision)
Verdict for reasoning — weighted: 5% (DeepSeek R1 (671B reasoning)) vs 5% (DeepSeek R1 Distill Qwen 32B)
Neither model is the clear better fit for reasoning. They split the 10 dimensions roughly evenly (1 for DeepSeek R1 (671B reasoning), 1 for DeepSeek R1 Distill Qwen 32B, 8 ties). The pick is contextual — check the per-row notes, and remember that "better for this use case" isn't "better overall."
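The tie verdict above follows directly from the weighted tally: each model wins one dimension at equal weight and the rest are ties. A minimal sketch of that tallying logic (the function name, dimension weights, and tie-handling here are illustrative assumptions, not the tool's actual code):

```python
def verdict(rows):
    """Tally a weighted per-dimension comparison.

    rows: list of (winner, weight) pairs, where winner is
    'A', 'B', or 'tie'. Tied dimensions contribute to neither score.
    """
    score = {"A": 0.0, "B": 0.0}
    for winner, weight in rows:
        if winner in score:
            score[winner] += weight
    if abs(score["A"] - score["B"]) < 1e-9:
        return "no clear winner"
    return "A" if score["A"] > score["B"] else "B"

# Mirrors the split above: one 5% win each, eight tied dimensions.
rows = [("A", 0.05), ("B", 0.05)] + [("tie", 0.09)] * 8
print(verdict(rows))  # → no clear winner
```

With equal weighted wins the scores cancel exactly, which is why the verdict falls back to "check the per-row notes" rather than naming a winner.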
WILL IT RUN — HARDWARE FIT
For each common hardware tier: the best-fitting quant for each model plus predicted decode tok/s. ✓ comfortable, ~ tight, ✗ doesn't fit. tok/s is extrapolated from memory bandwidth ÷ active-weight footprint, so treat it as a ceiling and measure on your own stack.
Hardware tier                                              | DeepSeek R1 (671B reasoning) | DeepSeek R1 Distill Qwen 32B     | Verdict
RTX 3060 12 GB (budget consumer · 360 GB/s)                | ✗ doesn't fit                | ✗ doesn't fit                    | Neither
RTX 4060 Ti 16 GB (consumer 16 GB · 288 GB/s)              | ✗ doesn't fit                | ✗ doesn't fit                    | Neither
RTX 3090 24 GB (used flagship · 936 GB/s)                  | ✗ doesn't fit                | ~ Q4_K_M ~22 GB · 29 tok/s est.  | Only B
RTX 4090 24 GB (consumer flagship · 1008 GB/s)             | ✗ doesn't fit                | ~ Q4_K_M ~22 GB · 31 tok/s est.  | Only B
RTX 5090 32 GB (next-gen flagship · 1792 GB/s)             | ✗ doesn't fit                | ✓ Q4_K_M ~22 GB · 56 tok/s est.  | Only B
RTX PRO 6000 Blackwell 96 GB (workstation · 1792 GB/s)     | ✗ doesn't fit                | ✓ Q8_0 ~37 GB · 32 tok/s est.    | Only B
Mac Studio M4 Max 128 GB (Apple unified · 546 GB/s)        | ✗ doesn't fit                | ✓ Q8_0 ~37 GB · 11 tok/s est.    | Only B
Mac Studio M3 Ultra 192 GB (Apple unified flagship · 800 GB/s) | ✗ doesn't fit            | ✓ Q8_0 ~37 GB · 16 tok/s est.    | Only B
✓ Comfortable (≥30% headroom) · ~ Tight (fits, <30% headroom) · ✗ Doesn't fit
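The fit labels and tok/s estimates above can be reproduced with a simple heuristic: a model fits if its quantized weights fit in VRAM, it's "comfortable" with ≥30% headroom, and decode speed is memory-bandwidth-bound (each generated token streams all active weights once). The ~0.68 efficiency factor below is back-solved from the table's own numbers and varies a little across rows; it, the threshold placement, and the function name are assumptions, not the tool's actual implementation.

```python
EFFICIENCY = 0.68  # fraction of peak bandwidth realised during decode (assumed)

def fit(vram_gb: float, bandwidth_gbs: float, quant_gb: float):
    """Classify hardware fit and estimate decode tok/s for one model.

    vram_gb:       tier memory capacity
    bandwidth_gbs: tier memory bandwidth
    quant_gb:      footprint of the chosen quant (e.g. Q4_K_M ~22 GB
                   for a 32B model)
    """
    if quant_gb > vram_gb:
        return ("doesn't fit", None)
    headroom = (vram_gb - quant_gb) / vram_gb
    label = "comfortable" if headroom >= 0.30 else "tight"
    # Decode is bandwidth-bound: every token reads all active weights,
    # so tok/s ≈ effective bandwidth / weight footprint.
    tok_s = EFFICIENCY * bandwidth_gbs / quant_gb
    return (label, round(tok_s))

# RTX 3090 running the 32B distill at Q4_K_M, as in the table:
print(fit(24, 936, 22))  # → ('tight', 29)
```

Note what this heuristic ignores: KV-cache growth with context length, activation overhead, and prompt-processing (prefill) speed — all reasons the "measure on your stack" caveat matters.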