Every GPU ranked for local AI inference
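One screen. Every catalog GPU sorted by tier, with estimated tok/s for the four canonical model sizes (7B, 14B, 32B, 70B at Q4_K_M). Cells show measurements where we have them and bandwidth-derived estimates where we don't, and every cell is labeled so you know what you're reading: a range such as 112–157 is an estimate, a value marked ● is a measurement, a dash (—) means the model does not fit in VRAM, and ? means no data yet. Methodology at /methodology.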
| Tier | GPU | VRAM | BW (GB/s) | Price | 7B Q4 (tok/s) | 14B Q4 (tok/s) | 32B Q4 (tok/s) | 70B Q4 (tok/s) | Rating |
|---|---|---|---|---|---|---|---|---|---|
| A | Apple Mac Studio (M3 Ultra) apple · 2025 | 0 GB | 800 | $4,999 | ? | ? | ? | ? | 10.0 |
| A | MacBook Pro 16" M4 Max apple · 2024 | 0 GB | 546 | $3,999 | ? | ? | ? | ? | 10.0 |
| A | NVIDIA GeForce RTX 5090 nvidia · 2025 | 32 GB | 1792 | $2,499 | 199–279 | 105–148 | 46–64 | — | 9.6 |
| A | NVIDIA GeForce RTX 4090 nvidia · 2022 | 24 GB | 1008 | $1,899 | 112–157 | 59–83 | 26–36 | — | 9.4 |
| A | Apple Mac Studio (M4 Max) apple · 2025 | ? GB | 546 | $1,999 | ? | ? | ? | ? | 8.7 |
| B | Lenovo Legion 5 Pro Gen 7 (RTX 3080 16GB) nvidia · 2022 | 16 GB | ? | $1,499 | ? | ? | ? | ? | 9.3 |
| B | Apple Mac Mini (M4 Pro) apple · 2024 | ? GB | 273 | $1,399 | ? | ? | ? | ? | 8.9 |
| B | NVIDIA GeForce RTX 3090 Ti nvidia · 2022 | 24 GB | 1008 | $1,199 | 112–157 | 59–83 | 26–36 | — | 8.8 |
| B | NVIDIA GeForce RTX 3090 nvidia · 2020 | 24 GB | 936 | $899 | 104–146 | 55–77 | 24–34 | — | 8.5 |
| B | NVIDIA GeForce RTX 5080 nvidia · 2025 | 16 GB | 960 | $1,199 | 136● | 82● | — | — | 8.1 |
| B | NVIDIA GeForce RTX 5070 Ti nvidia · 2025 | 16 GB | 896 | $849 | 100–139 | 53–74 | — | — | 8.1 |
| B | NVIDIA GeForce RTX 4070 Ti Super nvidia · 2024 | 16 GB | 672 | $829 | 75–105 | 40–55 | — | — | 8.1 |
| B | Apple MacBook Air (M4) apple · 2025 | ? GB | 120 | $999 | ? | ? | ? | ? | 8.0 |
| B | AMD Radeon RX 7900 XTX amd · 2022 | 24 GB | 960 | $899 | 107–149 | 56–79 | 25–34 | — | 7.8 |
| B | NVIDIA GeForce RTX 4080 nvidia · 2022 | 16 GB | 717 | $1,099 | 80–112 | 42–59 | — | — | 7.8 |
| B | NVIDIA GeForce RTX 3080 12GB nvidia · 2022 | 12 GB | 912 | $449 | 101–142 | 54–75 | — | — | 7.3 |
| B | NVIDIA GeForce RTX 3080 Ti nvidia · 2021 | 12 GB | 912 | $480 | 101–142 | 54–75 | — | — | 7.3 |
| B | NVIDIA GeForce RTX 4080 Super nvidia · 2024 | 16 GB | 736 | $1,099 | 82–114 | 43–61 | — | — | 7.2 |
| C | Apple Mac Mini (M4) apple · 2024 | ? GB | 120 | $599 | ? | ? | ? | ? | 8.4 |
| C | AMD Radeon RX 7900 XT amd · 2022 | 20 GB | 800 | $729 | 89–124 | 47–66 | — | — | 8.1 |
| C | NVIDIA GeForce RTX 5060 Ti 16GB nvidia · 2025 | 16 GB | 448 | $459 | 50–70 | 26–37 | — | — | 8.1 |
| C | AMD Radeon RX 9070 XT amd · 2025 | 16 GB | 644 | $649 | 72–100 | 38–53 | — | — | 7.9 |
| C | AMD Radeon RX 9070 amd · 2025 | 16 GB | 624 | $569 | 69–97 | 37–51 | — | — | 7.9 |
| C | AMD Radeon RX 7900 GRE amd · 2024 | 16 GB | 576 | $549 | 64–90 | 34–47 | — | — | 7.9 |
| C | NVIDIA GeForce RTX 4060 Ti 16GB nvidia · 2023 | 16 GB | 288 | $449 | 32–45 | 17–24 | — | — | 7.8 |
| C | NVIDIA GeForce RTX 5070 nvidia · 2025 | 12 GB | 672 | $599 | 75–105 | 40–55 | — | — | 7.6 |
| C | AMD Radeon RX 7800 XT amd · 2023 | 16 GB | 624 | $459 | 69–97 | 37–51 | — | — | 7.6 |
| C | AMD Radeon RX 6950 XT amd · 2022 | 16 GB | 576 | $580 | 64–90 | 34–47 | — | — | 7.6 |
| C | NVIDIA GeForce RTX 4070 Super nvidia · 2024 | 12 GB | 504 | $619 | 56–78 | 30–42 | — | — | 7.6 |
| C | AMD Radeon RX 9060 XT amd · 2026 | 16 GB | 640 | $449 | 71–100 | 38–53 | — | — | 7.5 |
| C | AMD Radeon RX 6800 amd · 2020 | 16 GB | 512 | $380 | 57–80 | 30–42 | — | — | 7.3 |
| C | AMD Radeon RX 6800 XT amd · 2020 | 16 GB | 512 | $450 | 57–80 | 30–42 | — | — | 7.3 |
| C | AMD Radeon RX 6900 XT amd · 2020 | 16 GB | 512 | $500 | 57–80 | 30–42 | — | — | 7.3 |
| C | NVIDIA GeForce RTX 4070 nvidia · 2023 | 12 GB | 504 | $549 | 56–78 | 30–42 | — | — | 7.3 |
| C | NVIDIA GeForce RTX 4070 Ti nvidia · 2023 | 12 GB | 504 | $749 | 56–78 | 30–42 | — | — | 7.3 |
| C | AMD Radeon RX 9070 GRE amd · 2026 | 12 GB | 432 | $549 | 48–67 | 25–36 | — | — | 7.0 |
| C | NVIDIA GeForce RTX 2080 Ti nvidia · 2018 | 11 GB | 616 | $380 | 68–96 | 36–51 | — | — | 6.6 |
| C | NVIDIA GeForce RTX 3080 10GB nvidia · 2020 | 10 GB | 760 | $379 | 84–118 | — | — | — | 6.5 |
| C | Intel Arc A770 16GB intel · 2022 | 16 GB | 559 | $269 | 62–87 | 33–46 | — | — | 6.5 |
| C | NVIDIA GeForce RTX 3070 Ti nvidia · 2021 | 8 GB | 608 | $350 | 68–95 | — | — | — | 5.0 |
| D | AMD Radeon RX 7600 XT amd · 2024 | 16 GB | 288 | $309 | 32–45 | 17–24 | — | — | 7.9 |
| D | AMD Radeon RX 6750 XT amd · 2022 | 12 GB | 432 | $320 | 48–67 | 25–36 | — | — | 7.1 |
| D | AMD Radeon RX 7700 XT amd · 2023 | 12 GB | 432 | $379 | 48–67 | 25–36 | — | — | 7.1 |
| D | NVIDIA GeForce RTX 3060 12GB nvidia · 2021 | 12 GB | 360 | $249 | 40–56 | 21–30 | — | — | 7.0 |
| D | AMD Radeon RX 6700 XT amd · 2021 | 12 GB | 384 | $280 | 43–60 | 23–32 | — | — | 6.8 |
| D | NVIDIA GeForce GTX 1080 Ti nvidia · 2017 | 11 GB | 484 | $250 | 54–75 | 28–40 | — | — | 6.6 |
| D | NVIDIA GeForce RTX 5050 nvidia · 2025 | 8 GB | 320 | $249 | 36–50 | — | — | — | 6.4 |
| D | Intel Arc B580 intel · 2024 | 12 GB | 456 | $269 | 51–71 | 27–38 | — | — | 6.3 |
| D | Intel Arc B570 intel · 2025 | 10 GB | 380 | $219 | 42–59 | — | — | — | 5.8 |
| D | NVIDIA GeForce RTX 5060 nvidia · 2025 | 8 GB | 448 | $299 | 50–70 | — | — | — | 5.6 |
| D | NVIDIA GeForce RTX 5060 Ti 8GB nvidia · 2025 | 8 GB | 448 | $379 | 50–70 | — | — | — | 5.6 |
| D | NVIDIA GeForce RTX 4060 Ti 8GB nvidia · 2023 | 8 GB | 288 | $369 | 32–45 | — | — | — | 5.3 |
| D | NVIDIA GeForce RTX 4060 nvidia · 2023 | 8 GB | 272 | $279 | 30–42 | — | — | — | 5.3 |
| D | NVIDIA GeForce RTX 3050 nvidia · 2022 | 8 GB | 224 | $200 | 25–35 | — | — | — | 5.3 |
| D | NVIDIA GeForce RTX 2080 Super nvidia · 2019 | 8 GB | 496 | $320 | 55–77 | — | — | — | 5.1 |
| D | NVIDIA GeForce RTX 2070 nvidia · 2018 | 8 GB | 448 | $240 | 50–70 | — | — | — | 5.1 |
| D | AMD Radeon RX 6650 XT amd · 2022 | 8 GB | 280 | $230 | 31–44 | — | — | — | 5.1 |
| D | NVIDIA GeForce GTX 1070 Ti nvidia · 2017 | 8 GB | 256 | $160 | 28–40 | — | — | — | 5.1 |
| D | NVIDIA GeForce RTX 3060 Ti nvidia · 2020 | 8 GB | 448 | $280 | 50–70 | — | — | — | 5.0 |
| D | NVIDIA GeForce RTX 3070 nvidia · 2020 | 8 GB | 448 | $269 | 50–70 | — | — | — | 5.0 |
| D | NVIDIA GeForce RTX 2060 Super nvidia · 2019 | 8 GB | 448 | $220 | 50–70 | — | — | — | 4.8 |
| D | NVIDIA GeForce RTX 2070 Super nvidia · 2019 | 8 GB | 448 | $280 | 50–70 | — | — | — | 4.8 |
| D | AMD Radeon RX 6600 XT amd · 2021 | 8 GB | 256 | $200 | 28–40 | — | — | — | 4.8 |
| D | AMD Radeon RX 6600 amd · 2021 | 8 GB | 224 | $180 | 25–35 | — | — | — | 4.8 |
| D | NVIDIA GeForce GTX 1080 nvidia · 2016 | 8 GB | 320 | $180 | 36–50 | — | — | — | 4.6 |
| D | NVIDIA GeForce GTX 1070 nvidia · 2016 | 8 GB | 256 | $140 | 28–40 | — | — | — | 4.6 |
| D | AMD Radeon RX 580 8GB amd · 2017 | 8 GB | 256 | $80 | 28–40 | — | — | — | 3.8 |
| D | AMD Radeon RX 5700 XT amd · 2019 | 8 GB | 448 | $200 | 50–70 | — | — | — | 3.5 |
| D | AMD Radeon RX 5500 XT 8GB amd · 2019 | 8 GB | 224 | $110 | 25–35 | — | — | — | 3.5 |
| D | NVIDIA GeForce GTX 1660 Super nvidia · 2019 | 6 GB | 336 | $150 | 37–52 | — | — | — | 2.8 |
| D | NVIDIA GeForce RTX 2060 nvidia · 2019 | 6 GB | 336 | $180 | 37–52 | — | — | — | 2.8 |
| D | NVIDIA GeForce GTX 1660 Ti nvidia · 2019 | 6 GB | 288 | $160 | 32–45 | — | — | — | 2.8 |
| D | NVIDIA GeForce GTX 1660 nvidia · 2019 | 6 GB | 192 | $130 | 21–30 | — | — | — | 2.8 |
| D | NVIDIA GeForce GTX 1060 6GB nvidia · 2016 | 6 GB | 192 | $110 | 21–30 | — | — | — | 2.6 |
| D | AMD Radeon 880M (Strix Point iGPU) amd · 2024 | 0 GB | 102 | ? | ? | ? | ? | ? | 2.4 |
| D | AMD Radeon 780M (Phoenix iGPU) amd · 2023 | 0 GB | 89 | ? | ? | ? | ? | ? | 2.1 |
| D | NVIDIA GeForce GTX 1650 Super nvidia · 2019 | 4 GB | 192 | $140 | — | — | — | — | 1.8 |
| D | NVIDIA GeForce GTX 1650 nvidia · 2019 | 4 GB | 128 | $130 | — | — | — | — | 1.8 |
| D | AMD Radeon RX 5600 XT amd · 2020 | 6 GB | 336 | $140 | 37–52 | — | — | — | 1.7 |
| D | NVIDIA GeForce GTX 1050 Ti nvidia · 2016 | 4 GB | 112 | $90 | — | — | — | — | 1.3 |
| D | NVIDIA GeForce GTX 1060 3GB nvidia · 2016 | 3 GB | 192 | $70 | — | — | — | — | 1.1 |
| D | AMD Radeon RX 570 amd · 2017 | 4 GB | 224 | $60 | — | — | — | — | 1.0 |
| M | ASUS ROG Strix Scar 18 (RTX 5090 Mobile) nvidia · 2025 | 24 GB | ? | $3,999 | ? | ? | ? | ? | 9.6 |
| M | Razer Blade 16 (2025, RTX 5090 Mobile) nvidia · 2025 | 24 GB | ? | $4,499 | ? | ? | ? | ? | 9.6 |
| M | Framework Laptop 16 (RX 7700S) amd · 2024 | 8 GB | ? | $1,699 | ? | ? | ? | ? | 8.9 |
| M | NVIDIA GeForce RTX 3080 16GB (Mobile) nvidia · 2022 | 16 GB | 512 | ? | 190● | 43● | — | — | 8.8 |
| M | NVIDIA GeForce RTX 5090 Mobile nvidia · 2025 | 24 GB | 896 | ? | 100–139 | 53–74 | 23–32 | — | 8.6 |
| M | NVIDIA GeForce RTX 4090 Mobile nvidia · 2023 | 16 GB | 576 | ? | 64–90 | 34–47 | — | — | 7.3 |
| M | NVIDIA GeForce RTX 5070 Laptop GPU nvidia · 2025 | 12 GB | 384 | ? | 43–60 | 23–32 | — | — | 7.1 |
| M | NVIDIA GeForce RTX 3050 Ti (Mobile) nvidia · 2021 | 4 GB | 192 | ? | — | — | — | — | 1.5 |
| S | AMD Instinct MI355X amd · 2025 | 288 GB | 8000 | $25,000 | 889–1244 | 471–659 | 205–287 | 95–133 | 10.0 |
| S | NVIDIA B200 nvidia · 2024 | 192 GB | 8000 | $40,000 | 889–1244 | 471–659 | 205–287 | 95–133 | 10.0 |
| S | NVIDIA GB200 NVL72 nvidia · 2024 | 13824 GB | 8000 | ? | 889–1244 | 471–659 | 205–287 | 95–133 | 10.0 |
| S | AMD Instinct MI325X amd · 2024 | 256 GB | 6000 | $20,000 | 667–933 | 353–494 | 154–215 | 71–100 | 10.0 |
| S | AMD Instinct MI300X amd · 2023 | 192 GB | 5325 | $15,000 | 592–828 | 313–439 | 137–191 | 63–89 | 10.0 |
| S | AMD Instinct MI300A (APU) amd · 2023 | 128 GB | 5300 | ? | 589–824 | 312–436 | 136–190 | 63–88 | 10.0 |
| S | NVIDIA H200 nvidia · 2024 | 141 GB | 4800 | $31,000 | 533–747 | 282–395 | 123–172 | 57–80 | 10.0 |
| S | NVIDIA H100 NVL nvidia · 2023 | 188 GB | 3938 | $60,000 | 438–613 | 232–324 | 101–141 | 47–66 | 10.0 |
| S | NVIDIA H100 SXM nvidia · 2022 | 80 GB | 3350 | $30,000 | 372–521 | 197–276 | 86–120 | 40–56 | 10.0 |
| S | NVIDIA H100 PCIe nvidia · 2022 | 80 GB | 2039 | $25,000 | 227–317 | 120–168 | 52–73 | 24–34 | 10.0 |
| S | NVIDIA RTX PRO 6000 Blackwell nvidia · 2025 | 96 GB | 1792 | $8,999 | 199–279 | 105–148 | 46–64 | 21–30 | 10.0 |
| S | NVIDIA RTX 6000 Ada Generation nvidia · 2022 | 48 GB | 960 | $6,499 | 107–149 | 56–79 | 25–34 | 11–16 | 10.0 |
| S | NVIDIA L40 nvidia · 2022 | 48 GB | 864 | $8,000 | 96–134 | 51–71 | 22–31 | 10–14 | 10.0 |
| S | NVIDIA L40S nvidia · 2023 | 48 GB | 864 | $8,500 | 96–134 | 51–71 | 22–31 | 10–14 | 10.0 |
| S | NVIDIA DGX Spark (Project Digits) nvidia · 2025 | 0 GB | ? | $3,000 | ? | ? | ? | ? | 10.0 |
| S | AMD Instinct MI210 amd · 2022 | 64 GB | 1638 | $8,500 | 182–255 | 96–135 | 42–59 | 20–27 | 9.8 |
| S | AMD Instinct MI250X amd · 2021 | 128 GB | 3277 | $13,000 | 364–510 | 193–270 | 84–118 | 39–55 | 9.7 |
| S | NVIDIA A100 80GB SXM nvidia · 2020 | 80 GB | 2039 | $17,000 | 227–317 | 120–168 | 52–73 | 24–34 | 9.7 |
| S | NVIDIA RTX A6000 (Ampere) nvidia · 2020 | 48 GB | 768 | $3,500 | 85–119 | 45–63 | 20–28 | 9–13 | 9.7 |
| S | NVIDIA A40 nvidia · 2020 | 48 GB | 696 | $5,500 | 77–108 | 41–57 | 18–25 | 8–12 | 9.7 |
| S | NVIDIA RTX 5000 Ada Generation nvidia · 2023 | 32 GB | 576 | $4,000 | 64–90 | 34–47 | 15–21 | — | 9.5 |
| S | NVIDIA B300 (Blackwell Ultra) nvidia · 2025 | 288 GB | 8000 | ? | 889–1244 | 471–659 | 205–287 | 95–133 | 9.2 |
| S | NVIDIA A100 40GB nvidia · 2020 | 40 GB | 1555 | $11,000 | 173–242 | 91–128 | 40–56 | — | 9.2 |
| S | NVIDIA L4 nvidia · 2023 | 24 GB | 300 | $2,500 | 33–47 | 18–25 | 8–11 | — | 9.0 |
| S | NVIDIA RTX A5000 nvidia · 2021 | 24 GB | 768 | $2,500 | 85–119 | 45–63 | 20–28 | — | 8.7 |
| S | NVIDIA RTX 5000 PRO Blackwell 48GB nvidia · 2026 | 48 GB | 960 | $5,499 | 107–149 | 56–79 | 25–34 | 11–16 | 8.5 |
| S | AMD Instinct MI350X amd · 2025 | 288 GB | 8000 | ? | 889–1244 | 471–659 | 205–287 | 95–133 | 8.3 |
| S | Intel Gaudi 3 intel · 2024 | 128 GB | 3700 | $18,000 | 411–576 | 218–305 | 95–133 | 44–62 | 8.2 |
| S | Framework Desktop (Ryzen AI Max+ 395) amd · 2025 | ? GB | 256 | $1,999 | ? | ? | ? | ? | 8.2 |
| S | ASUS Ascent GX10 (NVIDIA GB10) nvidia · 2025 | ? GB | 273 | $2,999 | ? | ? | ? | ? | 8.1 |
| S | GMKtec EVO-X2 (Ryzen AI Max+ 395) amd · 2025 | ? GB | 256 | $1,499 | ? | ? | ? | ? | 8.0 |
| S | Intel Gaudi 2 intel · 2022 | 96 GB | 2450 | $8,000 | 272–381 | 144–202 | 63–88 | 29–41 | 7.9 |
| S | HP ZBook Ultra G1a (Ryzen AI Max+ PRO 395) amd · 2025 | ? GB | 256 | $3,999 | ? | ? | ? | ? | 7.8 |
| S | AMD EPYC 9005 (Zen 5, Turin) amd · 2024 | ? GB | 614 | ? | ? | ? | ? | ? | 7.7 |
| S | Intel Arc Pro B60 24GB intel · 2025 | 24 GB | 456 | $599 | 51–71 | 27–38 | 12–16 | — | 7.6 |
| S | NVIDIA RTX PRO 4500 Blackwell nvidia · 2025 | 32 GB | 896 | $2,600 | 100–139 | 53–74 | 23–32 | — | 7.5 |
| S | NVIDIA H20 (96GB) nvidia · 2024 | 96 GB | 4000 | ? | 444–622 | 235–329 | 103–144 | 48–67 | 7.4 |
| S | NVIDIA RTX PRO 4000 Blackwell nvidia · 2025 | 24 GB | 672 | $1,500 | 75–105 | 40–55 | 17–24 | — | 7.3 |
| E | Apple M4 Ultra apple · 2025 | 0 GB | 1100 | ? | ? | ? | ? | ? | 10.0 |
| E | Apple M3 Ultra apple · 2025 | 0 GB | 800 | ? | ? | ? | ? | ? | 10.0 |
| E | Apple M4 Max apple · 2024 | 0 GB | 546 | ? | ? | ? | ? | ? | 10.0 |
| E | Apple M4 Pro apple · 2024 | 0 GB | 273 | ? | ? | ? | ? | ? | 10.0 |
| E | Apple M1 Ultra apple · 2022 | 0 GB | 800 | ? | ? | ? | ? | ? | 9.9 |
| E | Apple M2 Ultra apple · 2023 | 0 GB | 800 | ? | ? | ? | ? | ? | 9.9 |
| E | Apple M2 Max apple · 2023 | 0 GB | 400 | ? | ? | ? | ? | ? | 9.7 |
| E | Apple M1 Max apple · 2021 | 0 GB | 400 | ? | ? | ? | ? | ? | 8.9 |
| E | Apple M3 Max apple · 2023 | 0 GB | 400 | ? | ? | ? | ? | ? | 8.5 |
| E | Qualcomm Snapdragon X Elite qualcomm · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 7.3 |
| E | Qualcomm Snapdragon X2 Elite qualcomm · 2026 | ? GB | ? | ? | ? | ? | ? | ? | 6.9 |
| E | Intel Core Ultra 300 (Panther Lake) intel · 2026 | ? GB | ? | ? | ? | ? | ? | ? | 6.8 |
| E | Qualcomm Snapdragon X Plus qualcomm · 2024 | 0 GB | ? | ? | ? | ? | ? | ? | 5.8 |
| E | Qualcomm Snapdragon 8 Elite qualcomm · 2024 | 0 GB | 90 | ? | ? | ? | ? | ? | 5.3 |
| E | Apple M4 (iPad Pro) apple · 2024 | 0 GB | 120 | ? | ? | ? | ? | ? | 5.0 |
| E | Apple A18 Pro apple · 2024 | 0 GB | 60 | ? | ? | ? | ? | ? | 5.0 |
| E | Google Tensor G4 google · 2024 | 0 GB | 60 | ? | ? | ? | ? | ? | 4.8 |
| E | Apple A17 Pro apple · 2023 | 0 GB | 51 | ? | ? | ? | ? | ? | 4.7 |
| E | Qualcomm Snapdragon 8 Gen 3 qualcomm · 2023 | 0 GB | 77 | ? | ? | ? | ? | ? | 4.5 |
| E | AMD Ryzen AI 9 HX 370 (Strix Point) amd · 2024 | 0 GB | 90 | $1,599 | ? | ? | ? | ? | 3.9 |
| E | Intel Core Ultra 7 258V (Lunar Lake) intel · 2024 | 0 GB | 136 | $1,199 | ? | ? | ? | ? | 3.8 |
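Estimates use the formula tok/s ≈ (memory_bandwidth_GBps ÷ model_weights_GB) × efficiency. Memory bandwidth is the dominant constraint for autoregressive decode, because every generated token requires reading the full set of model weights once. The 50–70% efficiency band reflects realistic Ollama / llama.cpp / vLLM runtime overhead, which is why each estimate is shown as a low–high range. See /methodology for the full derivation.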
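As a rough illustration, the sketch below applies that formula to two rows of the table. The weight sizes in `ASSUMED_WEIGHTS_GB` are not published on this page; they are assumptions back-derived from the table's own ranges, and the function name is hypothetical. Treat /methodology as the authoritative source.

```python
# Approximate Q4_K_M weight sizes in GB.
# ASSUMPTION: these values are not stated on the page; they were
# back-derived from the table's estimate ranges.
ASSUMED_WEIGHTS_GB = {"7B": 4.5, "14B": 8.5, "32B": 19.5, "70B": 42.0}


def estimate_tok_s(bandwidth_gbps, model, eff_low=0.50, eff_high=0.70):
    """Return (low, high) decode tok/s for a card with the given bandwidth."""
    weights_gb = ASSUMED_WEIGHTS_GB[model]
    # Upper bound: one full pass over the weights per generated token.
    ceiling = bandwidth_gbps / weights_gb
    return ceiling * eff_low, ceiling * eff_high


# RTX 4090 (1008 GB/s) on a 7B model
low, high = estimate_tok_s(1008, "7B")
print(f"{low:.0f}-{high:.0f} tok/s")  # prints 112-157 tok/s

# NVIDIA B200 (8000 GB/s) on a 70B model
low70, high70 = estimate_tok_s(8000, "70B")
print(f"{low70:.0f}-{high70:.0f} tok/s")  # prints 95-133 tok/s
```

Both results match the corresponding table cells, which suggests the assumed weight sizes are close to what the catalog uses, but they remain an inference.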
Got a rig? Run a benchmark and turn an estimate into a measured cell. Every measurement improves the table for the next reader.
Lowest $/tok-s pick for each model size
For each model size, the pick is the catalog card with the lowest cost per tok/s among cards that fit the model in VRAM. It is computed as current street price ÷ the midpoint of the estimated tok/s range. A pick changing here is the live signal that prices or new hardware shifted the value frontier.
Caveat: $/tok-s is a derived estimate stacked on the bandwidth formula. For workloads where you have a measured benchmark in the table above, trust the measured number first; for unmeasured combinations, this is the ranked best-guess for buyer decisions.
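The $/tok-s calculation described above can be sketched in a few lines. The three rows are copied from the table (price and 7B Q4 range); the function name and the tuple layout are hypothetical, and the result is the cheapest of these three sample rows only, not the site's actual pick.

```python
def dollars_per_tok_s(price_usd, est_low, est_high):
    """Street price divided by the midpoint of the estimated tok/s range."""
    midpoint = (est_low + est_high) / 2
    return price_usd / midpoint


# (name, price in USD, 7B Q4 low estimate, 7B Q4 high estimate)
sample_cards = [
    ("RTX 3060 12GB", 249, 40, 56),
    ("Arc B580", 269, 51, 71),
    ("RX 580 8GB", 80, 28, 40),
]

# Lowest cost per tok/s among the sample rows.
best = min(sample_cards, key=lambda c: dollars_per_tok_s(c[1], c[2], c[3]))
cost = dollars_per_tok_s(best[1], best[2], best[3])
print(f"{best[0]} ${cost:.2f} per tok/s")  # prints RX 580 8GB $2.35 per tok/s
```

Because the metric rewards low price as much as high throughput, a very cheap older card can win on $/tok-s while still being slow in absolute terms, which is one reason to read it alongside the tok/s columns.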
Choosing a GPU for your workload
The hierarchy answers "which is fastest" — but the right card for you depends on which model size you actually want to run. The four most common operator decisions:
- 7B Q4 (autocomplete, single-model chat) — any card with ≥6 GB VRAM works. The decision shifts to price + power draw + cross-vendor preference. Top D-tier cards (Arc B580, RTX 3060) deliver useful tok/s at <$300.
- 14B Q4 (coding assistant, mid-size chat) — ≥11 GB VRAM minimum. C-tier (RTX 4070 / RX 7800 XT) is the value sweet spot at $400-600.
- 32B Q4 (full coding agent, multi-model) — ≥22 GB VRAM. B-tier 24 GB cards are the canonical buy (RTX 3090 used, RX 7900 XTX new), with the A-tier RTX 4090 if budget allows.
- 70B Q4 (frontier-class local) — ≥48 GB VRAM. Single-card: RTX 6000 Ada / L40S / Mac Studio M3 Ultra. Multi-card: dual 3090 / dual 4090. Workstation tier or above.
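The VRAM floors in the list above can be turned into a quick fit check. This is a minimal sketch: the thresholds are the ones quoted in the bullets, while the dictionary and function names are hypothetical, and it ignores context length and KV-cache size, which also consume VRAM.

```python
# Minimum VRAM in GB per model size at Q4, as quoted in the list above.
MIN_VRAM_GB = {"7B": 6, "14B": 11, "32B": 22, "70B": 48}


def models_that_fit(vram_gb):
    """Return the model sizes whose VRAM floor the card meets."""
    return [model for model, need in MIN_VRAM_GB.items() if vram_gb >= need]


print(models_that_fit(24))  # prints ['7B', '14B', '32B']
print(models_that_fit(10))  # prints ['7B']
print(models_that_fit(4))   # prints []
```

These results line up with the table: 24 GB cards have estimates up to 32B, the 10 GB RTX 3080 only for 7B, and 4 GB cards show a dash in every column.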
Need it more personalized? Use /choose-my-gpu for a 9-input recommender, or /will-it-run to validate a specific model + GPU combination.