RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
BLK · HARDWARE INDEX

Hardware

154 units indexed. GPUs, SoCs, laptops, and edge hardware ranked for local LLM inference.

NEW · GPU.HIERARCHY
Every GPU ranked for local AI in one screen →

Sortable tier list with estimated tok/s for 7B / 14B / 32B / 70B at Q4_K_M. Measured benchmarks where we have them, bandwidth-derived estimates where we don't — every cell labeled.
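
What a bandwidth-derived estimate can look like, as a rough sketch: it assumes single-stream decoding is memory-bound, so each generated token streams the full set of quantized weights once. The ~4.85 bits-per-weight figure for Q4_K_M and the 0.6 efficiency factor are illustrative assumptions, not this site's published methodology, and the function names are hypothetical.

```python
# Rough decode-speed estimate from memory bandwidth (illustrative only).
# Assumptions, not taken from the catalog:
#   - Q4_K_M averages about 4.85 bits per weight (varies by model)
#   - real hardware reaches only a fraction of peak bandwidth (EFFICIENCY)

Q4_K_M_BITS_PER_WEIGHT = 4.85  # assumed average
EFFICIENCY = 0.6               # assumed fraction of peak bandwidth achieved


def model_size_gb(params_billion: float,
                  bits_per_weight: float = Q4_K_M_BITS_PER_WEIGHT) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def estimate_tok_s(bandwidth_gb_s: float, params_billion: float,
                   efficiency: float = EFFICIENCY) -> float:
    """Memory-bound estimate: every token reads all weights once."""
    return efficiency * bandwidth_gb_s / model_size_gb(params_billion)


if __name__ == "__main__":
    # Bandwidth figures as quoted in the catalog entries.
    for name, bw in [("RTX 5090", 1790), ("Mac Mini M4 Pro", 273)]:
        row = ", ".join(f"{p}B: {estimate_tok_s(bw, p):.0f}"
                        for p in (7, 14, 32, 70))
        print(f"{name} ({bw} GB/s) -> {row} tok/s")
```

The sketch does not check whether the model fits in memory, so a figure for a 70B model on a 32 GB card only means something once offload or a second device is accounted for.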

DESKTOP

Pre-built desktops

8 units
Apple Mac Studio (M4 Max)
64 GB

The accessible Mac Studio tier, launched alongside the M3 Ultra. M4 Max with 36/48/64/128GB unified memory at up to 546 GB/s, about 2x the M4 Pro's bandwidth. The…

apple·enthusiast·Metal
Apple Mac Studio (M3 Ultra)
192 GB

Top-spec Mac Studio with M3 Ultra. Up to 512GB unified memory in custom configs.

apple·enthusiast·Metal
Apple Mac Mini (M4 Pro)
48 GB

The sweet-spot local-AI desktop for most people. M4 Pro with 24/48/64GB unified memory at 273 GB/s — more than double the base M4's bandwidth. A 64GB config…

apple·high·Metal
Apple Mac Mini (M4)
16 GB

The cheapest entry into Apple unified memory and the most-recommended starter box for local AI. Base M4 with 16/24/32GB unified memory at 120 GB/s, in a tiny…

apple·mid·Metal
Framework Desktop (Ryzen AI Max+ 395)
128 GB

The enthusiast-favorite Strix Halo box: a Ryzen AI Max+ 395 system with 128GB LPDDR5X-8000 unified memory (~256 GB/s), up to ~96GB allocatable as VRAM on…

amd·workstation·ROCm
GMKtec EVO-X2 (Ryzen AI Max+ 395)
128 GB

The cheapest mainstream 128GB Strix Halo mini-PC (~$1,499 entry). Ryzen AI Max+ 395, 128GB LPDDR5X-8000 unified, Radeon 8060S iGPU, up to ~96GB allocatable as…

amd·workstation·ROCm
ASUS Ascent GX10 (NVIDIA GB10)
128 GB

The popular OEM twin of the NVIDIA DGX Spark, ~$1,000 cheaper (~$2,999). Same GB10 Grace Blackwell superchip, 128GB LPDDR5X unified, ~1 PFLOP FP4, ConnectX-7…

nvidia·workstation·CUDA
NVIDIA DGX Spark (Project Digits)
128 GB

NVIDIA's desktop AI box: Grace Blackwell GB10 with 128GB unified LPDDR5X. The closest a consumer can get to running 200B-class models locally without renting…

nvidia·workstation·CUDA
GPU

Discrete GPUs

115 units
NVIDIA GeForce RTX 5090
32 GB

Blackwell flagship. 32GB GDDR7 on a 512-bit bus delivers ~1.79 TB/s memory bandwidth — the new top of consumer hardware for local LLM inference. Comfortably…

nvidia·enthusiast·CUDA
NVIDIA GeForce RTX 5090 Mobile
24 GB

Mobile Blackwell flagship. 24GB GDDR7 in a laptop is the new high-water mark.

nvidia·enthusiast·CUDA
NVIDIA GeForce RTX 5080
16 GB

Second-tier Blackwell. 16GB GDDR7, ~960 GB/s bandwidth. Fastest 16GB consumer card on the market.

nvidia·enthusiast·CUDA
NVIDIA GeForce RTX 4090 Mobile
16 GB

Mobile Ada flagship. 16GB VRAM in a laptop. Premium gaming and AI laptop default.

nvidia·enthusiast·CUDA
AMD Radeon RX 7900 XTX
24 GB

AMD's 24GB challenger to the 4090. ROCm Linux now solid for llama.cpp and vLLM. Best price-per-VRAM-GB on the new market.

amd·enthusiast·ROCm
NVIDIA GeForce RTX 4090
24 GB

The community-default high-end local-AI card from 2022 to 2025. 24GB GDDR6X at ~1 TB/s comfortably loads 32B Q4; 70B Q4 (roughly 40GB of weights) needs a second card or CPU offload.

nvidia·enthusiast·CUDA
NVIDIA GeForce RTX 3090 Ti
24 GB

Highest-tier Ampere consumer card. Used market gold for AI: 24GB at sub-$1200 in 2026.

nvidia·enthusiast·CUDA
AMD Radeon RX 7900 XT
20 GB

20GB RDNA 3. Cheaper alternative to XTX.

amd·enthusiast·ROCm
AMD Radeon RX 6950 XT
16 GB

Refreshed 6900 XT with faster GDDR6 (576 GB/s). 16 GB VRAM, slightly more compute. ROCm officially supported. ~110-145 tok/s on 7B Q4. The bandwidth bump…

amd·enthusiast·ROCm
NVIDIA GeForce RTX 3080 Ti
12 GB

Ampere flagship-minus-one. 12 GB GDDR6X at 912 GB/s — closer to the 3090 in raw bandwidth than to the 3080. Fits 13B Q4 with full context, 32B Q4 with offload.…

nvidia·enthusiast·CUDA
NVIDIA GeForce RTX 3090
24 GB

The original 24GB CUDA value pick. Used market still strong in 2026 — many AI hobbyists run dual 3090 setups for 70B inference.

nvidia·enthusiast·CUDA
AMD Radeon RX 6900 XT
16 GB

RDNA 2 flagship. 16 GB VRAM at 512 GB/s, more compute than the 6800 XT. ROCm officially supported. ~95-130 tok/s on 7B Q4. The peak of RDNA 2 consumer AMD;…

amd·enthusiast·ROCm
AMD Radeon RX 6800 XT
16 GB

RDNA 2 enthusiast. 16 GB VRAM, 512 GB/s bandwidth, more compute units than the base 6800. ROCm officially supported. ~85-110 tok/s on 7B Q4, 35-50 tok/s on 13B…

amd·enthusiast·ROCm
NVIDIA GeForce RTX 2080 Ti
11 GB

Turing flagship. 11 GB GDDR6 at 616 GB/s — fits 13B Q4 comfortably, 7B Q4 at ~110-140 tok/s. Used $360-420 in 2026 makes it the 'enthusiast on a budget' floor;…

nvidia·enthusiast·CUDA
AMD Radeon RX 9070 GRE
12 GB

The 12GB member of AMD's RDNA4 desktop line, now a global SKU ($549) after a year as a China-only 'Golden Rabbit Edition'. Navi 48, 48 CUs, 192-bit, 432 GB/s,…

amd·high·ROCm
AMD Radeon RX 9070 XT
16 GB

RDNA 4 flagship. 16GB at $599 — best AMD value for local AI in 2026.

amd·high·ROCm
NVIDIA GeForce RTX 5070 Ti
16 GB

16GB Blackwell at the upper-mid price tier. Strong 14B–32B model performance.

nvidia·high·CUDA
AMD Radeon RX 9070
16 GB

16GB RDNA 4 at sub-$600. ROCm + Vulkan supported.

amd·high·ROCm
NVIDIA GeForce RTX 5070 Laptop GPU
12 GB

The volume mainstream RTX 50-series gaming-laptop GPU. Originally 8GB; a 12GB variant launched April 2026 to relieve VRAM pressure. GB206, 4,608 CUDA cores,…

nvidia·high·CUDA
AMD Radeon RX 7900 GRE
16 GB

RDNA 3 'Golden Rabbit Edition' — 16 GB at 576 GB/s, between the 6800 XT and 7900 XT. ROCm officially supported. The current AMD value choice in the $500-600…

amd·high·ROCm
NVIDIA GeForce RTX 4070 Ti Super
16 GB

16GB upgrade of the 4070 Ti. Solid mid-high pick for local AI.

nvidia·high·CUDA
NVIDIA GeForce RTX 4080 Super
16 GB

Refreshed 4080 with 16GB GDDR6X. Slightly behind 5080 but well-supported.

nvidia·high·CUDA
AMD Radeon RX 7800 XT
16 GB

16GB RDNA 3 mid-range.

amd·high·ROCm
NVIDIA GeForce RTX 4070 Ti
12 GB

12GB Ada — fits 7B–14B Q4 with usable context.

nvidia·high·CUDA
NVIDIA GeForce RTX 3080 16GB (Mobile)
16 GB

Laptop variant of Ampere. 16GB VRAM in a portable form factor was rare and remains a sleeper pick on the used market.

nvidia·high·CUDA
NVIDIA GeForce RTX 4080
16 GB

Original 4080. 16GB GDDR6X. Still capable for 14B–32B Q4 work.

nvidia·high·CUDA
AMD Radeon RX 6750 XT
12 GB

Refreshed 6700 XT with faster GDDR6 (432 GB/s). 12 GB VRAM fits 13B Q4 comfortably. ROCm officially supported. ~55-75 tok/s on 7B Q4. Strong AMD value pick at…

amd·high·ROCm
NVIDIA GeForce RTX 3080 12GB
12 GB

Mid-life 12GB refresh of the 3080. Decent 7B–14B card on the used market.

nvidia·high·CUDA
AMD Radeon RX 6700 XT
12 GB

12 GB RDNA 2 at the $280 used tier. ROCm officially supported. The VRAM headroom matters — fits 13B Q4 comfortably. ~50-65 tok/s on 7B Q4. Strong AMD value…

amd·high·ROCm
NVIDIA GeForce RTX 3070 Ti
8 GB

Ampere Ti with GDDR6X at 608 GB/s. 8 GB VRAM is the ceiling — same as base 3070, no improvement on the variable that matters. Runs 7B Q4 at ~110-145 tok/s with…

nvidia·high·CUDA
AMD Radeon RX 6800
16 GB

16 GB RDNA 2 — the AMD answer to the 4070 12 GB question. Comfortable for any 7B/13B model + 32B Q4 with offload. ROCm officially supported. ~70-90 tok/s on 7B…

amd·high·ROCm
NVIDIA GeForce RTX 3080 10GB
10 GB

Original 10GB 3080. Tight on VRAM for AI but still capable for 7B work.

nvidia·high·CUDA
NVIDIA GeForce RTX 3060 Ti
8 GB

Ampere mid-high with 8 GB at 448 GB/s. Comfortable for 7B Q4 (~80-100 tok/s) and 13B Q4 with light offload. The 'middle' of Ampere — better bandwidth than the…

nvidia·high·CUDA
NVIDIA GeForce RTX 2080 Super
8 GB

Turing 'almost-flagship'. 8 GB VRAM is the ceiling — same as base 2080 — but more bandwidth (496 GB/s) and Tensor compute. Runs 7B Q4 at ~80-105 tok/s with…

nvidia·high·CUDA
NVIDIA GeForce RTX 2070 Super
8 GB

The Turing refresh that made 8 GB Turing genuinely fast. ~70-90 tok/s on 7B Q4 with ExLlamaV2. Same 8 GB ceiling as the base 2070 but more compute. Strong…

nvidia·high·CUDA
AMD Radeon RX 5700 XT
8 GB

RDNA 1 flagship. ROCm support was always experimental and is effectively defunct in 2026. Vulkan via llama.cpp is the only operator-grade path; performance is…

amd·high
NVIDIA GeForce RTX 2070
8 GB

Turing high-tier. 8 GB VRAM, similar bandwidth to the 2060 Super, slightly more compute. Runs 7B Q4 at ~65-80 tok/s, 13B Q4 with light offload. A solid…

nvidia·high·CUDA
NVIDIA GeForce GTX 1080 Ti
11 GB

Pascal halo card. 11 GB GDDR5X at 484 GB/s — outperforms many newer mid-range cards on raw bandwidth. Runs 7B Q4 at ~50-65 tok/s, 13B Q4 fits comfortably at…

nvidia·high·CUDA
AMD Radeon RX 9060 XT
16 GB

AMD's RDNA 4 mainstream card. 16GB VRAM, ROCm + Vulkan support, $449 MSRP. Targets the same $400-500 price segment as NVIDIA's RTX 5060 Ti but ships 16GB by…

amd·mid·ROCm
NVIDIA GeForce RTX 5060 Ti 16GB
16 GB

The 16GB sub-$500 sweet spot. Best value for entering local AI seriously.

nvidia·mid·CUDA
NVIDIA GeForce RTX 5070
12 GB

Mid-range Blackwell with 12GB. 7B-14B Q4 territory.

nvidia·mid·CUDA
Intel Arc B570
10 GB

10GB Battlemage at sub-$220. Entry budget compute.

intel·mid
NVIDIA GeForce RTX 5060 Ti 8GB
8 GB

8GB Blackwell. Capable of 7B Q4 only — go 16GB SKU instead for AI work.

nvidia·mid·CUDA
AMD Radeon RX 7600 XT
16 GB

Sub-$330 16GB AMD. Memory-bandwidth-limited but great VRAM-per-dollar.

amd·mid·ROCm
NVIDIA GeForce RTX 4070 Super
12 GB

Refreshed 4070. Strong mid-range value for 12GB-tier local AI.

nvidia·mid·CUDA
Intel Arc B580
12 GB

Battlemage architecture. 12GB at $250 — the budget compute card. IPEX-LLM and Vulkan are usable paths for AI.

intel·mid
NVIDIA RTX 2080 Ti 22GB (China-mod)
22 GB

Chinese third-party modification of the stock RTX 2080 Ti, replacing the 11 GB GDDR6 with 22 GB. The TU102 chip, 352-bit memory bus, and 616 GB/s bandwidth are…

nvidia·mid·CUDA
NVIDIA GeForce RTX 4060 Ti 16GB
16 GB

The poster child of 'cheap 16GB CUDA card'. Memory bandwidth is mediocre but 16GB at $400-something opens up 14B Q4.

nvidia·mid·CUDA
NVIDIA GeForce RTX 4070
12 GB

Original 4070. 12GB Ada. Now eclipsed by 4070 Super at the same price.

nvidia·mid·CUDA
AMD Radeon RX 7700 XT
12 GB

12GB RDNA 3.

amd·mid·ROCm
NVIDIA GeForce RTX 4060 Ti 8GB
8 GB

8GB version — go 16GB SKU for AI work.

nvidia·mid·CUDA
Intel Arc A770 16GB
16 GB

Alchemist 16GB. Cheapest path to that VRAM tier. Vulkan llama.cpp is the most-tested route.

intel·mid
AMD Radeon RX 6650 XT
8 GB

Refreshed 6600 XT with slightly faster GDDR6 (280 GB/s). 8 GB VRAM ceiling unchanged. ~40-55 tok/s on 7B Q4 with ROCm. Same buyer-decision shape as the 6600 XT…

amd·mid·ROCm
NVIDIA GeForce RTX 3060 12GB
12 GB

The community pick for 'cheapest CUDA card with serious VRAM'. The value floor for local AI in 2026.

nvidia·mid·CUDA
AMD Radeon RX 6600
8 GB

Entry RDNA 2. 8 GB VRAM, lower bandwidth (224 GB/s) — the bottleneck on AI. ROCm officially supported on Linux. ~30-45 tok/s on 7B Q4. Reasonable budget AMD…

amd·mid·ROCm
AMD Radeon RX 6600 XT
8 GB

RDNA 2 mid-tier. 8 GB GDDR6 at 256 GB/s. ROCm officially supported on Linux. ~35-50 tok/s on 7B Q4. Bandwidth-bottlenecked vs the 6700 XT/6750 XT siblings.…

amd·mid·ROCm
NVIDIA GeForce RTX 3070
8 GB

8GB Ampere. Fits 7B Q4 only.

nvidia·mid·CUDA
AMD Radeon RX 5600 XT
6 GB

RDNA 1 mid-tier. 6 GB VRAM is the ceiling — fits 7B Q4 only with short context. No production ROCm support; Vulkan-only. ~20-30 tok/s on 7B Q4. The 'I have one…

amd·mid
NVIDIA GeForce RTX 2060 Super
8 GB

Turing mid with the 8 GB upgrade — meaningful for AI. 7B Q4 fits comfortably with full context, 13B Q4 fits with offload. ~60-75 tok/s on 7B with ExLlamaV2.…

nvidia·mid·CUDA
NVIDIA GeForce RTX 2060
6 GB

First consumer card with Tensor cores at the ~$200 used tier. 6 GB VRAM is the bottleneck — 7B Q4 fits with limited context. FP16/INT8 Tensor compute makes…

nvidia·mid·CUDA
NVIDIA GeForce GTX 1660 Ti
6 GB

Turing mid-tier without RT/Tensor cores. 6 GB VRAM fits 7B Q4 with short context. Bandwidth (288 GB/s) is solid for the tier — ~30-40 tok/s on 7B Q4. Same VRAM…

nvidia·mid·CUDA
NVIDIA GeForce GTX 1660 Super
6 GB

Turing mid-range with GDDR6 — bandwidth jumps to 336 GB/s vs the base 1660. Same 6 GB VRAM ceiling but ~30-45 tok/s on 7B Q4 thanks to the bandwidth bump.…

nvidia·mid·CUDA
NVIDIA GeForce GTX 1660
6 GB

Turing mid-range without RT cores. 6 GB VRAM fits 7B Q4 with short context. No Tensor cores or FP16 acceleration on consumer Turing-LITE, so inference is…

nvidia·mid·CUDA
NVIDIA GeForce GTX 1070 Ti
8 GB

Pascal Ti slot between the 1070 and 1080. 8 GB GDDR5 at 256 GB/s. No FP16 acceleration on consumer Pascal — quantized inference only. Runs 7B Q4 at ~25-35…

nvidia·mid·CUDA
AMD Radeon RX 580 8GB
8 GB

AMD Polaris with 8 GB VRAM. Cheap on used market ($70-100) but Polaris was dropped from ROCm in 2022, so AMD's official AI runtimes won't work. Vulkan via…

amd·mid
NVIDIA GeForce GTX 1070
8 GB

Pascal high-tier with 8 GB VRAM. Comfortable for 7B Q4 models at ~25-35 tok/s. Bandwidth-limited like the rest of Pascal but the 8 GB headroom matters — fits…

nvidia·mid·CUDA
NVIDIA GeForce GTX 1080
8 GB

Pascal flagship for two years. 8 GB GDDR5X at 320 GB/s — better bandwidth than the 1070. Runs 7B Q4 at ~30-45 tok/s; 13B Q4 with offload but slow. The…

nvidia·mid·CUDA
NVIDIA RTX PRO 5000 Blackwell 48GB
48 GB

The RTX PRO 5000 is NVIDIA's Blackwell-generation workstation card, slotting between the consumer RTX 5090 (32GB) and the RTX PRO 6000 Blackwell (96GB).…

nvidia·workstation·CUDA
NVIDIA B300 (Blackwell Ultra)
288 GB

The Blackwell Ultra datacenter refresh of the B200. 288GB HBM3e per GPU, ~8 TB/s, up to 1,400W; GB300 NVL72 racks reach 1.1 ExaFLOPS FP4. The current top-end…

nvidia·workstation·CUDA
AMD Instinct MI355X
288 GB

Latest CDNA 4. 288GB HBM3e — currently the highest VRAM per chip on the market.

amd·workstation·ROCm
AMD Instinct MI350X
288 GB

The air-coolable sibling of the listed MI355X — same CDNA 4 silicon, 288GB HBM3E, ~8 TB/s, at a lower (~1,000W-class) power profile. The variant most…

amd·workstation·ROCm
NVIDIA RTX PRO 6000 Blackwell
96 GB

Pro Blackwell — 96GB GDDR7 ECC. The single-card answer to 70B and 100B+ local inference.

nvidia·workstation·CUDA
NVIDIA RTX PRO 4500 Blackwell
32 GB

Mid-tier Blackwell workstation card: 32GB GDDR7, 200W, explicitly pitched for desktop LLM inference and generative AI. Fills the single-card 32GB…

nvidia·workstation·CUDA
Intel Arc Pro B60 24GB
24 GB

Intel's workstation card explicitly marketed for low-cost local LLM inference. 24GB GDDR6, 456 GB/s, ~197 TOPS, ~$599. Board partners ship 48GB dual-GPU…

intel·workstation
NVIDIA RTX PRO 4000 Blackwell
24 GB

Single-slot 140W Blackwell workstation card with 24GB GDDR7. The low-power, compact entry to the RTX PRO Blackwell line — fits small workstations and dense…

nvidia·workstation·CUDA
NVIDIA GB200 NVL72
13824 GB

72-GPU Blackwell rack with 36 Grace CPUs. Hyperscale-only — relevant context here for understanding 'what frontier training runs on'.

nvidia·workstation·CUDA
AMD Instinct MI325X
256 GB

256GB HBM3e — direct competitor to NVIDIA H200 with more memory.

amd·workstation·ROCm
NVIDIA B200
192 GB

Datacenter Blackwell. 192GB HBM3e per chip, ~8 TB/s bandwidth. Cloud-tier — you rent these by the hour.

nvidia·workstation·CUDA
NVIDIA H200 NVL (PCIe)
141 GB

The PCIe-form-factor variant of the H200. Same 141 GB HBM3e, same memory subsystem (~4.8 TB/s bandwidth), in a dual-slot workstation card rather than the SXM5…

nvidia·workstation·CUDA
NVIDIA H200
141 GB

Hopper refresh — 141GB HBM3e at ~4.8 TB/s. Datacenter-class; rentable on RunPod, Lambda, etc.

nvidia·workstation·CUDA
Intel Gaudi 3
128 GB

Intel's enterprise AI accelerator. 128GB HBM2e. Habana stack required — limited ecosystem support.

intel·workstation
NVIDIA H20 (96GB)
96 GB

The China-market Hopper SKU tuned for inference: 96GB HBM3 (more than the standard H100's 80GB), 4.0 TB/s, 400W, with ~41% fewer cores than a full H100.…

nvidia·workstation·CUDA
NVIDIA RTX 4090 48GB (China-mod)
48 GB

Third-party physical modification of a stock GIGABYTE / ASUS / MSI RTX 4090. Chinese specialty shops (and GPVLab in the US) replace the 24 GB of GDDR6X with…

nvidia·workstation·CUDA
AMD Instinct MI300X
192 GB

192GB HBM3 datacenter card. Used by Microsoft, Oracle, Meta cloud deployments.

amd·workstation·ROCm
NVIDIA H100 NVL
188 GB

Dual-card H100 with 188GB combined memory. Built for LLM serving.

nvidia·workstation·CUDA
NVIDIA L40S
48 GB

Ada-gen datacenter card. 48GB GDDR6 — popular at cloud GPU rentals as a budget H100 alternative.

nvidia·workstation·CUDA
NVIDIA RTX 5000 Ada Generation
32 GB

32GB workstation Ada. Mid-tier pro card.

nvidia·workstation·CUDA
NVIDIA L4
24 GB

Inference-focused Ada datacenter card. Low-power 24GB suitable for 7B-14B serving.

nvidia·workstation·CUDA
Intel Gaudi 2
96 GB

Previous-gen Habana accelerator. 96GB HBM2e.

intel·workstation
NVIDIA H100 SXM
80 GB

Hopper SXM5: 80GB HBM3 at 3.35 TB/s. A mainstay of frontier-model training since 2023. Cloud-rentable.

nvidia·workstation·CUDA
NVIDIA H100 PCIe
80 GB

PCIe Hopper. Lower power, lower bandwidth than SXM. Server-tier.

nvidia·workstation·CUDA
AMD Instinct MI210
64 GB

64GB CDNA 2. Lower-power AMD datacenter option.

amd·workstation·ROCm
NVIDIA L40
48 GB

Original Ada datacenter. Slower than L40S. 48GB GDDR6.

nvidia·workstation·CUDA
NVIDIA RTX 6000 Ada Generation
48 GB

Pro Ada — 48GB ECC. Pre-Blackwell workstation default.

nvidia·workstation·CUDA
AMD Instinct MI250X
128 GB

Previous-gen CDNA 2. 128GB HBM2e. Powered the Frontier supercomputer.

amd·workstation·ROCm
NVIDIA RTX A5000
24 GB

24GB Ampere workstation card. Tighter power envelope than RTX 3090.

nvidia·workstation·CUDA
NVIDIA A100 80GB SXM
80 GB

Ampere datacenter flagship. 80GB HBM2e at 2 TB/s. Still common at cloud providers.

nvidia·workstation·CUDA
NVIDIA A40
48 GB

Ampere workstation/datacenter hybrid. 48GB GDDR6.

nvidia·workstation·CUDA
NVIDIA RTX A6000 (Ampere)
48 GB

Ampere-gen workstation card with 48GB. Common in AI labs; used market is reasonable for 48GB at this point.

nvidia·workstation·CUDA
NVIDIA A100 40GB
40 GB

Original A100. 40GB HBM2 at 1.55 TB/s. Trained the early generation of frontier models.

nvidia·workstation·CUDA
NVIDIA GeForce RTX 5060
8 GB

Entry Blackwell. 8GB limits to 7B Q4 with limited context.

nvidia·entry·CUDA
NVIDIA GeForce RTX 5050
8 GB

The cheapest Blackwell desktop card ($249) and the entry point to the RTX 50 series. 8GB GDDR6, 128-bit, 320 GB/s, 130W. A budget CUDA option for small LLMs…

nvidia·entry·CUDA
Intel Arc 140V (Lunar Lake iGPU)

Intel Lunar Lake's Arc 140V iGPU (Xe2 Battlemage architecture). Highest iGPU bandwidth on Windows in 2026 (137 GB/s LPDDR5x-8533). ~12-18 tok/s on 7B Q4. Pairs…

intel·entry
AMD Radeon 880M (Strix Point iGPU)

AMD's 880M iGPU (Ryzen AI 300 series Strix Point). RDNA 3.5 with LPDDR5x-7500 unified memory — bandwidth jump from 780M (89 → 102 GB/s). ~8-15 tok/s on 7B Q4.…

amd·entry·ROCm
NVIDIA GeForce RTX 4060
8 GB

Entry-level Ada. 8GB limits to 7B Q4.

nvidia·entry·CUDA
AMD Radeon 780M (Phoenix iGPU)

AMD's 780M iGPU (Ryzen 7040/8040 series Phoenix). Shares system RAM via unified memory architecture; 32 GB DDR5 system gives effective 16-20 GB usable for…

amd·entry·ROCm
NVIDIA GeForce RTX 3050
8 GB

Ampere entry with full Tensor + RT cores at the $200 tier. 8 GB VRAM is the practical floor for serious 7B work. Bandwidth (224 GB/s) is the bottleneck —…

nvidia·entry·CUDA
AMD Radeon RX 5500 XT 8GB
8 GB

RDNA 1 entry. 8 GB GDDR6 at 224 GB/s. ROCm support was always experimental on RDNA 1 and is effectively defunct in 2026 — Vulkan via llama.cpp is the only…

amd·entry
NVIDIA GeForce GTX 1650 Super
4 GB

Turing entry refresh with GDDR6. 4 GB VRAM is below the practical AI floor — 1-3B Q4 only. No Tensor cores. Common in pre-built office PCs from 2019-2021 that…

nvidia·entry·CUDA
NVIDIA GeForce GTX 1650
4 GB

Turing entry without RT/Tensor cores. 4 GB VRAM keeps it at the practical floor — 1-3B Q4 only. The 'I built a budget gaming PC' audience runs into VRAM walls…

nvidia·entry·CUDA
AMD Radeon RX 570
4 GB

Cut-down RX 580 with 4 GB VRAM. Below the practical AI floor; 1-3B Q4 with offload at best. ROCm was never supported on Polaris in any production build; Vulkan…

amd·entry
NVIDIA GeForce GTX 1060 6GB
6 GB

Pascal mid-range, 6 GB VRAM. The most-installed Steam GPU for many years; high probability the 'I have a GTX 1060' audience is asking about this card. Runs 7B…

nvidia·entry·CUDA
NVIDIA GeForce GTX 1050 Ti
4 GB

Pascal-era entry GPU. 4 GB VRAM is the practical floor for any local model — fits 1-3B at Q4 with room for short context. CUDA-compatible but no FP16…

nvidia·entry·CUDA
NVIDIA GeForce GTX 1060 3GB
3 GB

Pascal mid-range cut down to 3 GB VRAM. Below the practical AI floor — even 3B Q4 models need ~2 GB plus KV cache. Operators with this card almost universally…

nvidia·entry·CUDA
NVIDIA GeForce RTX 3050 Ti (Mobile)
4 GB

Mobile-only Ampere with 4 GB VRAM at 192 GB/s. The 4 GB ceiling is the bottleneck — 1-3B Q4 only with no headroom for context. CUDA + Tensor cores work, but…

nvidia·mobile·CUDA
LAPTOP

Laptops

7 units
Razer Blade 16 (2025, RTX 5090 Mobile)
24 GB

Top-end Windows AI laptop with 24GB RTX 5090 Mobile.

nvidia·enthusiast·CUDA
ASUS ROG Strix Scar 18 (RTX 5090 Mobile)
24 GB

Desktop-replacement gaming/AI laptop with cooler thermals than ultraslims.

nvidia·enthusiast·CUDA
MacBook Pro 16" M4 Max
128 GB

16-inch M4 Max — 128GB unified at 546 GB/s. The most capable AI laptop in 2026.

apple·enthusiast·Metal
Lenovo Legion 5 Pro Gen 7 (RTX 3080 16GB)
16 GB

Ryzen 7 6800H + RTX 3080 16GB Mobile. The reference 'serious local-AI laptop' build. Look for the 16GB SKU.

nvidia·high·CUDA
Apple MacBook Air (M4)
16 GB

The cheapest portable Apple unified-memory machine and a common 'try local LLMs on a laptop' entry point. M4 with 16/24/32GB at 120 GB/s, fanless. Runs 8-14B…

apple·mid·Metal
Framework Laptop 16 (RX 7700S)
8 GB

Modular AMD laptop. Limited GPU but the platform is the appeal.

amd·mid·ROCm
HP ZBook Ultra G1a (Ryzen AI Max+ PRO 395)
128 GB

The flagship Strix Halo mobile workstation: a 14" laptop with 128GB LPDDR5X-8000 unified memory (up to ~96GB allocatable to the Radeon 8060S iGPU). The…

amd·workstation·ROCm
SOC

Apple Silicon / SoCs

11 units
Apple M3 Ultra
192 GB

M3 Ultra — up to 512GB unified in Mac Studio top spec. 819 GB/s bandwidth.

apple·enthusiast·Metal
Apple M4 Ultra
256 GB

Two-chip Ultra fusing two M4 Max dies. Up to 256GB unified memory at 1.1 TB/s. The single highest-VRAM consumer rig you can buy in a Mac Studio.

apple·enthusiast·Metal
Apple M4 Max
128 GB

M4 Max — 546 GB/s memory bandwidth, up to 128GB unified. Most capable laptop SoC for 70B+ models.

apple·enthusiast·Metal
Apple M2 Ultra
192 GB

M2 Ultra: up to 192GB at 800 GB/s. Ships in the Mac Studio and Mac Pro.

apple·enthusiast·Metal
Apple M3 Max
96 GB

M3 Max — 400 GB/s bandwidth, up to 128GB.

apple·enthusiast·Metal
Apple M1 Ultra
128 GB

Original Ultra — 800 GB/s. 64–128GB unified. Still capable for 70B Q4.

apple·enthusiast·Metal
Apple M4 Pro
48 GB

Mid-tier M4: 273 GB/s bandwidth, up to 64GB.

apple·high·Metal
Qualcomm Snapdragon X Elite
32 GB

Copilot+ PC reference SoC. 12-core Oryon CPU + Adreno GPU + Hexagon NPU at 45 TOPS INT8. The first ARM Windows laptop with serious NPU compute; runs Phi Silica…

qualcomm·high
Apple M2 Max
64 GB

M2 Max — 400 GB/s bandwidth, up to 96GB.

apple·high·Metal
Apple M1 Max
32 GB

Original M1 Max. 400 GB/s. 32–64GB unified.

apple·high·Metal
Qualcomm Snapdragon X Plus
16 GB

Lower-tier Snapdragon X. 45 TOPS NPU.

qualcomm·mid
APU

APUs

2 units
AMD Ryzen AI Max+ 395 (Strix Halo)
64 GB

AMD's first true Strix Halo chip, combining Zen 5 CPU cores, RDNA 3.5 integrated graphics, and a 50-TOPS XDNA 2 NPU in a single package. The headline feature for…

amd·workstation·ROCm
AMD Instinct MI300A (APU)
128 GB

Combined CPU + GPU APU with 128GB unified HBM3. Powers the El Capitan supercomputer.

amd·workstation·ROCm
CPU

CPUs

1 unit
AMD EPYC 9005 (Zen 5, Turin)
768 GB

AMD's Zen 5 server CPU: up to 192 cores, 12-channel DDR5-6400 (~614 GB/s per socket), supporting terabyte-scale RAM. Anchors the CPU-inference strategy —…

amd·workstation
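
The per-socket bandwidth figure quoted for the EPYC 9005 can be reproduced with simple arithmetic. A minimal sketch, assuming each DDR5 channel is 64 bits (8 bytes) wide and that the quoted number is the theoretical peak; the helper name is hypothetical.

```python
def ddr_peak_bandwidth_gb_s(channels: int, mt_per_s: int,
                            bus_bytes: int = 8) -> float:
    """Theoretical peak: channels x transfers per second x bytes per transfer.

    bus_bytes=8 assumes a 64-bit channel; sustained bandwidth is lower.
    """
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9


if __name__ == "__main__":
    # 12-channel DDR5-6400, as listed for the EPYC 9005 entry.
    print(f"{ddr_peak_bandwidth_gb_s(12, 6400):.1f} GB/s")
```

Feeding that result into a memory-bound tok/s estimate shows why CPU inference on this class of server is limited by memory bandwidth more than by core count.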
MOBILE

Mobile SoCs

6 units
Google Tensor G4
12 GB

Pixel 9 SoC. Google's mobile chip optimized for Gemini Nano + on-device transcription / summarization. NPU TOPS not publicly disclosed by Google; treat as…

google·mobile
Apple A18 Pro
8 GB

iPhone 16 Pro SoC. Improved Neural Engine for Apple Intelligence on-device workloads. 8GB RAM as the new mobile floor enables 3B-class on-device models.

apple·mobile·Metal
Apple M4 (iPad Pro)
8 GB

iPad Pro M4 — 10-core CPU + 10-core GPU + 16-core Neural Engine (38 TOPS). 8/16GB unified memory. Best mobile-class chip for local-AI experimentation as of…

apple·mobile·Metal
Qualcomm Snapdragon 8 Elite
16 GB

Late-2024 Android flagship SoC. Oryon CPU + Hexagon NPU at ~80 TOPS INT8. 8B-class models become viable on-device with adequate quantization.

qualcomm·mobile
Qualcomm Snapdragon 8 Gen 3
12 GB

Flagship Android SoC. Hexagon NPU at 45 TOPS INT8. First mainstream phone NPU to run 7B-class models on-device via Qualcomm AI Hub + ONNX Runtime Mobile.

qualcomm·mobile
Apple A17 Pro
8 GB

iPhone 15 Pro / 15 Pro Max SoC. 6-core CPU + 6-core GPU + 16-core Neural Engine (35 TOPS INT8). First A-series with hardware ray tracing; runs MLX-Swift and…

apple·mobile·Metal
NPU

PC NPUs

4 units
Intel Core Ultra 300 (Panther Lake)
32 GB

Intel's current-gen AI-PC platform (launched Jan 2026, Intel 18A node), succeeding Lunar Lake. NPU5 delivers ~50 TOPS (clears the 40-TOPS Copilot+ bar);…

intel·pc-npu
Qualcomm Snapdragon X2 Elite
32 GB

Second-gen Snapdragon PC platform (retail H1 2026), succeeding the X Elite/X Plus. Highest laptop NPU at 80 TOPS Hexagon, up to 18 Oryon cores at 5GHz; some…

qualcomm·pc-npu
Intel Core Ultra 7 258V (Lunar Lake)
16 GB

Intel Lunar Lake 8-core (4 P-cores + 4 E-cores). NPU 4 at 48 TOPS INT8 + Xe2 iGPU + Skymont E-cores. Copilot+ PC certified. Runs DirectML + ONNX Runtime + OpenVINO; primary…

intel·pc-npu
AMD Ryzen AI 9 HX 370 (Strix Point)
32 GB

Strix Point laptop SoC. XDNA 2 NPU at 50 TOPS INT8 + RDNA 3.5 iGPU. ROCm support on Linux unlocks llama.cpp ROCm path; on Windows, ONNX Runtime + DirectML.

amd·pc-npu·ROCm