RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP · Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated

Choose my GPU

Answer nine questions. We rank the GPUs in our catalog by fit for local AI on your stack — top picks, alternates, and what to avoid. Hand-written rationale per card, honest caveats, and a one-click handoff into the custom build engine.

We don’t fake tok/s numbers. Every recommendation cites a model class and a workload-realistic range. Cards over your budget appear last with explicit framing. Recommendations are rule-based scoring, not measured benchmarks.

Tell us about your build

URL updates as you change fields.

Price vs performance (budget-neutral)
11 cards · 1 skipped (no price)
[Chart: performance (budget-neutral) vs effective price (log scale), your budget marked]
  • NVIDIA RTX A6000 (Ampere) — $3,500 · Avoid
  • NVIDIA H100 PCIe — $25,000 · Avoid
  • NVIDIA RTX 5000 Ada Generation — $4,000 · Avoid
  • NVIDIA L4 — $2,500 · Top pick
  • NVIDIA RTX 4090 48GB (China-mod) — $2,400 · Top pick
  • NVIDIA RTX A5000 — $2,500 · Top pick
  • NVIDIA GeForce RTX 3090 — $899 · Top pick
  • NVIDIA GeForce RTX 4090 — $1,899 · Top pick
  • NVIDIA GeForce RTX 5090 — $2,499 · Top pick
  • NVIDIA GeForce RTX 3090 Ti — $1,199 · Top pick
  • NVIDIA GeForce RTX 4080 — $1,099 · Top pick
Top pick (9)
Avoid (3)
Top picks
9 cards matching your stack tightly
Tick two cards to compare side-by-side
Top pick
NVIDIA · 24 GB · ~$2,500
Operator-grade

NVIDIA L4

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA L4 sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class: Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput: 30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence: live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA L4 →
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 65 (Good)
  • Budget fit · weight 18% · 85 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 95 (Excellent)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 85 (Strong)
  • Perf-per-watt · weight 6% · 95 (Excellent)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.
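The weighted-sum-to-tier pipeline above is easy to make concrete. A minimal sketch, assuming the scoring works exactly as stated (the site's real scoring code is not published, so this is an illustration, not their implementation), using the published weights and the L4's dimension scores:

```python
# Illustrative only: weights, scores, and tier bands copied from the page;
# the site's actual scoring code is not published.
WEIGHTS = {
    "vram_x_workload": 0.22, "budget_fit": 0.18, "os_compatibility": 0.16,
    "skill_match": 0.10, "power_headroom": 0.08, "multi_gpu_path": 0.08,
    "thermal_noise": 0.06, "gaming_alignment": 0.06, "perf_per_watt": 0.06,
}

def composite(scores):
    """Weighted sum of the 0-100 dimension scores."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

def tier(comp, over_budget=False, incompatible=False):
    """Map a composite score onto the published tier bands."""
    if over_budget or incompatible or comp < 40:
        return "avoid"
    if comp >= 75:
        return "top"
    return "alternate" if comp >= 60 else "acceptable"

# The L4's dimension scores as shown on this card:
l4 = {"vram_x_workload": 65, "budget_fit": 85, "os_compatibility": 100,
      "skill_match": 95, "power_headroom": 95, "multi_gpu_path": 80,
      "thermal_noise": 95, "gaming_alignment": 85, "perf_per_watt": 95}

print(round(composite(l4), 1), tier(composite(l4)))  # ~85.6 -> top
```

Note how the over-budget check short-circuits the composite entirely, which is why the ruled-out cards land in "avoid" regardless of their raw scores.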

Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA L4
Top pick
NVIDIA · 48 GB · ~$2,400
Operator-grade

NVIDIA RTX 4090 48GB (China-mod)

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA RTX 4090 48GB (China-mod) ranks here because with 48 GB and ~1008 GB/s memory bandwidth, this clears the VRAM bar for coding agents comfortably.

Realistic model class: 70B Q4 territory
Expected throughput: 20-40 tok/s on 70B Q4 single-stream; 50-90 tok/s on 32B FP16.
Evidence: live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA RTX 4090 48GB (China-mod) →
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 98 (Excellent)
  • Budget fit · weight 18% · 85 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 25 (Weak)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 85 (Strong)
  • Perf-per-watt · weight 6% · 65 (Good)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • Sustained ~450W — plan for a 1000W+ PSU and adequate case airflow.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA RTX 4090 48GB (China-mod)
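The "48 GB and ~1008 GB/s" rationale above rests on a standard rule of thumb: single-stream decode is roughly memory-bandwidth-bound, because every generated token streams the full weight set once. A rough sketch of that estimate; the efficiency factor and bytes-per-parameter figure are assumptions, not measured values:

```python
def est_tok_s(bandwidth_gb_s, params_b, bytes_per_param, efficiency=0.8):
    """Bandwidth-bound decode estimate for a dense model:
    tokens/s ~ usable bandwidth / bytes of weights read per token."""
    model_gb = params_b * bytes_per_param  # weight bytes streamed per token
    return efficiency * bandwidth_gb_s / model_gb

# ~1008 GB/s card, 70B dense model at ~0.6 bytes/param (Q4_K_M-class quant),
# assuming 80% of peak bandwidth is achievable in practice:
print(round(est_tok_s(1008, 70, 0.6), 1))  # ~19.2
```

That lands near the low end of the 20-40 tok/s range quoted on the card; prompt processing, batching, and runtime overhead all move the real number around.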
Top pick
NVIDIA · 24 GB · ~$2,500
Operator-grade

NVIDIA RTX A5000

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA RTX A5000 ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.

Realistic model class: Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput: 30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence: live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA RTX A5000 →
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 70 (Good)
  • Budget fit · weight 18% · 85 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 85 (Strong)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA RTX A5000
Top pick
NVIDIA · 24 GB · ~$899 · Estimated (used-market price)
Operator-grade

NVIDIA GeForce RTX 3090

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 3090 ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.

Sustained 450W+ — minimum 1000W Gold PSU + good airflow. Your power tolerance is moderate (350W ceiling), which this card will exceed under load.
Realistic model class: Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput: 30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence: live data · editorial + reproduced community
Editorial: 1 benchmark · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: Low (1 cohort)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • Only 1 benchmark — below the 5-row threshold for cohort signal.
Help us measure NVIDIA GeForce RTX 3090 →
Measured throughput
top 1 of 1 on file · most recent first
  • ed · llama 3.1 8b instruct · Q4_K_M · 105.0 tok/s · 2026-05
Featured in stacks
  • Dual RTX 3090 workstation stack — 70B-class on $1,800 of used GPUs — Workstation · GPUs (2× 24GB used, the cheapest path to 48 GB total)
  • Quad RTX 3090 workstation stack — the prosumer 100B-class ceiling — Homelab · GPUs (4× 24GB used; the prosumer-ceiling stack)
Show 1 benchmark feeding this card
  • ed · #340 · llama-3.1-8b-instruct · Q4_K_M · 105.0 tok/s · 2026-05-13
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 70 (Good)
  • Budget fit · weight 18% · 80 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • Used-market only — fan/thermal-pad inspection required; new MSRP from launch is no longer the relevant price.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 3090
Top pick
NVIDIA · 24 GB · ~$1,899
Operator-grade

NVIDIA GeForce RTX 4090

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 4090 ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.

Sustained 450W+ — minimum 1000W Gold PSU + good airflow. Your power tolerance is moderate (350W ceiling), which this card will exceed under load.
Realistic model class: Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput: 30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence: live data · editorial + reproduced community
Editorial: 6 benchmarks · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: Low (6 cohorts)
Measured throughput
top 3 of 6 on file · most recent first
  • ed · llama 3.3 70b instruct · Q4_K_M · 8.0 tok/s · 2026-05
  • ed · llama 3.1 8b instruct · Q4_K_M · 150.0 tok/s · 2026-05
  • ed · deepseek r1 distill qwen 32b · AWQ-INT4 · 32.5 tok/s · 2026-05
3 additional measurements below in the full breakdown.
Featured in stacks
  • Build a local coding-agent stack (May 2026) — Workstation · GPU (where the model runs)
  • Build an RTX 4090 AI workstation stack (May 2026) — Workstation · GPU (the hardware that defines this stack)
Show 6 benchmarks feeding this card
  • ed · #344 · llama-3.3-70b-instruct · Q4_K_M · 8.0 tok/s · 2026-05-13
  • ed · #339 · llama-3.1-8b-instruct · Q4_K_M · 150.0 tok/s · 2026-05-13
  • ed · #336 · deepseek-r1-distill-qwen-32b · AWQ-INT4 · 32.5 tok/s · 2026-05-06
  • ed · #329 · qwen-3-32b · AWQ-INT4 · 36.5 tok/s · 2026-05-03
  • ed · #328 · qwen-2.5-coder-32b-instruct · AWQ-INT4 · 38.2 tok/s · 2026-05-03
  • ed · #327 · llama-3.3-70b-instruct · Q4_K_M · 14.8 tok/s · 2026-05-02
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 73 (Good)
  • Budget fit · weight 18% · 95 (Excellent)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 25 (Weak)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 65 (Good)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • Sustained ~450W — plan for a 1000W+ PSU and adequate case airflow.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 4090
Top pick
NVIDIA · 24 GB
Operator-grade

NVIDIA GeForce RTX 5090 Mobile

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 5090 Mobile ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.

Realistic model class: Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput: 30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence: live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 5090 Mobile →
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 70 (Good)
  • Budget fit · weight 18% · 50 (Acceptable)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 95 (Excellent)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 95 (Excellent)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 5090 Mobile
Top pick
NVIDIA · 32 GB · ~$2,499
Operator-grade

NVIDIA GeForce RTX 5090

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 5090 ranks here because with 32 GB and ~1792 GB/s memory bandwidth, this clears the VRAM bar for coding agents comfortably.

Sustained 450W+ — minimum 1000W Gold PSU + good airflow. Your power tolerance is moderate (350W ceiling), which this card will exceed under load.
Realistic model class: Qwen 2.5 Coder 32B Q5, 70B Q3 reach
Expected throughput: 40-70 tok/s on 32B Q4; 15-25 tok/s on 70B Q3 with offload.
Evidence: live data · editorial + reproduced community
Editorial: 1 benchmark · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: Low (1 cohort)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • Only 1 benchmark — below the 5-row threshold for cohort signal.
Help us measure NVIDIA GeForce RTX 5090 →
Measured throughput
top 1 of 1 on file · most recent first
  • ed · llama 3.1 8b instruct · Q4_K_M · 195.0 tok/s · 2026-05
Pending benchmark opportunities (all →)
  • Single RTX 5090 + Qwen 3 Coder 32B (vLLM, AWQ-INT4) · critical
Show 1 benchmark feeding this card
  • ed · #341 · llama-3.1-8b-instruct · Q4_K_M · 195.0 tok/s · 2026-05-13
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 93 (Excellent)
  • Budget fit · weight 18% · 85 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 5 (Poor)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 75 (Strong)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 40 (Acceptable)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • Sustained ~575W — plan for a 1000W+ PSU and adequate case airflow.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 5090
Top pick
NVIDIA · 24 GB · ~$1,199 · Estimated (used-market price)
Operator-grade

NVIDIA GeForce RTX 3090 Ti

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 3090 Ti ranks here because 24 GB hits the workable band for coding agents — fits at sensible quants without becoming the bottleneck.

Sustained 450W+ — minimum 1000W Gold PSU + good airflow. Your power tolerance is moderate (350W ceiling), which this card will exceed under load.
Realistic model class: Qwen 2.5 Coder 32B Q4 + 32K context
Expected throughput: 30-60 tok/s on 32B Q4 single-stream; 80-130 tok/s on 13B Q4.
Evidence: live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 3090 Ti →
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 73 (Good)
  • Budget fit · weight 18% · 80 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 25 (Weak)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 95 (Excellent)
  • Perf-per-watt · weight 6% · 65 (Good)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • Sustained ~450W — plan for a 1000W+ PSU and adequate case airflow.
  • Used-market only — fan/thermal-pad inspection required; new MSRP from launch is no longer the relevant price.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 3090 Ti
Top pick
NVIDIA · 16 GB · ~$1,099
Operator-grade

NVIDIA GeForce RTX 4080

Top pick for your setup. With your $2,500 budget on Linux for coding agents, the NVIDIA GeForce RTX 4080 sits in this tier on a balance of capability, OS compat, power, and budget fit.

Realistic model class: Qwen 2.5 Coder 14B FP16, agents OK
Expected throughput: 40-70 tok/s on 7B Q4; 20-35 tok/s on 13B Q4.
Evidence: live data · editorial + reproduced community
Editorial: 0 benchmarks · Reproduced: 0 community · Stale (>18 mo): 0 rows
Cohort confidence: — (none)
Needs measurement
This recommendation is rule-based, not evidence-backed yet.
  • No benchmarks on file for this hardware.
Help us measure NVIDIA GeForce RTX 4080 →
How we scored this card

Each dimension is a 0-100 score. The card's position in the ranking is the weighted sum — but we surface tiers, not raw numbers. Bars are sorted by weight (most-influential first).

  • VRAM × workload · weight 22% · 33 (Weak)
  • Budget fit · weight 18% · 80 (Strong)
  • OS compatibility · weight 16% · 100 (Excellent)
  • Skill match · weight 10% · 95 (Excellent)
  • Power headroom · weight 8% · 80 (Strong)
  • Multi-GPU path · weight 8% · 80 (Strong)
  • Thermal / noise · weight 6% · 95 (Excellent)
  • Gaming alignment · weight 6% · 90 (Excellent)
  • Perf-per-watt · weight 6% · 85 (Strong)

Tier mapping: top ≥ 75 composite · alternate 60-74 · acceptable 40-59 · avoid < 40 or over-budget / incompatible.

Caveats
  • 16 GB is below the comfortable VRAM minimum for coding agents — expect quant downgrades or very tight context windows.
Try in custom builder → · See model-fit table · Recommended runtime: ollama
Estimated (rule-based scoring) · Help us measure this — submit a benchmark for NVIDIA GeForce RTX 4080
Why we ruled these out
Over-budget or fundamentally incompatible — listed for the upgrade-path conversation
  • NVIDIA RTX A6000 (Ampere) — Out of budget for this query.
    ~$3,500
  • NVIDIA H100 PCIe — Out of budget for this query.
    ~$25,000
  • NVIDIA RTX 5000 Ada Generation — Out of budget for this query.
    ~$4,000

Where to go from here

Live GPU price tracker →

Multi-store, multi-region prices for every card here. US/EU/UK/CA/AU — see what these cards actually cost in your region before you buy.

Stack Builder →

One step further: this card + runtime + 1-3 models + cost rollup + ready-to-paste install script. Eight inputs → full rig.

Custom build engine →

Once you’ve picked a card, model the full build (CPU, RAM, runtime) for which models fit comfortably.
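"Which models fit comfortably" is, at its core, a VRAM budget. A back-of-envelope sketch, assuming generic dense-transformer memory math (quantized weights, a GQA KV cache, and a runtime overhead pad); every model figure below is a hypothetical example, not this site's formula:

```python
def vram_needed_gb(params_b, bytes_per_param, layers, kv_heads, head_dim,
                   kv_bytes, context_tokens, overhead_gb=1.5):
    """Weights + KV cache + overhead, in GB (dense model, GQA attention)."""
    weights = params_b * bytes_per_param
    # K and V, per layer, per KV head, per head dim, per cached token:
    kv_cache = 2 * layers * kv_heads * head_dim * kv_bytes * context_tokens / 1e9
    return weights + kv_cache + overhead_gb

# Hypothetical 32B-class model (64 layers, 8 KV heads, head_dim 128),
# Q4-class weights (~0.55 bytes/param), 8-bit KV cache, 32K context:
need = vram_needed_gb(32, 0.55, 64, 8, 128, 1, 32_768)
print(round(need, 1))  # ~23.4
```

At roughly 23.4 GB this would be tight but workable on a 24 GB card, which is why the 32B Q4 + 32K pairing shows up as the "workable band" in the cards above; FP16 KV cache or a heavier quant pushes it over.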

GPU buying guide 2026 →

The long-form essay version: VRAM tiers, MoE math, NVLink truth, used-market price discipline.

Hardware combinations →

Curated multi-GPU and Apple-cluster setups with effective-VRAM math you can trust.

Scoring methodology →

How the trust layer behind these recommendations actually works — every dimension, every formula, the honest limits.

Cohort coverage report →

Where the intelligence graph has signal vs which model × hardware × quant cohorts are still underpowered.

Reproduce a benchmark →

Help tip a cohort across the 5-row threshold for outlier detection — the most operator-impactful contribution.