UNIT · NVIDIA · GPU
6 GB VRAM · mid · Reviewed May 2026

NVIDIA GeForce GTX 1660 Ti

Turing mid-tier without RT/Tensor cores. 6 GB VRAM fits 7B Q4 with short context. Bandwidth (288 GB/s) is solid for the tier — ~30-40 tok/s on 7B Q4. Same VRAM ceiling as the 1660 Super; the Ti pays for slightly more compute that doesn't help much for inference.

Released 2019 · ~$160 street · 288 GB/s memory bandwidth
RUNLOCALAI SCORE
See full leaderboard →
247 / 1000 · D-tier (estimated)
Throughput: 100 / 500
VRAM-fit: 30 / 200
Ecosystem: 200 / 200
Efficiency: 23 / 100

Extrapolated from 288 GB/s bandwidth — 34.6 tok/s estimated. No measured benchmarks yet.
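
That estimate follows the usual memory-bound rule of thumb: each generated token streams the whole quantized weight file across the memory bus, so decode speed is roughly bandwidth divided by model size, discounted by a real-world efficiency factor. A minimal sketch of that arithmetic (the 50% efficiency factor and the 4.2 GB Q4 file size are illustrative assumptions, not the site's exact scoring formula):

```python
# Back-of-envelope decode throughput for a memory-bandwidth-bound GPU.
# Assumptions (illustrative only):
#   - every generated token reads the full quantized weights once
#   - roughly 50% of peak bandwidth is achievable in practice

def estimated_tok_per_s(bandwidth_gb_s: float,
                        model_size_gb: float,
                        efficiency: float = 0.5) -> float:
    """Rough decode speed: effective bandwidth / bytes read per token."""
    return bandwidth_gb_s * efficiency / model_size_gb

# GTX 1660 Ti: 288 GB/s; a 7B Q4_K_M GGUF is roughly 4.2 GB on disk.
print(f"{estimated_tok_per_s(288, 4.2):.1f} tok/s")  # ≈ 34.3 tok/s
```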

WORKLOAD FIT
Try other hardware →

Plain-English: Edge-of-fit for 7B; expect compromises.

  • 7B chat: ~ Tight
  • 14B chat: ✗ Doesn't fit
  • 32B chat: ✗ Doesn't fit
  • 70B chat: ✗ Doesn't fit
  • Coding agent: ✗ Doesn't fit
  • Vision (≤8B VLM): ~ Tight
  • Long context (32K): ✗ Doesn't fit

✓ Comfortable — fits with headroom
~ Tight — works, no slack
△ Marginal — needs aggressive quant
✗ Doesn't fit usefully

Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Hover any chip for the rationale. Want measured numbers? Submit your own run with runlocalai-bench --submit.
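
The fit verdicts reduce to a single budget check: quantized weights plus the fp16 KV cache for the requested context, plus runtime overhead, must stay under 6 GB. A rough sketch of that check, using generic Llama-2-7B-style shape numbers (32 layers, 4096-wide KV) and an assumed 0.8 GB overhead; none of these are the catalog's exact flags:

```python
# Rough VRAM-fit check: quantized weights + fp16 KV cache + runtime overhead.
# Model shape numbers below are generic Llama-2-7B-style assumptions.

def fits(vram_gb: float, params_b: float, bits_per_weight: float,
         ctx: int, n_layers: int, kv_dim: int, overhead_gb: float = 0.8) -> bool:
    weights_gb = params_b * bits_per_weight / 8        # 1B params at 4.5 bpw ≈ 0.56 GB
    kv_gb = 2 * n_layers * kv_dim * 2 * ctx / 1e9      # K and V, fp16 (2 bytes each)
    return weights_gb + kv_gb + overhead_gb <= vram_gb

# 7B at ~4.5 bits/weight on a 6 GB card, 32 layers, 4096-wide KV:
print(fits(6, 7, 4.5, ctx=2048, n_layers=32, kv_dim=4096))  # True  - short context squeezes in
print(fits(6, 7, 4.5, ctx=8192, n_layers=32, kv_dim=4096))  # False - long context blows the budget
```

This is why 7B chat reads "Tight" and the 32K-context row reads "Doesn't fit" on this card.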

BLK · VERDICT

Our verdict

OP · Fredoline Eruo | VERIFIED MAY 10, 2026
2.8/10

This card is for the budget operator who needs a functional local inference rig at the lowest possible entry cost and is willing to accept strict model size limits. The 6 GB VRAM fits a 7B Q4 model with a short context window (2-4K tokens), and the 288 GB/s bandwidth delivers ~30-40 tok/s on that workload — usable for chat or code completion. Larger models like 13B Q4 or 7B Q8 are out of reach; the card cannot load them at all. The lack of Tensor cores means CUDA inference engines like llama.cpp cannot use their faster Tensor-core kernels, but they still run fine on the card's standard FP16 CUDA cores. Pass on this card if you need to run 13B models, want longer context (8K+), or plan to move up to higher-precision quants. At ~$160 used, it is a stopgap for learning local AI, not a long-term investment.

›Why this rating

The GTX 1660 Ti offers decent inference speed for 7B Q4 models at a low price, but its 6 GB VRAM is a hard ceiling that excludes most modern workloads. It scores a 2.8 because it is functional for entry-level use but lacks headroom for growth.

BLK · OVERVIEW

Overview

Turing mid-tier without RT/Tensor cores. 6 GB VRAM fits 7B Q4 with short context. Bandwidth (288 GB/s) is solid for the tier — ~30-40 tok/s on 7B Q4. Same VRAM ceiling as the 1660 Super; the Ti pays for slightly more compute that doesn't help much for inference.

Retailers we'd check: Amazon

Search-fallback links. Editorial hasn't yet curated retailer URLs for this card. Approx. $160.

Some links above are affiliate links. We may earn a commission at no extra cost to you. How we make money.

BLK · SPECS

Specs

VRAM: 6 GB
Power draw: 120 W
Released: 2019
MSRP: $279
Backends: CUDA, Vulkan

Models that fit

Open-weight models small enough to run on NVIDIA GeForce GTX 1660 Ti with usable context.

  • Llama 3.2 3B Instruct · 3B · llama
  • Llama 3.2 1B Instruct · 1B · llama
  • Gemma 4 E2B (Effective 2B) · 2B · gemma
  • Gemma 3 1B · 1B · gemma
  • Qwen 2.5 Coder 3B · 3B · qwen
  • Qwen 2.5 Coder 1.5B · 1.5B · qwen
  • DeepSeek R1 Distill Qwen 1.5B · 1.5B · deepseek
  • Granite 3.0 2B Instruct · 2B · granite

Frequently asked

What models can NVIDIA GeForce GTX 1660 Ti run?

With 6 GB VRAM, the NVIDIA GeForce GTX 1660 Ti runs 7B models in Q4 quantization with a short context window. See the model list above for combinations that fit.

Does NVIDIA GeForce GTX 1660 Ti support CUDA?

Yes — NVIDIA GeForce GTX 1660 Ti is an NVIDIA card with full CUDA support, the most mature local-AI backend. llama.cpp, Ollama, vLLM, and ExLlamaV2 all run natively.
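
As a concrete starting point on this card, a small Q4 GGUF can be loaded through llama-cpp-python with every layer offloaded to the GPU and a short context window to stay inside 6 GB. The model filename below is a placeholder; any Q4 GGUF from the list above (or a 7B Q4 at reduced context) works the same way:

```python
# Minimal llama-cpp-python example sized for a 6 GB card:
# full GPU offload, short context. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.2-3b-instruct-q4_k_m.gguf",  # any small Q4 GGUF
    n_gpu_layers=-1,   # offload every layer to the GTX 1660 Ti
    n_ctx=2048,        # short context keeps the KV cache inside 6 GB
)

out = llm("Explain what a KV cache is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```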

How much does NVIDIA GeForce GTX 1660 Ti cost?

Current street price for NVIDIA GeForce GTX 1660 Ti is around $160 (MSRP $279). Prices vary by region and supply.

Where next?

Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
Troubleshooting
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.

Compare alternatives

Hardware worth comparing

Same VRAM tier and the one step above and below — so you can frame the buying decision against real options.

Same VRAM tier
Cards in the same memory band
  • AMD Radeon RX 5600 XT · amd · 6 GB VRAM · 1.7/10
  • AMD Radeon RX 6600 XT · amd · 8 GB VRAM · 4.8/10
  • AMD Radeon RX 6600 · amd · 8 GB VRAM · 4.8/10
  • NVIDIA GeForce GTX 1660 Super · nvidia · 6 GB VRAM · 2.8/10
  • NVIDIA GeForce RTX 2060 · nvidia · 6 GB VRAM · 2.8/10
  • Intel Arc B570 · intel · 10 GB VRAM · 5.8/10
Step up
More VRAM — bigger models, more context
  • AMD Radeon RX 6600 XT · amd · 8 GB VRAM · 4.8/10
  • NVIDIA GeForce GTX 1070 Ti · nvidia · 8 GB VRAM · 5.1/10
  • Intel Arc B570 · intel · 10 GB VRAM · 5.8/10
Step down
Less VRAM — cheaper, more constrained
  • AMD Radeon RX 580 8GB · amd · 8 GB VRAM · 3.8/10
  • AMD Radeon RX 5500 XT 8GB · amd · 8 GB VRAM · 3.5/10
  • NVIDIA GeForce GTX 1650 Super · nvidia · 4 GB VRAM · 1.8/10