04. GPU Selection: Mid-Range

Chapter 4 of 20 · 15 min

The mid-range tier spans $500-$1500 and represents the sweet spot for serious local AI work. These GPUs handle 13B-34B models with reasonable performance, provided the weights are quantized to fit in VRAM.

Recommended Mid-Range GPUs

| GPU | VRAM | Typical Price | Largest Model (weights fit in VRAM) |
| --- | --- | --- | --- |
| RTX 4070 | 12GB | $500-600 | 13B INT4 |
| RTX 4070 Ti | 12GB | $700-800 | 13B INT4 |
| RTX 4070 Super | 12GB | $600-700 | 13B INT4 |
| RTX 4080 | 16GB | $900-1100 | 13B INT8 |
| RTX 4080 Super | 16GB | $1000-1200 | 13B INT8 |
| RTX 3090 | 24GB | $800-1000 | 33B INT4 |

Architectural Considerations

The RTX 40 series (Ada Lovelace) offers 30-40% better performance per watt than the RTX 30 series (Ampere). However, the RTX 3090's 24GB of VRAM often outweighs that efficiency gain for larger models, because a model that does not fit in VRAM cannot run at full speed regardless of architecture.

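The RTX 4080 at 16GB can run 13B in INT8, whose weights take about 13GB, but not 13B in FP16, which needs about 26GB. The RTX 3090 at 24GB runs 33B in INT4 (about 16.5GB of weights) with room left for the KV cache, which is useful for developers who need larger models or longer contexts. Full FP16 precision is practical on the 16GB and 24GB cards only up to about 7B, where the weights take about 14GB.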
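
The sizing rule behind these figures is parameter count times bytes per parameter. The sketch below applies it; the `overhead_gb` allowance for the KV cache, activations, and runtime context is an assumed placeholder, not a measured value, and real usage varies with context length and inference software.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, vram_gb: float,
         overhead_gb: float = 1.5) -> bool:
    """True if weights plus an assumed overhead allowance fit in VRAM.

    overhead_gb is an illustrative guess; tune it for your context length.
    """
    return weight_gb(params_billion, precision) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for params, precision, vram in [(13, "FP16", 24), (13, "INT8", 16),
                                    (33, "INT4", 24), (33, "INT4", 16)]:
        verdict = "fits" if fits(params, precision, vram) else "does not fit"
        print(f"{params}B {precision} on {vram}GB: "
              f"{weight_gb(params, precision):.1f} GB weights, {verdict}")
```

Treat a "fits" result as a first filter only: a model that barely fits leaves little room for long prompts.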

Thermal and Power Requirements

| GPU | TDP | Recommended PSU | Fan Noise |
| --- | --- | --- | --- |
| RTX 4070 | 200W | 650W | Moderate |
| RTX 4080 | 320W | 750W | Higher |
| RTX 3090 | 350W | 850W | Highest |

Mid-range builds need adequate cooling and an adequately sized power supply. The reference RTX 4080 and RTX 3090 designs both use coolers roughly three slots thick, and many partner cards are larger still, so check chassis clearance and slot spacing before buying.
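
One rough way to sanity-check a PSU choice is to add the GPU's TDP to an allowance for the rest of the system, apply a headroom factor, and round up to the next common PSU size. The sketch below does that; the 200W system allowance, the 1.4 headroom factor, and the list of PSU sizes are illustrative assumptions, not vendor guidance or figures from this chapter.

```python
# Common retail PSU wattages to round up to (illustrative list).
STANDARD_PSU_WATTS = [550, 650, 750, 850, 1000, 1200]


def recommend_psu_watts(gpu_tdp_w: float, rest_of_system_w: float = 200,
                        headroom: float = 1.4) -> int:
    """Smallest listed PSU that covers (GPU + rest of system) * headroom.

    rest_of_system_w and headroom are assumed values for illustration.
    """
    needed = (gpu_tdp_w + rest_of_system_w) * headroom
    for size in STANDARD_PSU_WATTS:
        if size >= needed:
            return size
    raise ValueError(f"no listed PSU covers {needed:.0f} W")


if __name__ == "__main__":
    for name, tdp in [("RTX 4080", 320), ("RTX 3090", 350)]:
        print(f"{name}: {recommend_psu_watts(tdp)} W PSU")
```

A system with a high-power CPU or many drives would need a larger `rest_of_system_w` than the default assumed here.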

Real Throughput Numbers

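Running Mixtral 8x7B (MoE architecture) with standard parameters. The model has about 47B total parameters, so even INT4 weights take roughly 23GB; every card here depends on quantization, and the 12GB and 16GB cards also need part of the model offloaded to system RAM:

  • RTX 4080 16GB: 15 tokens/sec, limited by VRAM
  • RTX 3090 24GB: 18 tokens/sec, more headroom
  • RTX 4070 12GB: 12 tokens/sec, requires the heaviest quantization or offload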
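
Throughput can also be read as energy cost. The sketch below divides each card's rated power by the tokens/sec figures listed above to get joules per token. It assumes the card draws its full rated power during inference, which is an upper bound, and it ignores CPU and system power.

```python
def joules_per_token(power_w: float, tokens_per_sec: float) -> float:
    """Energy per generated token, assuming the GPU draws power_w throughout."""
    return power_w / tokens_per_sec


# (rated power in watts, tokens/sec quoted in this chapter)
CARDS = {
    "RTX 4080": (320, 15),
    "RTX 3090": (350, 18),
    "RTX 4070": (200, 12),  # 200 W is NVIDIA's listed power for the RTX 4070
}

if __name__ == "__main__":
    for name, (watts, tps) in CARDS.items():
        print(f"{name}: at most {joules_per_token(watts, tps):.1f} J/token")
```

Because real inference rarely holds a card at its rated power, measured draw from a wall meter or `nvidia-smi` would give a tighter estimate than these upper bounds.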

EXERCISE

Calculate whether an RTX 4080 at $999 or an RTX 3090 at $899 offers better value for running Mistral 7B in FP16 precision. Consider VRAM, performance, and power efficiency.