12. Cloud GPU Fallback
Chapter 12 of 20 · 20 min
Cloud GPU instances provide elastic capacity for workloads that exceed local hardware. Understanding when and how to use cloud resources completes a hardware strategy.
Cloud GPU Pricing
| Provider | GPU | VRAM | Price/hr (on-demand) | Price/hr (spot) |
|---|---|---|---|---|
| Vast.ai | RTX 3080 | 10GB | $0.20-0.35 | $0.10-0.20 |
| Vast.ai | RTX 3090 | 24GB | $0.40-0.60 | $0.20-0.35 |
| Vast.ai | A100 40GB | 40GB | $1.50-2.50 | $0.80-1.20 |
| Lambda Labs | A100 80GB | 80GB | $3.00+ | $1.50+ |
| AWS p4d | A100 40GB | 40GB | $3.67 | N/A |
Prices vary by region and demand. Of the providers listed, Vast.ai typically offers the best cost efficiency for short-term needs.
Break-Even Calculation
When does buying make more sense than renting?
Monthly GPU cost (purchase): ($1500 GPU / 36 months) + ($50 electricity) ≈ $92/month
Equivalent cloud usage: $92 / $0.30/hr ≈ 307 hours/month ≈ 10.2 hrs/day (30-day month)
At $0.30/hr, purchasing pays off above roughly 10 hours/day of active inference; at $0.40/hr, the threshold drops to about 8 hours/day.
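The break-even arithmetic above can be sketched in Python. This is a minimal sketch that assumes a 30-day month and reuses the figures from the text ($1500 GPU, 36 months, $50 electricity); the function names are illustrative, not part of any library.

```python
def monthly_ownership_cost(gpu_price, amortization_months, electricity_per_month):
    """Amortized hardware cost plus electricity, in dollars per month."""
    return gpu_price / amortization_months + electricity_per_month


def break_even_hours_per_day(monthly_cost, cloud_rate_per_hour, days_per_month=30):
    """Daily cloud hours at which renting costs the same as owning."""
    hours_per_month = monthly_cost / cloud_rate_per_hour
    return hours_per_month / days_per_month


owned = monthly_ownership_cost(1500, 36, 50)
for rate in (0.30, 0.40):  # example hourly cloud rates in dollars
    hours = break_even_hours_per_day(owned, rate)
    print(f"${rate:.2f}/hr -> break-even at {hours:.1f} hrs/day")
```

Below the break-even figure renting is cheaper; above it, owning is. Changing the amortization period or the cloud rate moves the threshold noticeably, so it is worth rerunning with current prices.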
Cloud Instance Selection
For 70B model fine-tuning:
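- 1x A100 80GB: Comfortable fit for 70B QLoRA (4-bit base weights take about 35GB, leaving room for adapters and activations)
- 8x A100 40GB: Starting point for 70B full fine-tuning; FP16 weights alone take about 140GB, so gradients and optimizer state must be sharded across GPUs or offloaded to CPU memory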
- Spot instances: 40-60% savings with interruption risk
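How much of the spot discount survives interruptions depends on how much work has to be redone. The sketch below is one rough way to estimate that; `rework_fraction` is a hypothetical parameter introduced here for illustration (extra compute repeated after interruptions, as a fraction of the job), and the rates in the example are placeholders rather than quotes.

```python
def spot_job_cost(job_hours, on_demand_rate, spot_discount, rework_fraction=0.0):
    """Estimated spot cost for a job, in dollars.

    spot_discount: fraction off the on-demand rate (0.40 to 0.60 in the text).
    rework_fraction: HYPOTHETICAL share of the job redone after interruptions
                     (0.0 means the job was never interrupted).
    """
    spot_rate = on_demand_rate * (1.0 - spot_discount)
    billed_hours = job_hours * (1.0 + rework_fraction)
    return billed_hours * spot_rate


# Placeholder job: 10 hours at $2.00/hr on-demand, 50% spot discount
on_demand_cost = 10 * 2.00
for rework in (0.0, 0.2, 0.5):
    cost = spot_job_cost(10, 2.00, 0.50, rework)
    print(f"rework {rework:.0%}: spot ${cost:.2f} vs on-demand ${on_demand_cost:.2f}")
```

Under this model, spot stops saving money once the rework fraction reaches discount / (1 - discount), which is why frequent checkpointing matters for long fine-tuning runs.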
For inference only (70B):
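- 1x A100 40GB at INT4: Fits 70B inference with little headroom (about 35GB of weights, leaving limited room for the KV cache)
- 2x RTX 3090 (48GB total, model split across both GPUs): Alternative at lower cost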
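A quick way to sanity-check these sizing choices is to estimate the memory taken by the weights alone. The sketch below assumes a simple bits-per-weight model and ignores the KV cache, activations, and quantization overhead, so real usage will be higher; `weight_memory_gb` is an illustrative helper, not a library function.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Memory for model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


for label, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
    print(f"70B at {label}: ~{weight_memory_gb(70, bits):.0f} GB of weights")
```

At 4 bits, a 70B model needs about 35GB for weights, which is why a single 40GB card is tight and a 48GB pair of 24GB cards leaves more room for context.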
SSH Access Pattern
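# Connect to the cloud instance (add -p PORT if the provider uses a custom SSH port)
ssh user@instance-ip
# Download the model (requires a Hugging Face token with access to this gated repo)
huggingface-cli download meta-llama/Meta-Llama-3-70B-Instruct
# Run inference with the llama-cpp-python server
# Note: the server expects a GGUF file; the download above fetches the original weights,
# which must be converted to GGUF (or replaced with a GGUF build) before this step
# --host 0.0.0.0 listens on all interfaces, so restrict port 8080 at the firewall
python3 -m llama_cpp.server --model models/llama-3-70b.gguf --host 0.0.0.0 --port 8080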
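Once the server is running, it can be queried from a local machine. The sketch below assumes the server is llama-cpp-python's OpenAI-compatible server (which exposes `/v1/completions`) and that port 8080 has been forwarded locally, for example with `ssh -L 8080:127.0.0.1:8080 user@instance-ip`. The helper names are illustrative.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://127.0.0.1:8080", max_tokens=64):
    """Build a POST request for the OpenAI-style completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the prompt to the server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example (needs the server running and the port forwarded):
# print(complete("The capital of France is"))
```

Going through an SSH tunnel keeps the inference port off the public internet, which fits the security guidance in this chapter.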
Security Considerations
Cloud instances add an external attack surface:
- Use SSH key authentication, disable password login
- Configure firewall to allow only essential ports
- Encrypt model storage at rest
- Consider VPN tunnel to instance
- Terminate instances after use to avoid charges
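The first item in the checklist can be verified with a short script. This is a simplified sketch that reads `sshd_config`-style text and checks whether password login is disabled; it assumes whitespace-separated `Keyword value` lines, stops at the first `Match` block, and ignores `Include` directives, so it is not a full parser.

```python
def first_value(config_text, keyword):
    """Return the first global value set for `keyword`, or None if absent.

    sshd uses the first value it finds for each keyword, and keywords are
    case-insensitive. Parsing stops at the first Match block because the
    settings after it apply only conditionally.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if parts[0].lower() == "match":
            break
        if len(parts) == 2 and parts[0].lower() == keyword.lower():
            return parts[1].strip().lower()
    return None


def password_login_disabled(config_text):
    """True only if PasswordAuthentication is explicitly set to 'no'."""
    return first_value(config_text, "PasswordAuthentication") == "no"


# Example (path is the usual default; adjust for the instance image):
# with open("/etc/ssh/sshd_config") as f:
#     print(password_login_disabled(f.read()))
```

An absent directive counts as not disabled here, because password authentication is enabled by default in OpenSSH. For an authoritative answer on a live instance, `sshd -T` prints the effective configuration.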
EXERCISE
Calculate the cost of fine-tuning Llama 3 70B in the cloud versus buying hardware. Assume 100 GPU-hours total for the task. Compare Vast.ai A100 80GB spot pricing versus purchasing an RTX 4090.