RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.coIndependently operated
Hardware Planning for Local AI

12. Cloud GPU Fallback

Chapter 12 of 20 · 20 min
KEY INSIGHT

Cloud GPUs offer cost-effective capacity for occasional heavy workloads. Calculate the break-even point against purchase cost before committing to either path.

```bash
# Test the connection to a cloud llama.cpp instance
# (/v1/completions is the OpenAI-compatible route, which accepts max_tokens)
curl -X POST http://instance-ip:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, explain AI", "max_tokens": 100}'

# Check GPU usage every 60 seconds to help estimate monthly cost
watch -n 60 nvidia-smi
```

Cloud GPU instances provide elastic capacity for workloads that exceed local hardware. Understanding when and how to use cloud resources completes a hardware strategy.

Cloud GPU Pricing

| Provider | GPU | VRAM | Price/hr (on-demand) | Price/hr (spot) |
|---|---|---|---|---|
| Vast.ai | RTX 3080 | 10GB | $0.20-0.35 | $0.10-0.20 |
| Vast.ai | RTX 3090 | 24GB | $0.40-0.60 | $0.20-0.35 |
| Vast.ai | A100 40GB | 40GB | $1.50-2.50 | $0.80-1.20 |
| Lambda Labs | A100 80GB | 80GB | $3.00+ | $1.50+ |
| AWS p4d | A100 40GB | 40GB | $3.67 | N/A |

Prices vary by region and demand. Vast.ai typically offers the best cost efficiency for short-term needs.
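
The hourly ranges in the table turn into job costs by simple multiplication. As a rough sketch, the shell function below uses awk for the decimal arithmetic. The `job_cost` helper name and the 100-hour job size are assumptions made for illustration; the rates are the Vast.ai A100 40GB spot range from the table.

```shell
# job_cost HOURS RATE_PER_HOUR -> prints the total cost in dollars, two decimals.
# Hypothetical helper for illustration only.
job_cost() {
  awk -v hours="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", hours * rate }'
}

# 100 GPU-hours at the low and high ends of the spot range ($0.80-1.20/hr).
echo "low:  \$$(job_cost 100 0.80)"
echo "high: \$$(job_cost 100 1.20)"
```

Spot prices move with demand, so treat the result as a range rather than a quote.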

Break-Even Calculation

When does buying make more sense than renting?

Monthly GPU cost (purchase): ($1500 GPU / 36 months) + ($50 electricity) = $92/month
Equivalent cloud usage: $92 / $0.30/hr = 307 hours/month = 10.2 hrs/day (30-day month)

With these figures, purchasing breaks even at roughly 10 hours/day of active inference. Against a $0.40/hr rental, the threshold drops to about 8 hours/day.
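
The same arithmetic can be scripted so the inputs are easy to change. This is a minimal sketch, assuming a 30-day month and a flat monthly electricity figure. The function names are hypothetical, and the inputs are the $1500 GPU, 36-month amortization, $50 electricity, and $0.30/hr rate used in this section.

```shell
# monthly_ownership_cost GPU_PRICE MONTHS ELECTRICITY_PER_MONTH
# Amortized purchase price plus electricity, in dollars per month.
monthly_ownership_cost() {
  awk -v price="$1" -v months="$2" -v power="$3" \
    'BEGIN { printf "%.2f\n", price / months + power }'
}

# break_even_hours_per_day MONTHLY_COST CLOUD_RATE [DAYS_PER_MONTH]
# Daily rental hours at which cloud spend equals the monthly ownership cost.
break_even_hours_per_day() {
  awk -v cost="$1" -v rate="$2" -v days="${3:-30}" \
    'BEGIN { printf "%.1f\n", cost / rate / days }'
}

monthly=$(monthly_ownership_cost 1500 36 50)
echo "ownership: \$$monthly/month"
echo "break-even at \$0.30/hr: $(break_even_hours_per_day "$monthly" 0.30) hrs/day"
```

Changing the hourly rate argument shows how sensitive the threshold is to which rental tier you compare against.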

Cloud Instance Selection

For 70B model fine-tuning:

  • 1x A100 80GB: Required for 70B QLoRA
  • 8x A100 40GB: Required for 70B full fine-tuning
  • Spot instances: 40-60% savings with interruption risk

For inference only (70B):

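  • 1x A100 40GB at INT4: Handles 70B inference, but 4-bit weights alone take about 35 GB, so little VRAM is left for context
  • 2x RTX 3090 (model split across both cards, 48GB total): Alternative at lower cost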
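
A quick way to sanity-check these pairings is to estimate the size of the weights alone. The sketch below assumes size is simply parameters times bits per weight. It ignores KV cache, activations, and optimizer state, and real 4-bit quantized files often average slightly more than 4 bits per weight. The `weights_gb` name is hypothetical.

```shell
# weights_gb PARAMS_BILLIONS BITS_PER_WEIGHT
# Approximate weight size in GB (10^9 bytes); a lower bound on VRAM needed.
weights_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

echo "70B at 4-bit:  $(weights_gb 70 4) GB"
echo "70B at 16-bit: $(weights_gb 70 16) GB"
```

Compare the result against the VRAM column in the pricing table, leaving headroom for context and runtime overhead.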

SSH Access Pattern

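```bash
# Connect to the cloud instance (port 22, or pass a custom SSH port with -p)
ssh user@instance-ip

# Download the model (requires a Hugging Face token with access to the gated repo)
huggingface-cli download meta-llama/Meta-Llama-3-70B-Instruct
# Note: this repo ships safetensors weights; llama.cpp needs a GGUF file,
# so convert the download or fetch a GGUF build instead

# Run inference with the llama-cpp-python server
python3 -m llama_cpp.server --model models/llama-3-70b.gguf --host 0.0.0.0 --port 8080
```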
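
Binding the server to 0.0.0.0 exposes port 8080 on every interface. One alternative is to bind it to 127.0.0.1 on the instance and reach it through an SSH local port forward. The sketch below only prints the command so it can be reviewed before running. `tunnel_cmd` is a hypothetical helper and `user@instance-ip` is the same placeholder used in this section.

```shell
# tunnel_cmd USER_AT_HOST [PORT]
# Prints (does not run) an ssh local port-forward command.
tunnel_cmd() {
  target="$1"
  port="${2:-8080}"
  # -N: run no remote command; -L: forward local PORT to PORT on the instance's loopback
  echo "ssh -N -L ${port}:127.0.0.1:${port} ${target}"
}

tunnel_cmd user@instance-ip 8080
```

With the tunnel open, requests to http://127.0.0.1:8080 on the local machine reach the server without the port being open to the internet.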

Security Considerations

Every cloud instance adds an external attack surface:

  • Use SSH key authentication, disable password login
  • Configure firewall to allow only essential ports
  • Encrypt model storage at rest
  • Consider VPN tunnel to instance
  • Terminate instances after use to avoid charges
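
The first item on the list can be spot-checked with a small script. This is a sketch, not a full audit: it reads a single sshd_config-style file, does not follow `Include` directives, and the `password_login_disabled` name is hypothetical. The demo runs against a throwaway sample file rather than the live configuration.

```shell
# password_login_disabled FILE
# Exit status 0 if FILE sets "PasswordAuthentication no", non-zero otherwise.
password_login_disabled() {
  # sshd uses the first value it reads for a keyword, so stop at the first match.
  # Commented lines do not match because their first field starts with "#".
  value=$(awk 'tolower($1) == "passwordauthentication" { print $2; exit }' "$1")
  [ "$value" = "no" ]
}

# Demo against a temporary sample file instead of /etc/ssh/sshd_config.
sample=$(mktemp)
printf '%s\n' '#PasswordAuthentication yes' 'PasswordAuthentication no' > "$sample"
if password_login_disabled "$sample"; then
  echo "password login disabled"
else
  echo "password login still allowed"
fi
rm -f "$sample"
```

On a real instance, pass the path to the server's sshd configuration and confirm key-based login works before closing the session.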
EXERCISE

Calculate the cost of fine-tuning Llama 3 70B in the cloud versus buying hardware. Assume 100 GPU-hours total for the task. Compare Vast.ai A100 80GB spot pricing versus purchasing an RTX 4090.
