Should I fine-tune, or just use a better prompt?
The answer
One paragraph. No hedging beyond what the data actually warrants.
Fine-tune ONLY when prompting + RAG can't get you there. Most people who think they need fine-tuning need a better prompt instead.
The 3-question test:
1. Can a single well-crafted prompt with examples (few-shot) get 80%+ of the result you want? If yes — stop. Iterate on the prompt. Cost: $0 to $50 in API calls or local-GPU time.
2. If your problem is "the model doesn't know my data," can you provide that data in the context? If yes — that's RAG, not fine-tuning. Build a small vector index with your docs. Cost: $0 with local embedders.
3. Does the model fail in a SPECIFIC, REPRODUCIBLE pattern that no amount of prompting fixes? Examples: wrong output format on 1 in 5 generations, refuses to follow a domain-specific style, can't classify your taxonomy correctly even with examples in context. This is the actual fine-tuning case.
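Question 2 can be sanity-checked with zero infrastructure before you bother with a real vector index. A minimal sketch of the RAG idea using only a stdlib bag-of-words retriever (the `docs` list, tokenizer, and scoring are illustrative; a real setup would swap in a local embedding model):

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, docs, k=2):
    """Return the k docs most similar to the query; these get pasted into the prompt context."""
    q = tokens(query)
    return sorted(docs, key=lambda d: cosine(q, tokens(d)), reverse=True)[:k]

docs = [
    "Refund policy: refunds within 30 days of purchase.",
    "Shipping: orders ship within 2 business days.",
    "Warranty: hardware is covered for 12 months.",
]
context = retrieve("how long do I have to request a refund?", docs, k=1)
```

If stuffing the retrieved snippet into the prompt fixes the failure, the problem was missing context, not missing weights.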
Cost reality (May 2026):
- LoRA fine-tune of a 7B model on a single H100 via RunPod: ~$15-30 for a 4-hour run
- Full fine-tune of a 32B model on 4× H100: $300-600 minimum
- Prompt iteration: nearly free
- RAG infrastructure: free with local embedders
The compounding cost: a fine-tuned model loses general capability in proportion to how aggressively you fine-tune it. Fine-tune Llama 3.1 8B on your customer-support corpus and it gets worse at coding, math, and general knowledge. Then you either keep two models in production or accept the loss. Most teams underestimate this cost.
When fine-tuning is the right answer:
- Output format reliability (JSON, function-calling, structured extraction)
- Domain-specific style (legal, medical, brand voice)
- Speed: a fine-tuned 7B model can match a prompted 32B model for narrow tasks at 5× the speed
- Cost: if you're spending $500/mo on API calls for the same task pattern, $100 of fine-tuning amortizes in 3 weeks
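The amortization claim in that last bullet is simple arithmetic. A sketch with illustrative numbers (the $350/mo serving cost is an assumption for the example, not a figure from the pricing pages):

```python
def breakeven_days(finetune_cost, api_monthly, serving_monthly):
    """Days until a one-off fine-tune cost is repaid by monthly savings."""
    monthly_savings = api_monthly - serving_monthly
    if monthly_savings <= 0:
        return float("inf")  # fine-tuning never pays back
    return finetune_cost / (monthly_savings / 30)

# $500/mo API spend, $100 fine-tune, assumed $350/mo to serve the tuned model
days = breakeven_days(finetune_cost=100, api_monthly=500, serving_monthly=350)
# 100 / (150 / 30) = 20 days, i.e. roughly the 3-week figure above
```

Plug in your own serving cost; if it approaches your API spend, the break-even horizon blows out fast.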
The honest pre-check: before any fine-tune run, write down 50 example inputs with expected outputs. If you can't, you don't have the dataset to fine-tune. If you can, run them through the un-tuned model first with your best prompt-engineering pass. The gap between the un-tuned model's results and your bar tells you whether fine-tuning is even necessary.
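That pre-check fits in a short script. A sketch, assuming a `call_model` function that wraps whatever prompted, un-tuned model you're testing (the exact-match checker is illustrative; swap in whatever your bar actually is):

```python
def precheck(examples, call_model):
    """Run input/expected pairs through the un-tuned model.

    examples: list of (input_text, expected_output) pairs.
    call_model: function(str) -> str wrapping your prompted, un-tuned model.
    Returns (pass_rate, failures) so you can look for the reproducible pattern.
    """
    failures = []
    for inp, expected in examples:
        got = call_model(inp)
        if got.strip() != expected.strip():  # exact match; adapt to your task
            failures.append((inp, expected, got))
    pass_rate = 1 - len(failures) / len(examples)
    return pass_rate, failures

# Stub model for illustration only; replace with your real API or local call.
examples = [("2+2", "4"), ("3+3", "6"), ("capital of France", "Paris")]
rate, fails = precheck(examples, call_model=lambda s: {"2+2": "4", "3+3": "7"}.get(s, ""))
```

Inspect `fails`, not just `rate`: a specific, reproducible failure pattern is the fine-tuning case from question 3; scattered random misses usually mean the prompt needs work.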
Explore the numbers for your specific stack
Where we got the numbers
Fine-tune cost estimates are from RunPod and Lambda Labs pricing pages (May 2026). The "fine-tuning kills general capability" observation comes from the LIMO / catastrophic-forgetting literature and r/LocalLLaMA practitioner reports.
Also see
- The path that usually beats fine-tuning for "the model doesn't know my data" problems.
- Coding agents are an example where good prompting + the right base model beats most fine-tunes.
- If you've decided to fine-tune, Unsloth is the operator-grade pick. Apache 2.0, NVIDIA-only.
- TCO including fine-tune training + serving costs vs API spend over 12 months.
Other questions in this thread
Other /q/ landings on the same topic — same editorial discipline.
- Is fine-tuning dead in 2026? RAG vs distillation vs prompting — when does fine-tuning actually win?
- I want my AI conversations to stay private — what's the realistic local-first setup?
- Persistent KV cache vs RAG — which one should I use for 'chat with my docs'?
- Why doesn't my local LLM have web search — and what are the actual offline alternatives?
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.