Should I fine-tune, or just use a better prompt?

Reviewed May 15, 2026 · 2 min read
fine-tuning · rag · prompting · decision-framework · unsloth

The answer

One paragraph. No hedging beyond what the data actually warrants.

Fine-tune ONLY when prompting + RAG can't get you there. Most people who think they need fine-tuning need a better prompt instead.

The 3-question test:

1. Can a single well-crafted prompt with examples (few-shot) get 80%+ of the result you want? If yes — stop. Iterate on the prompt. Cost: $0 to $50 in API calls or local-GPU time.

2. If your problem is "the model doesn't know my data," can you provide that data in the context? If yes — that's RAG, not fine-tuning. Build a small vector index over your docs (a minimal sketch follows this list). Cost: $0 with local embedders.

3. Does the model fail in a SPECIFIC, REPRODUCIBLE pattern that no amount of prompting fixes? Examples: wrong output format on 1 in 5 generations, refuses to follow a domain-specific style, can't classify your taxonomy correctly even with examples in context. This is the actual fine-tuning case.
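The RAG sketch promised in question 2, using sentence-transformers and brute-force cosine similarity; the model name, example docs, and top_k are placeholder assumptions, not recommendations.

```python
# Minimal local RAG: embed your docs once, retrieve top-k chunks per query,
# and paste them into the prompt. No vector database needed at small scale.
# "all-MiniLM-L6-v2" and top_k=3 are placeholder choices, not recommendations.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "Refunds are processed within 5 business days.",
    "Enterprise plans include SSO and audit logs.",
    "The API rate limit is 600 requests per minute.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, top_k: int = 3) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)
    scores = doc_vecs @ q.T            # cosine similarity (vectors are normalized)
    order = np.argsort(-scores.ravel())[:top_k]
    return [docs[i] for i in order]

context = "\n".join(retrieve("How long do refunds take?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How long do refunds take?"
```

If that prompt answers your questions reliably, question 2 is settled and you never reach the fine-tuning case.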

Cost reality (May 2026):

  • LoRA fine-tune of a 7B model on a single H100 via RunPod: ~$15-30 for a 4-hour run
  • Full fine-tune of a 32B model on 4× H100: $300-600 minimum
  • Prompt iteration: nearly free
  • RAG infrastructure: free with local embedders

The compounding cost: a fine-tuned model loses general capability in proportion to how aggressively you fine-tuned it. If you fine-tune Llama 3.1 8B on your customer-support corpus, it gets worse at coding, math, and general knowledge. You then either keep two models in production or accept the loss. Most teams underestimate this cost.

When fine-tuning is the right answer:

  • Output format reliability (JSON, function-calling, structured extraction)
  • Domain-specific style (legal, medical, brand voice)
  • Speed: a fine-tuned 7B model can match a prompted 32B model for narrow tasks at 5× the speed
  • Cost: if you're spending $500/mo on API calls for the same task pattern, $100 of fine-tuning amortizes in 3 weeks
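If you land in one of these buckets, a LoRA adapter is usually the cheapest entry point. A minimal setup sketch with Hugging Face peft; the base model, rank, and target modules are illustrative assumptions, and the training loop (trl's SFTTrainer, Unsloth, etc.) is omitted.

```python
# Minimal LoRA setup with peft: wrap a base model so only small adapter
# matrices train. Model name and hyperparameters below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_id = "meta-llama/Llama-3.1-8B-Instruct"   # assumed base model
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                 # adapter rank: more capacity, but more drift from the base
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
# From here, train on your example dataset with trl's SFTTrainer or Unsloth.
```

Keeping the adapter small (low rank, few target modules) also limits how much general capability you trade away.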

The honest pre-check: before any fine-tune run, write down 50 example inputs + expected outputs. If you can't, you don't have the dataset to fine-tune. If you can, run them through the un-tuned model first with the prompt-engineering pass. The gap between un-tuned and your bar tells you whether fine-tuning is even necessary.
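A sketch of that pre-check, assuming your 50 examples sit in a JSONL file with "input" and "expected" fields and you serve the un-tuned model behind an OpenAI-compatible endpoint (local servers like vLLM or Ollama expose one). The file name, endpoint, model name, and exact-match check are assumptions; swap the check for whatever your bar actually is.

```python
# Hypothetical pre-check: run the 50 examples through the un-tuned model with
# your best prompt and measure the gap before paying for a fine-tune run.
# File name, endpoint, model name, and the exact-match check are assumptions.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
SYSTEM_PROMPT = "Your best prompt-engineering pass goes here."

examples = [json.loads(line) for line in open("precheck_50.jsonl")]

passed = 0
for ex in examples:
    reply = client.chat.completions.create(
        model="local-model",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ex["input"]},
        ],
        temperature=0,
    ).choices[0].message.content
    passed += reply.strip() == ex["expected"].strip()   # swap for your real metric

print(f"un-tuned pass rate: {passed}/{len(examples)}")
```

If the pass rate already clears your bar, stop; if it misses by a reproducible margin, the same 50 examples become the seed of your fine-tuning dataset.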

Where we got the numbers

Fine-tune cost estimates come from RunPod and Lambda Labs pricing pages, May 2026. The "fine-tuning kills general capability" observation comes from the LIMO / catastrophic-forgetting literature and r/LocalLLaMA practitioner reports.

Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.