"Should we fine-tune?" is one of the most common — and most often wrong — first questions in an AI project. Fine-tuning is expensive, slow to iterate, and frequently solves a problem you don't have. Most needs are met by prompting or RAG, in that order of effort. Here's how to decide.
The three approaches, briefly
- Prompting — shape behaviour with instructions, examples, and structure. No training, instant iteration.
- RAG (retrieval-augmented generation) — inject relevant knowledge into the context at query time. The model stays fixed; you control what it knows.
- Fine-tuning — train the model's weights on your data to change its default behaviour or style.
Match the approach to the problem
The key question: is your gap in knowledge, in behaviour or format, or in consistency at scale?
Need current or proprietary knowledge? → RAG. The model can't know your internal docs, last week's data, or customer records. Don't fine-tune facts in — they go stale and the model still hallucinates around them. Retrieve them.
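The retrieval step can be sketched in a few lines. This is a minimal illustration, assuming a toy in-memory document list and word-overlap scoring in place of a real search or embedding index. The names `DOCS`, `retrieve`, and `build_prompt` are hypothetical, and the documents are invented for the example.

```python
from __future__ import annotations

import re

# Hypothetical internal documents; a real system would query a search or vector index.
DOCS = [
    "Refund policy: customers may return items within 30 days of delivery.",
    "Shipping: standard orders leave the warehouse within 2 business days.",
    "Support hours: the help desk is staffed Monday to Friday, 9am to 5pm.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    """Inject the retrieved passages into the context ahead of the question."""
    context = "\n".join(f"- {passage}" for passage in retrieve(query, docs, k))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    print(build_prompt("What is the refund policy for returned items?", DOCS))
```

The model never changes here. Updating what it "knows" means editing `DOCS` (or the index behind it), which is why retrieved facts stay fresh where trained-in facts do not.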
Need a specific format, tone, or task structure? → Prompting first. Explicit criteria, few-shot examples, and structured output handle the large majority of "make it behave this way" needs with zero training.
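As a rough sketch of what "explicit criteria, few-shot examples, and structured output" can look like in practice, the snippet below builds a classification prompt and validates the reply. The ticket-labelling task, the label set, and the function names are assumptions made for illustration; the model call itself is left out because it depends on your provider.

```python
import json

# Hypothetical task and examples, invented for illustration.
ALLOWED_LABELS = {"billing", "bug", "other"}
FEW_SHOT = [
    ("I was charged twice this month.", "billing"),
    ("The export button crashes the app.", "bug"),
]


def build_few_shot_prompt(ticket: str) -> str:
    """Combine explicit criteria, worked examples, and an output schema."""
    lines = [
        "Classify the support ticket.",
        f"Allowed labels: {', '.join(sorted(ALLOWED_LABELS))}.",
        'Reply with JSON only, in the form {"label": "<label>"}.',
        "",
    ]
    for text, label in FEW_SHOT:
        lines.append(f"Ticket: {text}")
        lines.append(json.dumps({"label": label}))
        lines.append("")
    lines.append(f"Ticket: {ticket}")
    return "\n".join(lines)


def parse_reply(reply: str) -> str:
    """Return the label if the reply matches the schema, else raise ValueError."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {reply!r}") from exc
    label = data.get("label") if isinstance(data, dict) else None
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label
```

Validating the reply in code, rather than trusting it, turns format failures into something you can count and retry. That count is also useful evidence when deciding whether prompting has hit its ceiling.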
Need a consistent style or behaviour that prompting can't reliably hit, at scale? → Fine-tuning. Once you've genuinely exhausted prompting and need the behaviour baked in (a very specific voice, a narrow classification task at high volume, or lower latency from shorter prompts), fine-tuning earns its cost.
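One way to make "genuinely exhausted prompting" measurable is to score the prompted baseline on a labelled set before committing to training. The sketch below assumes a hypothetical `predict` callable that wraps your prompted model; here a keyword stub stands in for it, and the examples and thresholds are invented.

```python
from __future__ import annotations

from typing import Callable, Sequence


def accuracy(predict: Callable[[str], str], examples: Sequence[tuple[str, str]]) -> float:
    """Fraction of labelled examples the predictor gets right."""
    if not examples:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for text, expected in examples if predict(text) == expected)
    return correct / len(examples)


def meets_bar(predict: Callable[[str], str], examples: Sequence[tuple[str, str]], bar: float) -> bool:
    """True when the predictor reaches the chosen quality bar on the labelled set."""
    return accuracy(predict, examples) >= bar


def keyword_stub(text: str) -> str:
    """Stand-in for a prompted model call; replace with your real client."""
    return "billing" if "charge" in text.lower() else "other"


# Hypothetical labelled examples, invented for illustration.
EXAMPLES = [
    ("I was charged twice.", "billing"),
    ("Please cancel the extra charge.", "billing"),
    ("The app crashes on login.", "other"),
    ("My invoice total looks wrong.", "billing"),
]

if __name__ == "__main__":
    print(f"baseline accuracy: {accuracy(keyword_stub, EXAMPLES):.2f}")
    print("meets 0.90 bar:", meets_bar(keyword_stub, EXAMPLES, bar=0.90))
```

The same harness can later score a fine-tuned model, so the decision to train and the verdict on whether it paid off both rest on the same numbers.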
A simple decision order
- Start with prompting. Cheapest, fastest. Most projects stop here.
- Add RAG when the gap is knowledge the model doesn't have.
- Consider fine-tuning only when prompting + RAG can't reach the quality/consistency bar — and you have the data and eval discipline to do it well.
They're not mutually exclusive: a strong system is often RAG + good prompting, with fine-tuning reserved for the last mile.
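The decision order above can be written down as a small function. This is only an illustrative encoding of the article's rule of thumb; the `Diagnosis` fields and the `recommend` name are hypothetical, and real projects will weigh cost and risk in ways three booleans can't capture.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Answers to the questions in the decision order (hypothetical field names)."""
    needs_external_knowledge: bool  # current or proprietary facts the model lacks
    simpler_paths_meet_bar: bool    # prompting (plus RAG, if used) hits the quality bar
    has_data_and_evals: bool        # training data and an evaluation set are in place


def recommend(d: Diagnosis) -> list[str]:
    """Return the approaches to combine, cheapest first."""
    approaches = ["prompting"]  # always the starting point
    if d.needs_external_knowledge:
        approaches.append("RAG")
    # Fine-tuning only when the simpler paths fall short and you can do it well.
    if not d.simpler_paths_meet_bar and d.has_data_and_evals:
        approaches.append("fine-tuning")
    return approaches


if __name__ == "__main__":
    print(recommend(Diagnosis(True, True, False)))
    print(recommend(Diagnosis(True, False, True)))
```

Returning a list rather than a single choice reflects that the approaches stack: fine-tuning is added on top of prompting and retrieval, not swapped in for them.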
Cost & iteration reality
| | Prompting | RAG | Fine-tuning |
|---|---|---|---|
| Setup effort | Low | Medium | High |
| Iteration speed | Instant | Fast | Slow (retrain) |
| Keeps knowledge fresh | n/a | Yes | No (retrain) |
| Best for | Behaviour/format | Knowledge | Baked-in style/task |
The trap is treating fine-tuning as the "serious" option. In practice it's the last resort, not the first — and choosing it early usually means you'll fine-tune in stale facts you should have retrieved.
Wrap-up
Diagnose the problem before picking a tool: knowledge → RAG, behaviour/format → prompting, last-mile consistency → fine-tuning. Start cheap, add retrieval for knowledge, and only train weights when you've proven the simpler paths can't get you there.
Related reading
- RAG Systems Explained — when knowledge is the gap.
- Context Engineering — getting prompting + RAG right.
- Prompt Engineering Enterprise Guide — exhaust this before training.