What is Supervised Fine-Tuning?
After this lesson you can define SFT precisely, say what changes versus the base model, and decide when SFT is the right tool versus prompting, RAG, or a bigger model.
Track 0 gave you the machine and the four levers for steering it. Track 1 zooms in on one lever — Supervised Fine-Tuning — and makes every part of it precise. We start with the definition and, crucially, with when not to reach for it.
SFT in one sentence
Supervised Fine-Tuning continues training a pretrained model on a curated set of (prompt, completion) examples — inputs paired with the exact outputs you want — so the model learns to produce those outputs. "Supervised" means every example carries a known target; the model is corrected toward it. Mechanically it is the same next-token training from Track 0, run on your examples instead of raw web text.
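To make the (prompt, completion) idea concrete, here is a minimal sketch of how one example might be turned into next-token training targets. The whitespace tokenizer, the `build_targets` helper, and the `IGNORE` marker are illustrative assumptions, not part of any particular library. Real pipelines use a subword tokenizer, and masking the prompt positions so that only completion tokens are scored is a common choice rather than a requirement.

```python
# Hypothetical sketch: one (prompt, completion) pair -> next-token targets.
# All names here are illustrative, not a real library API.

IGNORE = None  # assumed marker meaning "do not score this position"

def tokenize(text):
    # Toy whitespace tokenizer; real systems use subword tokenizers.
    return text.split()

def build_targets(prompt, completion):
    """Return (inputs, targets) for next-token training.

    At position i the model reads inputs[:i + 1] and is asked to predict
    targets[i]. Targets that are still part of the prompt are replaced
    with IGNORE, so only completion tokens contribute to the loss.
    """
    p, c = tokenize(prompt), tokenize(completion)
    tokens = p + c
    inputs = tokens[:-1]
    shifted = tokens[1:]
    # shifted[i] is tokens[i + 1]; it belongs to the prompt when i + 1 < len(p).
    targets = [IGNORE if i + 1 < len(p) else t for i, t in enumerate(shifted)]
    return inputs, targets

example = {"prompt": "Sentiment of: great movie ->", "completion": "positive"}
inputs, targets = build_targets(example["prompt"], example["completion"])
print(inputs)
print(targets)
```

The shift by one position is the "same next-token training" part; the only SFT-specific decisions in this sketch are which text goes in and which positions are scored.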
What actually changes
SFT shifts the model's parameters so that, for inputs like yours, the next-token distribution favors the responses you demonstrated. You are not adding a module or a rulebook; you are nudging the same weights you met in Lesson 1 of Track 0. The new behavior is baked into those weights, so you no longer need to supply examples on every call, and the format, tone, and task shape stay consistent.
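The shift in the next-token distribution can be illustrated with a toy calculation: one gradient step on the cross-entropy loss for a single prediction. The four-word vocabulary, the starting logits, and the learning rate are made-up values, and the step is applied directly to the logits for simplicity. In a real model the update goes to the weights that produce those logits.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sft_step(logits, target_index, lr=1.0):
    """One gradient-descent step on cross-entropy, taken on the logits.

    For softmax followed by cross-entropy, the gradient with respect to
    the logits is (probs - one_hot(target_index)).
    """
    probs = softmax(logits)
    grad = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grad)]

vocab = ["positive", "negative", "neutral", "maybe"]  # toy vocabulary (assumed)
logits = [0.2, 0.1, 0.0, 0.5]                          # made-up starting scores
target = vocab.index("positive")                       # the demonstrated token

before = softmax(logits)[target]
after = softmax(sft_step(logits, target))[target]
print(f"P(positive) before: {before:.3f}, after: {after:.3f}")
```

The step raises the score of the demonstrated token and lowers the others, which is the sense in which the distribution "favors" the demonstrated response after training.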
A note on starting points. A base model has only been pretrained to predict the next token; an instruct model has additionally been fine-tuned to follow instructions and chat. You can SFT either, but starting from an instruct model usually gets you to a polished, on-format result with less data.
Key idea
SFT changes behavior, not knowledge. It is excellent at teaching a model how to respond — format, style, task — and unreliable at teaching it new facts. For facts that change or are private, reach for RAG (Track 0, Lesson 7).
What SFT is good at
- Format and structure: always emit strict JSON, a fixed label set, or spans in a schema.
- Style and tone: answer in a specific voice or at a specific length.
- A narrow task at high accuracy: classification, extraction, or domain QA, where a small fine-tuned model can rival a far larger one.
- Following one instruction pattern reliably, without re-explaining it on every call.
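Because format is one of the places SFT tends to pay off, it helps to have a mechanical check for whether an output is on-format. Below is a sketch of such a check. The `label` and `confidence` fields, the allowed label set, and the sample outputs are all hypothetical, chosen only for illustration.

```python
import json

ALLOWED_LABELS = {"positive", "negative", "neutral"}  # hypothetical label set

def is_on_format(raw_output):
    """Return True if raw_output is strict JSON matching the assumed schema:
    an object with exactly the keys "label" (a string in ALLOWED_LABELS)
    and "confidence" (a number between 0 and 1)."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or set(obj) != {"label", "confidence"}:
        return False
    label, conf = obj["label"], obj["confidence"]
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        return False
    return 0 <= conf <= 1

# Made-up model outputs for illustration.
outputs = [
    '{"label": "positive", "confidence": 0.93}',
    'Sure! Here is the JSON: {"label": "positive", "confidence": 0.93}',
    '{"label": "great", "confidence": 0.5}',
]
on_format_rate = sum(is_on_format(o) for o in outputs) / len(outputs)
print(f"on-format rate: {on_format_rate:.2f}")
```

Measuring this rate on held-out prompts before and after fine-tuning is one simple way to see whether training actually moved format adherence.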
When NOT to use SFT
- You need fresh or private facts → use RAG; SFT won't reliably memorize them.
- Prompting already works → don't pay for a training run you don't need.
- You have almost no data and no way to make more → too few examples to learn from (later tracks show synthetic data, distillation, and warm starts that lower the bar).
- The base model genuinely lacks the capability → consider a stronger base model or continued pretraining; SFT sharpens existing ability more than it creates new ability from nothing.
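The rules of thumb in this list can be condensed into a small decision helper. This is only a sketch of the lesson's guidance: the `Situation` fields, the returned strings, and the order in which the checks run are assumptions made for illustration, and a real decision involves cost and judgment that a few booleans cannot capture.

```python
from dataclasses import dataclass

@dataclass
class Situation:
    # Hypothetical fields mirroring the bullets above.
    needs_fresh_or_private_facts: bool
    prompting_already_works: bool
    has_or_can_make_data: bool
    base_has_capability: bool

def recommend(s: Situation) -> str:
    """Map a situation to a first tool to try, following the list above."""
    if s.needs_fresh_or_private_facts:
        return "RAG"
    if s.prompting_already_works:
        return "prompting"
    if not s.base_has_capability:
        return "stronger base or continued pretraining"
    if not s.has_or_can_make_data:
        return "get more data first"
    return "SFT"

# Example: a narrow extraction task, stable facts, a capable base, data on hand.
print(recommend(Situation(False, False, True, True)))
```

The checks are ordered from cheapest alternative to most involved, so SFT is only recommended once the simpler options have been ruled out.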
How much data? There is no universal number, but for a narrow task a few hundred to a few thousand clean examples is a realistic starting range — and quality matters more than raw count. The rest of this track is about getting that data right and running the training that turns it into a better model.
Key terms
- Supervised Fine-Tuning (SFT): Continuing training on labeled (prompt, completion) examples to change a model's behavior for a task.
- (prompt, completion) pair: One SFT example: an input and the target output you want the model to produce.
- Base vs instruct model: A base model is raw next-token prediction; an instruct model is already tuned to follow instructions.
- Behavior vs knowledge: SFT reshapes how a model responds (behavior); it does not reliably add new facts (knowledge; use RAG).