SFT primarily changes a model's…

Behavior (how it responds), via its parameters

Which problem is the WRONG fit for SFT?

Recall today's private, frequently-changing inventory facts

Why often start from an instruct model rather than a base model?

It already follows instructions, so you reach a polished result with less data

What matters most for SFT data?

Quality and representativeness, not just count

Track 1 · SFT fundamentals · Lesson 1

What is Supervised Fine-Tuning?

After this lesson you can define SFT precisely, say what changes versus the base model, and decide when SFT is the right tool versus prompting, RAG, or a bigger model.

Level: beginner · Read time: ~9 min · Prerequisites: Architecture taxonomy

Track 0 gave you the machine and the four levers for steering it. Track 1 zooms in on one lever — Supervised Fine-Tuning — and makes every part of it precise. We start with the definition and, crucially, with when not to reach for it.

SFT in one sentence

Supervised Fine-Tuning continues training a pretrained model on a curated set of (prompt, completion) examples — inputs paired with the exact outputs you want — so the model learns to produce those outputs. "Supervised" means every example carries a known target; the model is corrected toward it. Mechanically it is the same next-token training from Track 0, run on your examples instead of raw web text.
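This minimal sketch shows how one (prompt, completion) pair might become a next-token training example. The whitespace tokenizer and the names toy_tokenize and build_example are hypothetical stand-ins for a real tokenizer and data pipeline. Scoring only the completion tokens, by giving prompt positions the label -100 (the default ignore_index of PyTorch's CrossEntropyLoss), is a common choice rather than the only one. The shift is done explicitly here. Some training libraries instead expect unshifted labels and shift them internally.

```python
from typing import Dict, List, Tuple

IGNORE = -100  # label meaning "don't score this position" (PyTorch CrossEntropyLoss default ignore_index)

def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical whitespace tokenizer: each unseen word gets the next free id."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids

def build_example(prompt: str, completion: str,
                  vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Turn one (prompt, completion) pair into (input_ids, labels).

    Position i is trained to predict token i + 1 (next-token training).
    Positions whose next token is still part of the prompt get IGNORE,
    so the loss only rewards reproducing the completion.
    """
    p_ids = toy_tokenize(prompt, vocab)
    seq = p_ids + toy_tokenize(completion, vocab)
    input_ids = seq[:-1]
    labels = [IGNORE if i + 1 < len(p_ids) else tok
              for i, tok in enumerate(seq[1:])]
    return input_ids, labels

vocab: Dict[str, int] = {}
input_ids, labels = build_example("Q: capital of France?", "A: Paris", vocab)
print(input_ids)  # [0, 1, 2, 3, 4]
print(labels)     # [-100, -100, -100, 4, 5]
```

Only the last two positions carry a target. From "France?" the model must produce "A:", and from "A:" it must produce "Paris". Everything before that is context.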

What actually changes

SFT shifts the model's parameters so that, for inputs like yours, the next-token distribution favors the responses you demonstrated. You are not adding a module or a rulebook; you are nudging the same weights you met in Lesson 1 of Track 0. The new behavior is baked in: you no longer supply examples on every call, and the format, tone, and task shape you trained on come out consistently.
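The idea of weights being nudged toward your demonstrations can be shown in miniature. This sketch makes a loud simplification. The toy model's only parameters are the next-token logits for one fixed prompt, while in a real network the same gradient flows back into all of the weights. Each step applies the standard softmax cross-entropy gradient, which is p minus the one-hot target, so the demonstrated token's probability should rise.

```python
import math
from typing import List

def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sft_step(logits: List[float], target: int, lr: float) -> List[float]:
    """One gradient-descent step on the cross-entropy loss -log p(target).

    For softmax + cross-entropy, d(loss)/d(logit_k) = p_k - [k == target].
    """
    p = softmax(logits)
    grad = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    return [z - lr * g for z, g in zip(logits, grad)]

vocab = ["yes", "no", "maybe"]
logits = [0.0, 0.0, 0.0]      # toy "model": uniform over 3 tokens for one fixed prompt
target = vocab.index("yes")   # the demonstrated completion

before = softmax(logits)[target]
for _ in range(5):
    logits = sft_step(logits, target, lr=1.0)
after = softmax(logits)[target]
print(f"p('yes') before: {before:.3f}, after: {after:.3f}")
```

Nothing structural was added. The same parameters moved, and the distribution now favors the demonstrated answer.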

A note on starting points. A base model is raw next-token prediction; an instruct model has already been fine-tuned to follow instructions and chat. You can SFT either, but starting from an instruct model usually gets you to a polished, on-format result with less data.

Key idea

SFT changes behavior, not knowledge. It is excellent at teaching a model how to respond — format, style, task — and unreliable at teaching it new facts. For facts that change or are private, reach for RAG (Track 0, Lesson 7).

What SFT is good at

Format and structure — always emit strict JSON, a fixed label set, spans in a schema.
Style and tone — answer in a specific voice or length.
A narrow task at accuracy — classification, extraction, domain QA, where a small fine-tuned model can rival a far larger one.
Following one instruction pattern reliably, without re-explaining it every call.
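For the format-and-structure case, much of the work is in the data. Every completion should be exactly the output you want the model to emit. The sketch below assumes a hypothetical support-ticket classification task, a made-up three-label set, and a prompt/completion JSONL layout. Field names and file layout vary by training tool, so treat these as illustrative.

```python
import json
from typing import Dict

LABELS = {"billing", "bug", "feature_request"}  # hypothetical fixed label set

def make_record(ticket: str, label: str) -> Dict[str, str]:
    """Build one (prompt, completion) pair whose completion is strict JSON."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    prompt = f"Classify this support ticket.\nTicket: {ticket}\nAnswer:"
    completion = json.dumps({"label": label})
    return {"prompt": prompt, "completion": completion}

def is_valid_completion(completion: str) -> bool:
    """True if the completion parses as JSON holding exactly one allowed label."""
    try:
        obj = json.loads(completion)
    except json.JSONDecodeError:
        return False
    return (isinstance(obj, dict) and set(obj) == {"label"}
            and isinstance(obj["label"], str) and obj["label"] in LABELS)

records = [
    make_record("I was charged twice this month.", "billing"),
    make_record("The export button crashes the app.", "bug"),
]
jsonl = "\n".join(json.dumps(r) for r in records)  # one JSON object per line
print(jsonl)
```

Running the same validator over the model's outputs after training gives a simple check that the format was actually learned.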

When NOT to use SFT

You need fresh or private facts → use RAG; SFT won't reliably memorize them.
Prompting already works → don't pay for a training run you don't need.
You have almost no data and no way to make more → too few examples to learn from (later tracks show synthetic data, distillation, and warm starts that lower the bar).
The base genuinely lacks the capability → consider a stronger base or continued pretraining; SFT sharpens existing ability more than it creates new ability from nothing.
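The four checks above can be read as an ordered checklist. This hypothetical helper encodes them in the order listed. The Situation fields and return strings are illustrative names, and collapsing the decision to a single lever is a simplification.

```python
from dataclasses import dataclass

@dataclass
class Situation:
    # Hypothetical yes/no answers to the four "When NOT to use SFT" questions.
    needs_fresh_or_private_facts: bool
    prompting_already_works: bool
    has_or_can_make_data: bool
    base_has_capability: bool

def pick_lever(s: Situation) -> str:
    """Apply the checks in the order listed; SFT is what remains."""
    if s.needs_fresh_or_private_facts:
        return "RAG"  # SFT won't reliably memorize facts
    if s.prompting_already_works:
        return "prompting"  # skip a training run you don't need
    if not s.has_or_can_make_data:
        return "get data first"  # too few examples to learn from
    if not s.base_has_capability:
        return "stronger base or continued pretraining"
    return "SFT"

# A narrow extraction task where prompting is flaky, data exists, and the base can do it:
print(pick_lever(Situation(False, False, True, True)))  # SFT
```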

How much data? There is no universal number, but for a narrow task a few hundred to a few thousand clean examples is a realistic starting range — and quality matters more than raw count. The rest of this track is about getting that data right and running the training that turns it into a better model.
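Because quality matters more than raw count, a first cleaning pass might remove examples that add count without adding signal. This is a sketch with hypothetical names. It drops empty pairs, duplicates that differ only in whitespace, and off-format completions when you pass an optional validator such as a JSON check. It does not address representativeness, which requires comparing the data with the inputs you actually expect.

```python
from typing import Callable, Dict, List, Optional

Example = Dict[str, str]

def clean(examples: List[Example],
          is_valid: Optional[Callable[[str], bool]] = None) -> List[Example]:
    """Drop empty, off-format (optional), and duplicate examples.

    Duplicates are detected after collapsing whitespace, so trivially
    reformatted copies of the same pair count once.
    """
    seen = set()
    kept = []
    for ex in examples:
        prompt = " ".join(ex["prompt"].split())
        completion = " ".join(ex["completion"].split())
        if not prompt or not completion:
            continue  # nothing to learn from
        if is_valid is not None and not is_valid(ex["completion"]):
            continue  # would teach the wrong format
        key = (prompt, completion)
        if key in seen:
            continue  # repeats add count, not information
        seen.add(key)
        kept.append(ex)
    return kept

raw = [
    {"prompt": "Summarize: the meeting moved to 3pm.", "completion": "Meeting now at 3pm."},
    {"prompt": "Summarize:  the meeting moved to 3pm.", "completion": "Meeting now at 3pm."},
    {"prompt": "Summarize: budget approved.", "completion": ""},
    {"prompt": "Summarize: launch slips a week.", "completion": "Launch delayed one week."},
]
cleaned = clean(raw)
print(len(raw), "->", len(cleaned))  # 4 -> 2
```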

Key terms

Supervised Fine-Tuning (SFT): Continuing training on labeled (prompt, completion) examples to change a model's behavior for a task.
(prompt, completion) pair: One SFT example: an input and the target output you want the model to produce.
Base vs instruct model: A base model is raw next-token prediction; an instruct model is already tuned to follow instructions.
Behavior vs knowledge: SFT reshapes how a model responds (behavior); it does not reliably add new facts (knowledge — use RAG).

Check yourself
