Which model is the result of further training a base model to follow instructions?

The instruct (or instruction-tuned) model.

Why is the instruct model the usual starting point for SFT?

It already follows instructions and respects a chat template, so the fine-tune only has to teach your task, not instruction-following itself.

Track 0 · Foundations · Lesson 10

Base vs instruct models

After this lesson you can explain the difference between a base model and an instruct model, why one of them already speaks in chat turns and the other doesn't, and why you almost always fine-tune the instruct version.

Level: beginner · Read time: ~8 min · Prerequisites: Next-token prediction

Open Hugging Face and look at any modern model family — SmolLM2, Qwen, Llama, Mistral, Phi. You'll see two flavours: a base version and an instruct (or chat) version. Same architecture, same parameter count. Very different behaviour. Pick the wrong one for your fine-tune and you waste budget teaching the model something it should have started with.

What "base" means

A base model is what you get straight out of pretraining (Lesson 0.6, next-token prediction). It has been trained on enormous piles of text — books, code, the web — with one objective: given some tokens, predict the next one. That's it. A base model knows a lot of facts, has internalised a lot of patterns, and is a strong next-token continuer.

What it is not: a chatbot. Give a base model the input "What is the capital of France?" and it doesn't necessarily answer "Paris." It might continue with another question, or a textbook paragraph, or a multiple-choice list. It is doing exactly what it was trained to do — continue the text plausibly. It has no concept of "user" and "assistant," no concept of refusing, and no instinct to stop at the end of an answer.
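To make "continue the text plausibly" concrete, here is a minimal sketch of a next-word continuer. It is a toy bigram counter, not a neural network, and the three-sentence corpus and the function names are invented for illustration. It only shares the objective with a real base model: predict what comes next.

```python
from collections import Counter, defaultdict

# Tiny stand-in "pretraining corpus" (hypothetical): quiz-style text.
CORPUS = (
    "What is the capital of France? "
    "What is the capital of Spain? "
    "What is the capital of Italy? "
)

def train_bigram(text):
    """Count, for each word, which words follow it in the corpus."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, prompt, max_new_words=5):
    """Greedy next-word prediction: always append the most frequent follower."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

model = train_bigram(CORPUS)
continuation = continue_text(model, "What is the capital of France?")
print(continuation)
# prints: What is the capital of France? What is the capital of
```

The toy starts another question instead of answering, because in its corpus a question is usually followed by another question. A real base model knows far more (including that the answer is Paris), but it is driven by the same objective, so it can make the same kind of move.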

What "instruct" means

An instruct model starts from the base model and goes through one or more rounds of alignment training on top: supervised fine-tuning (SFT) on instruction–response pairs, often followed by preference tuning like DPO or RLHF (Track 4, Lesson 4.5). The point of that extra training is to teach behaviours the raw next-token objective never asked for:

Treat the input as a question or instruction, and produce a direct answer.
Respect roles — there is a "user" and an "assistant," and the assistant only speaks in its turn.
Stop when the answer is done.
Refuse certain kinds of requests.
Use a particular style and tone.

Same architecture and parameter count, but the weights have been updated by further training. The result behaves like the chatbot you expected, because someone else paid the cost of teaching it to behave like one.

The chat template is for instruct models

Instruct models are trained inside a chat template: a tokenizer rule that wraps each message with role markers like <|im_start|>user and <|im_start|>assistant (the exact markers differ between model families). The model learned, during alignment training, that those exact special tokens mean "now it's the assistant's turn." Feed an instruct model raw text without its chat template and you are outside the format it was tuned on: replies tend to get less reliable, and it may ramble or continue your text instead of answering. Feed it the template and it produces clean assistant replies.

Base models were never trained on a chat template, so there is no assistant turn for them to fill; they were never taught the concept. They take raw text and continue it.
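The wrapping a chat template performs can be sketched in a few lines. This is a hand-rolled illustration, not any model's actual template: the function name is hypothetical, and the <|im_end|> end marker and newline placement follow the common ChatML layout, which is an assumption here. Each model family defines its own template in its tokenizer.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Wrap each message in ChatML-style role markers (assumed layout)."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model's continuation is the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = render_chatml(messages)
print(prompt)
# prints:
# <|im_start|>user
# What is the capital of France?<|im_end|>
# <|im_start|>assistant
```

In practice you would not write this by hand. Hugging Face tokenizers expose `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which renders the messages with the template that shipped with the model, so the markers always match what the model was trained on.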

Which one should you fine-tune? Almost always: instruct

If your task is "answer this kind of question in this format" — classification, extraction, structured output, tone adjustment, any normal SLM use case — fine-tune the instruct version. Reasoning:

The instruct model already follows instructions and respects the chat template. Your fine-tune only has to teach your task.
If you run SFT on a base model instead, your data must teach instruction-following and your task at the same time. That's a much harder problem on a much smaller budget.
You are standing on the shoulders of whoever did the alignment work. Use them.
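Because the instruct model already understands roles, the training data for a task fine-tune can be plain role-tagged conversations that show only your task. Below is a minimal sketch for a ticket-classification task. The labels, the system prompt, the example tickets, and the "messages" field name are all illustrative assumptions; check the exact record format your fine-tuning tool expects.

```python
import json

# Hypothetical task instruction, repeated in every training example.
SYSTEM_PROMPT = (
    "Classify the support ticket as billing, bug, or other. "
    "Reply with the label only."
)

def to_chat_record(ticket_text, label):
    """Turn one (input, target) pair into a role-tagged conversation."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ticket_text},
            {"role": "assistant", "content": label},  # the target to learn
        ]
    }

# Made-up examples standing in for your real labelled data.
examples = [
    ("I was charged twice this month.", "billing"),
    ("The app crashes when I open settings.", "bug"),
]

records = [to_chat_record(text, label) for text, label in examples]
jsonl = "\n".join(json.dumps(record) for record in records)  # one record per line
print(jsonl)
```

Nothing in these records explains what a "user" or an "assistant" is, or when to stop talking. The instruct model brings that knowledge with it, so every example can be spent on the task itself.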

Key idea

An instruct model is a base model plus someone else's alignment training. Fine-tuning the instruct version means you start from a model that already knows how to follow instructions, and you only have to teach it your task. That head start is the entire reason to prefer it.

When base is the right choice

Three honest exceptions where you reach for the base model instead:

Continued pretraining. If your goal is to teach a model a whole new domain — internal codebase, legal corpus, medical literature — by training on raw text in that domain, you start from the base. Continued pretraining is "more pretraining," not instruction tuning.
Building your own instruct from scratch. If you want full control over the alignment — different refusal style, different tone, different objective — and you have enough data and compute to redo that training, you start from base.
The shipped alignment fights your task. If the instruct model's alignment refuses things you legitimately need, or hedges where you need it decisive, fine-tuning may not undo it. Sometimes starting from base + your own SFT is the cleanest path.
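The continued-pretraining case differs from instruction tuning in the shape of the data: raw text with no roles and no template. A rough sketch of one common way to prepare it is to concatenate documents and cut the stream into fixed-length blocks. Whitespace splitting stands in for a real tokenizer here, and the "<eos>" separator, the function name, and the sample sentences are assumptions for illustration.

```python
def pack_into_blocks(documents, block_size, separator="<eos>"):
    """Concatenate documents and cut the stream into equal-length blocks."""
    stream = []
    for doc in documents:
        stream.extend(doc.split())  # whitespace "tokens" stand in for a tokenizer
        stream.append(separator)    # mark the document boundary
    # Drop the ragged tail so every block has exactly block_size tokens.
    usable = len(stream) - len(stream) % block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]

# Made-up domain text standing in for a legal corpus.
docs = [
    "The lessee shall pay rent monthly.",
    "Clause 4 governs early termination of the lease.",
]
blocks = pack_into_blocks(docs, block_size=4)
for block in blocks:
    print(block)
```

Each block is trained with the same next-token objective as pretraining. There is no user turn and no assistant turn anywhere in the data, which is why the base model is the natural starting point for this kind of training.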

The alignment tax

The instruct version comes with someone else's choices baked in: refusal patterns, "as a language model" hedging, preferences learned during RLHF or DPO, a particular style. Those choices travel with the model. When they help, great. When they fight your task, you push against them, sometimes successfully with more data, sometimes not. This lesson calls that cost the alignment tax (elsewhere the term often means the capability a model loses through alignment training), and it's worth knowing it exists before you pick a starting point.
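One rough way to see whether the shipped alignment is fighting your task is to run the instruct model on a sample of your own prompts and count how many replies look like refusals or hedges. The sketch below assumes you have already collected the replies as strings. The marker phrases are a crude heuristic and the sample replies are invented placeholders, not measurements.

```python
# Crude, assumed list of phrases that suggest a refusal or a hedge.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i'm sorry",
    "as a language model",
    "as an ai",
)

def looks_like_refusal(reply):
    """True if the reply contains any marker phrase (case-insensitive)."""
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(replies):
    """Fraction of replies flagged as refusals or hedges."""
    if not replies:
        return 0.0
    return sum(looks_like_refusal(reply) for reply in replies) / len(replies)

# Invented placeholder replies; substitute real model outputs.
sample_replies = [
    "Label: billing",
    "I'm sorry, but I can't help with that request.",
    "As a language model, I cannot give a definitive answer.",
    "Label: bug",
]
rate = refusal_rate(sample_replies)
print(f"{rate:.0%}")
# prints: 50%
```

A high rate on prompts that are legitimate for your use case is a hint that you will be pushing against the model's alignment. Keyword matching misses polite non-answers and flags some harmless text, so treat the number as a prompt to read the replies, not as a verdict.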

Most of the time, on most tasks, the instruct version is the right starting point and the alignment tax is small. The next lesson is about everything else that goes into picking the specific starting checkpoint: license, tokenizer, context length, size.

Key terms

Base model: The model straight from pretraining: a next-token continuer over raw text. No chat template, no roles, no refusals.
Instruct model: A base model further trained (SFT + often preference tuning) to follow instructions inside a chat template. Same architecture, different behaviour.
Alignment training: The post-pretraining training (SFT on instruction data plus optional DPO/RLHF) that turns a base into an instruct model.
Chat template: The tokenizer rule wrapping messages with role markers (<|im_start|>user, …). Instruct models expect it; base models don't.
Continued pretraining: Continuing the next-token objective on new raw text to teach a base model a new domain — not the same as instruction tuning.
Alignment tax: The cost of the instruct model's baked-in choices (refusals, hedges, style) when they don't match your task.
