Track 0 · Foundations · Lesson 10

Base vs instruct models

After this lesson you can explain the difference between a base model and an instruct model, why one of them already speaks in chat turns and the other doesn't, and why you almost always fine-tune the instruct version.

Level: beginner · Read time: ~8 min · Prerequisites: Next-token prediction

Open Hugging Face and look at any modern model family — SmolLM2, Qwen, Llama, Mistral, Phi. You'll see two flavours: a base version and an instruct (or chat) version. Same architecture, same parameter count. Very different behaviour. Pick the wrong one for your fine-tune and you waste budget teaching the model something it should have started with.

What "base" means

A base model is what you get straight out of pretraining (Lesson 0.6, next-token prediction). It has been trained on enormous piles of text — books, code, the web — with one objective: given some tokens, predict the next one. That's it. A base model knows a lot of facts, has internalised a lot of patterns, and is a strong next-token continuer.

What it is not: a chatbot. Give a base model the input "What is the capital of France?" and it doesn't necessarily answer "Paris." It might continue with another question, or a textbook paragraph, or a multiple-choice list. It is doing exactly what it was trained to do — continue the text plausibly. It has no concept of "user" and "assistant," no concept of refusing, and no instinct to stop at the end of an answer.
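To make "continue the text plausibly" concrete, here is a toy sketch. It is not a real language model: it only counts which word follows which in a tiny made-up corpus (a stand-in for pretraining data), then greedily appends the most frequent next word. The corpus, function names, and word-level "tokens" are all assumptions for illustration. The objective has the same shape as a base model's, though: predict what comes next, nothing more.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus: a quiz sheet where questions follow questions.
CORPUS = (
    "What is the capital of France? What is the capital of Spain? "
    "What is the capital of Italy?"
)

def train_bigrams(text):
    """Count which word follows which: a toy stand-in for pretraining."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, prompt, max_new_words=5):
    """Greedily append the most frequent next word, one word at a time."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # nothing ever followed this word in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

model = train_bigrams(CORPUS)
completion = continue_text(model, "What is the capital of France?")
print(completion)
# prints: What is the capital of France? What is the capital of
```

The toy never says "Paris." In its training text, a question was followed by another question, so that is what it produces. A real base model is vastly more capable, but it is answering the same underlying question: what text usually comes next?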

What "instruct" means

An instruct model starts from the base model and goes through one or more rounds of alignment training on top: supervised fine-tuning (SFT) on instruction–response pairs, often followed by preference tuning like DPO or RLHF (Track 4, Lesson 4.5). The point of that extra training is to teach behaviours the raw next-token objective never asked for:

Same architecture and parameter count, but the weights have been trained further. The result behaves like the chatbot you expected, because someone else paid the cost of teaching it to behave like one.
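As a rough picture of what one SFT training example can look like, here is a minimal sketch. It assumes a common convention, not the only one: the prompt and response are concatenated into a single sequence, and the loss is computed only on the response tokens. Prompt positions are marked with -100, the value PyTorch's cross-entropy loss ignores by default. The token ids and the helper name are hypothetical.

```python
IGNORE_INDEX = -100  # PyTorch's cross-entropy loss skips targets with this value by default

def build_sft_example(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions out of the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return {"input_ids": input_ids, "labels": labels}

# Hypothetical token ids, invented for illustration only.
prompt_ids = [101, 7, 42, 9]   # stands in for "What is the capital of France?"
response_ids = [55, 2]         # stands in for "Paris" plus an end-of-turn token

example = build_sft_example(prompt_ids, response_ids)
print(example["input_ids"])  # [101, 7, 42, 9, 55, 2]
print(example["labels"])     # [-100, -100, -100, -100, 55, 2]
```

Note the end-of-turn token at the end of the response. Training on it is how a model picks up the habit of stopping when the answer is done, which the raw next-token objective never required.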

The chat template is for instruct models

Instruct models are trained inside a chat template: a formatting rule, shipped with the tokenizer, that wraps each message with role markers like <|im_start|>user and <|im_start|>assistant. During alignment training, the model learned that those exact special tokens mean "now it's the assistant's turn." Feed an instruct model raw text without its chat template and output quality usually drops: it may continue your text instead of answering it, or fail to stop cleanly. Feed it the template and it produces clean assistant replies.

Base models don't have a chat template — there is no assistant turn for them to fill, because they were never taught the concept. They take raw text and continue it.
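Here is a hand-rolled sketch of what a chat template does, using ChatML-style markers. The <|im_end|> closing marker and the exact newline placement follow the ChatML convention and are assumptions here. Every model family defines its own template, and some add a default system message. The function name render_chatml is hypothetical.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Wrap each message in ChatML-style role markers."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model knows it is its turn to reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = render_chatml(messages)
print(prompt)
```

In practice you would not write this yourself. With the Hugging Face transformers library, tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) renders the template stored with the model's tokenizer, so the text matches what the model saw in training.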

Which one should you fine-tune? Almost always: instruct

If your task is "answer this kind of question in this format" — classification, extraction, structured output, tone adjustment, any normal SLM use case — fine-tune the instruct version. Reasoning:

Key idea

An instruct model is a base model plus someone else's alignment training. Fine-tuning the instruct version means you start from a model that already knows how to follow instructions, and you only have to teach it your task. That's the entire reason it exists.

When base is the right choice

Three honest exceptions where you reach for the base model instead:

The alignment tax

The instruct version comes with someone else's choices baked in: refusal patterns, "as a language model" hedging, RLHF preferences, a particular style. Those choices travel with the model. When they help, great. When they fight your task, you push against them — sometimes successfully with more data, sometimes not. That cost is sometimes called the alignment tax, and it's worth knowing it exists before you pick a starting point.
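One rough way to see whether the alignment tax is biting on your task is to count how often the model's outputs contain refusal or hedging phrases. The sketch below is a crude heuristic, not a standard metric. The marker list, the function name, and the sample outputs are all invented for illustration.

```python
HEDGE_MARKERS = (
    "as a language model",
    "as an ai",
    "i cannot",
    "i can't",
    "i'm sorry",
)

def hedge_rate(outputs, markers=HEDGE_MARKERS):
    """Fraction of outputs containing at least one marker (case-insensitive)."""
    if not outputs:
        return 0.0
    flagged = sum(
        any(marker in text.lower() for marker in markers) for text in outputs
    )
    return flagged / len(outputs)

# Made-up outputs from a hypothetical sentiment-classification run.
sample_outputs = [
    "positive",
    "As a language model, I cannot determine sentiment with certainty.",
    "negative",
    "I'm sorry, but I can't help with that request.",
]
print(hedge_rate(sample_outputs))  # 0.5
```

If a number like this stays high on your own evaluation set after fine-tuning, that is one sign the baked-in behaviour is fighting your task.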

Most of the time, on most tasks, the instruct version is the right starting point and the alignment tax is small. The next lesson is about everything else that goes into picking the specific model to start from: license, tokenizer, context length, size.

Key terms

Base model
The model straight from pretraining: a next-token continuer over raw text. No chat template, no roles, no refusals.
Instruct model
A base model further trained (SFT + often preference tuning) to follow instructions inside a chat template. Same architecture, different behaviour.
Alignment training
The training that follows pretraining (SFT on instruction data plus optional DPO/RLHF) and turns a base model into an instruct model.
Chat template
The tokenizer rule wrapping messages with role markers (<|im_start|>user, …). Instruct models expect it; base models don't.
Continued pretraining
Continuing the next-token objective on new raw text to teach a base model a new domain — not the same as instruction tuning.
Alignment tax
The cost of the instruct model's baked-in choices (refusals, hedges, style) when they don't match your task.
