Track 1 · SFT fundamentals · Lesson 4

Chat templates & special tokens

After this lesson you can explain why chat formatting is a correctness issue, use the tokenizer's own chat template, and recognize the silent failure that a template mismatch causes.

Level: beginner · Read time: ~9 min · Prerequisites: Anatomy of an SFT example: prompt, completion, and the loss mask

An instruct model wasn't trained on bare text. It was trained on conversations formatted in a very specific way, with special tokens marking who is speaking. If you feed it a different format, you push it away from the data it learned from, and output quality can drop sharply. This is one of the most common silent failures in fine-tuning, and it's entirely avoidable.

Roles and special tokens

A chat is a list of messages, each with a role: system (standing instructions), user (the human), and assistant (the model). To turn that list into a single token stream, the model's training data used special tokens to delimit each turn. These are control tokens in the vocabulary, not ordinary text. Different model families use different markers; one common style (ChatML) looks like:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Classify: I loved it<|im_end|>
<|im_start|>assistant
positive<|im_end|>

The chat template does this for you

Instruct tokenizers typically ship a chat template: a rule that turns a messages list into exactly the formatted sequence that model expects. You call tokenizer.apply_chat_template(messages, tokenize=False) and get the correctly formatted string with the right special tokens, every time. (With the default tokenize=True, the same call returns token IDs instead of a string.) The cardinal rule:

Key idea

Use the model's own chat template for both training and inference, and never hand-format. The format you train on must match the format you serve on, and both must match the format the instruct model you start from was originally trained with.
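As a minimal sketch of that rule, the helper below routes every render through the tokenizer's own template instead of building strings by hand. The names render_chat and load_tokenizer are hypothetical, introduced only for illustration; apply_chat_template and AutoTokenizer.from_pretrained are the Hugging Face transformers calls, and the model name is left for you to supply.

```python
from typing import Dict, List

Message = Dict[str, str]


def render_chat(tokenizer, messages: List[Message], for_generation: bool = False) -> str:
    """Render messages with the tokenizer's own chat template.

    tokenize=False asks for the formatted string; with the default
    (tokenize=True) the same call returns token IDs instead.
    """
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=for_generation,
    )


def load_tokenizer(model_name: str):
    """Load the tokenizer that ships with the model you fine-tune or serve.

    model_name is whatever checkpoint you are using; nothing is assumed here.
    """
    # Imported lazily so render_chat itself has no hard dependency.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_name)
```

Calling render_chat with the same tokenizer in both the training pipeline and the serving code is one simple way to keep the two formats from drifting apart.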

The generation prompt

At inference you pass the system + user messages and set add_generation_prompt=True. That appends the opening of the assistant turn (e.g. <|im_start|>assistant) so the model knows it's now its turn to speak and starts generating the response. During training you don't add it — the assistant turn is already present as the completion.
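To make the relationship between the two renderings concrete, here is a toy sketch. The render_chatml function is a hypothetical stand-in that imitates a ChatML-style template purely for illustration; in real code the tokenizer's own template does this job, and its exact whitespace may differ.

```python
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def render_chatml(messages, add_generation_prompt=False):
    """Toy ChatML-style renderer, for illustration only."""
    text = "".join(
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    )
    if add_generation_prompt:
        # Open the assistant turn and stop: the model writes the rest.
        text += f"{IM_START}assistant\n"
    return text


prompt_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Classify: I loved it"},
]
completion = {"role": "assistant", "content": "positive"}

# Inference: prompt messages only, plus the assistant-turn opener.
inference_prompt = render_chatml(prompt_messages, add_generation_prompt=True)

# Training: the assistant turn is already present, so no opener is added.
training_text = render_chatml(prompt_messages + [completion])

# In this toy layout, the training string is the inference prompt
# followed by the completion and its end token.
print(training_text == inference_prompt + "positive" + IM_END + "\n")
```

The point of the comparison is that what the model sees just before it generates at inference is the same prefix it saw just before the completion during training.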

Where templates meet the loss mask

Combine this with the previous lesson: the chat template lays out the full conversation, and the loss mask is set so that only the assistant content (plus its end token) is a learning target. Good tooling derives the mask from the template automatically — another reason to let the template do the work rather than building strings by hand.
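One way such tooling can derive the mask is sketched below, under simplifying assumptions: the renderer and the character-level tokenizer are toys (render_chatml, toy_tokenize, and build_loss_mask are hypothetical names), and the last message is assumed to be the assistant completion. Real libraries work on token IDs and handle multi-turn chats, but the idea is similar: render the prompt alone, render the full conversation, and train only on what comes after the prompt.

```python
import re

IM_START, IM_END = "<|im_start|>", "<|im_end|>"
# Special tokens are matched first; anything else falls back to one character.
_TOKEN_RE = re.compile(r"<\|im_start\|>|<\|im_end\|>|.", re.DOTALL)


def render_chatml(messages, add_generation_prompt=False):
    """Toy ChatML-style renderer, for illustration only."""
    text = "".join(
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    )
    if add_generation_prompt:
        text += f"{IM_START}assistant\n"
    return text


def toy_tokenize(text):
    """Special tokens become single tokens; everything else is per character."""
    return _TOKEN_RE.findall(text)


def build_loss_mask(messages):
    """Return (tokens, mask); mask[i] is True where the token is a loss target.

    Assumes the last message is the assistant completion.
    """
    prompt_tokens = toy_tokenize(
        render_chatml(messages[:-1], add_generation_prompt=True)
    )
    tokens = toy_tokenize(render_chatml(messages))
    if tokens[: len(prompt_tokens)] != prompt_tokens:
        raise ValueError("prompt rendering is not a prefix of the full rendering")

    mask = [False] * len(tokens)
    for i in range(len(prompt_tokens), len(tokens)):
        mask[i] = True
        if tokens[i] == IM_END:
            break  # stop after the end token; trailing whitespace stays masked
    return tokens, mask
```

Because the prompt length comes from the same renderer as the full conversation, the mask boundary moves with the template rather than with a hand-counted offset.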

The silent failure

A template mismatch rarely errors. You'll see a plausible training loss and then garbage or oddly formatted outputs at inference. If a freshly fine-tuned model behaves strangely, check the chat template first — it's the usual culprit.
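A quick way to run that check is to take one example, capture the exact string your training pipeline produced and the string your serving code produces for the same messages, and compare them character by character. The sketch below uses hypothetical helper names (first_mismatch, describe_mismatch) and plain string comparison; it assumes you can obtain both strings from your own code.

```python
def first_mismatch(expected: str, actual: str):
    """Return the index of the first differing character, or None if equal."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        # One string is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


def describe_mismatch(expected: str, actual: str, context: int = 20) -> str:
    """Summarize where two rendered chats diverge, with a little context."""
    i = first_mismatch(expected, actual)
    if i is None:
        return "formats match"
    return (
        f"differ at char {i}: "
        f"expected {expected[i:i + context]!r}, got {actual[i:i + context]!r}"
    )


# Example: a hand-formatted string that uses a space where the
# template uses a newline after the role name.
template_text = "<|im_start|>user\nhi<|im_end|>\n"
hand_text = "<|im_start|>user hi<|im_end|>\n"
print(describe_mismatch(template_text, hand_text))
```

Differences this small, such as a space instead of a newline, are typical of the mismatches that never raise an error but still change what the model sees.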

Key terms

Chat template
A tokenizer rule that converts a messages list into the exact formatted string the model expects.
Role
Who is speaking: system, user, or assistant.
Special/control tokens
Non-text tokens (e.g. turn delimiters) the model was trained with.
apply_chat_template
The tokenizer method that renders messages into the correctly formatted sequence: token IDs by default, or a string with tokenize=False.
add_generation_prompt
At inference, appends the assistant-turn opener so the model starts responding.
