Chat templates & special tokens
After this lesson you can explain why chat formatting is a correctness issue, use the tokenizer's own chat template, and recognize the silent failure that a template mismatch causes.
An instruct model wasn't trained on bare text — it was trained on conversations formatted in a very specific way, with special tokens marking who is speaking. If you feed it a different format, you push it off the data it understands, and quality collapses. This is one of the most common silent failures in fine-tuning, and it's entirely avoidable.
Roles and special tokens
A chat is a list of messages, each with a role: system (standing instructions), user (the human), and assistant (the model). To turn that list into a single token stream, the model's training used special tokens — control tokens in the vocabulary that aren't ordinary text — to delimit each turn. Different model families use different markers; one common style looks like:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Classify: I loved it<|im_end|>
<|im_start|>assistant
positive<|im_end|>
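To make the layout above concrete, here is a minimal sketch that assembles the same ChatML-style string by hand. It is for illustration only: the helper name render_chatml is hypothetical, and the exact whitespace between turns is an assumption about this one style. Real code should rely on the tokenizer's own template rather than a function like this.

```python
# Illustration only: shows how a ChatML-style layout is put together.
# render_chatml is a hypothetical helper, not part of any library.

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages):
    """Render a list of {"role", "content"} dicts in a ChatML-style layout."""
    parts = []
    for message in messages:
        # Each turn: opener + role, newline, content, closer, newline.
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Classify: I loved it"},
    {"role": "assistant", "content": "positive"},
]

rendered = render_chatml(messages)
print(rendered)
```

Even in this tiny version, a single missing newline or a misspelled marker would produce a string the model never saw in training, which is why the format is treated as a correctness issue.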
The chat template does this for you
Nearly every instruct tokenizer ships a chat template: a rule that turns a messages list into exactly the formatted string that model expects. In Hugging Face Transformers, you call tokenizer.apply_chat_template(messages, tokenize=False) and get the correct string with the right special tokens, every time. (By default the call tokenizes as well and returns token IDs rather than a string.) The cardinal rule:
Key idea
Use the model's own chat template for both training and inference, and never hand-format. The format you train on must match the format you serve on, and both must match the format the instruct model you are fine-tuning was originally trained with.
The generation prompt
At inference you pass the system + user messages and set add_generation_prompt=True. That appends the opening of the assistant turn (e.g. <|im_start|>assistant) so the model knows it's now its turn to speak and starts generating the response. During training you don't add it — the assistant turn is already present as the completion.
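The difference between the two renderings can be sketched as follows. The two helper functions pass only tokenize and add_generation_prompt, which are real keyword arguments of apply_chat_template in Hugging Face Transformers. StubChatMLTokenizer is a hypothetical stand-in that mimics just that call shape for a ChatML-style template, so the sketch runs without downloading a model; in practice you would pass a tokenizer loaded for your actual model instead.

```python
def render_training_text(tokenizer, messages):
    """Full conversation, assistant reply included; no generation prompt."""
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False
    )


def render_inference_prompt(tokenizer, messages):
    """System + user turns only, ending with the assistant-turn opener."""
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


class StubChatMLTokenizer:
    """Hypothetical stand-in, NOT a real tokenizer.

    It imitates only the call shape of apply_chat_template for a
    ChatML-style template so this example runs offline.
    """

    def apply_chat_template(self, messages, tokenize=True, add_generation_prompt=False):
        if tokenize:
            raise NotImplementedError("the stub only renders strings")
        text = "".join(
            f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
        )
        if add_generation_prompt:
            # Open the assistant turn so the model continues from here.
            text += "<|im_start|>assistant\n"
        return text


tokenizer = StubChatMLTokenizer()

prompt_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Classify: I loved it"},
]
train_messages = prompt_messages + [{"role": "assistant", "content": "positive"}]

inference_prompt = render_inference_prompt(tokenizer, prompt_messages)
training_text = render_training_text(tokenizer, train_messages)

print(inference_prompt)
print(training_text)
```

With this template, the inference prompt stops exactly where the assistant's reply begins in the training text, so generation picks up at the position the model was trained to fill.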
Where templates meet the loss mask
Combine this with the previous lesson: the chat template lays out the full conversation, and the loss mask is set so that only the assistant content (plus its end token) is a learning target. Good tooling derives the mask from the template automatically — another reason to let the template do the work rather than building strings by hand.
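As a rough sketch of what that masking amounts to, the snippet below builds labels from a conversation that has already been split into tokenized segments. The token IDs are made up for illustration and build_labels is a hypothetical helper. The value -100 is the default ignore_index of PyTorch's CrossEntropyLoss, so positions carrying it contribute nothing to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_labels(segments):
    """Build (input_ids, labels) from (token_ids, is_target) segments.

    Positions in non-target segments get IGNORE_INDEX, so only the
    target tokens are learning targets.
    """
    input_ids, labels = [], []
    for token_ids, is_target in segments:
        input_ids.extend(token_ids)
        if is_target:
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Made-up token IDs: 1 stands for the turn opener, 2 for the end token.
segments = [
    ([1, 10, 11, 2], False),  # system turn
    ([1, 20, 21, 2], False),  # user turn
    ([1, 30], False),         # opener of the assistant turn
    ([40, 2], True),          # assistant content plus its end token
]

input_ids, labels = build_labels(segments)
print(input_ids)
print(labels)
```

Training libraries that derive the mask from the template do this bookkeeping for you; the sketch only shows which positions end up as targets.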
The silent failure
A template mismatch rarely errors. You'll see a plausible training loss and then garbage or oddly formatted outputs at inference. If a freshly fine-tuned model behaves strangely, check the chat template first — it's the usual culprit.
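One cheap sanity check, sketched below, is to confirm that the prompt you serve is an exact prefix of the text you trained on for the same conversation. This assumes a template where the generation prompt is simply the opener of the assistant turn, as in the ChatML-style example; some templates render differently, so treat it as a heuristic. The helper name find_mismatch and the sample strings are hypothetical.

```python
def find_mismatch(training_text, inference_prompt):
    """Return None if inference_prompt is an exact prefix of training_text.

    Otherwise return the index of the first character where they differ.
    """
    if training_text.startswith(inference_prompt):
        return None
    for index, (a, b) in enumerate(zip(training_text, inference_prompt)):
        if a != b:
            return index
    # No differing character in the overlap: one string is simply shorter.
    return min(len(training_text), len(inference_prompt))


training_text = (
    "<|im_start|>user\nClassify: I loved it<|im_end|>\n"
    "<|im_start|>assistant\npositive<|im_end|>\n"
)

# Rendered consistently with the training text.
good_prompt = (
    "<|im_start|>user\nClassify: I loved it<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Hand-formatted, with the newline after <|im_end|> accidentally dropped.
bad_prompt = (
    "<|im_start|>user\nClassify: I loved it<|im_end|>"
    "<|im_start|>assistant\n"
)

good_result = find_mismatch(training_text, good_prompt)
bad_result = find_mismatch(training_text, bad_prompt)

print(good_result)
if bad_result is not None:
    # Show the text around the first difference to make the bug visible.
    print(repr(training_text[bad_result - 10 : bad_result + 10]))
    print(repr(bad_prompt[bad_result - 10 : bad_result + 10]))
```

A difference as small as one whitespace character is enough to trip the check, which matches how subtle these mismatches tend to be in practice.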
Key terms
- Chat template: A tokenizer rule that converts a messages list into the exact formatted string the model expects.
- Role: Who is speaking: system, user, or assistant.
- Special/control tokens: Non-text tokens (e.g. turn delimiters) the model was trained with.
- apply_chat_template: The function that renders messages into the correct formatted string.
- add_generation_prompt: At inference, appends the assistant-turn opener so the model starts responding.