Run it: read the logs, the loss, the checkpoints
After this lesson you can run the training, read the Trainer's log output, recognize a healthy vs overfitting run, and find your saved checkpoints.
Run trainer.train() and the Trainer streams numbers at you. Knowing how to read them is the difference between "it ran" and "it worked." This lesson is about interpretation; you already wrote the code.
Reading the training log
With logging_steps=5 you get a line every five steps. A healthy small run looks roughly like this (abridged), with the training loss falling and beginning to flatten:
{'loss': 2.41, 'learning_rate': 1.9e-04, 'epoch': 0.3}
{'loss': 1.62, 'learning_rate': 1.6e-04, 'epoch': 0.9}
{'eval_loss': 0.62, 'epoch': 1.0}
{'loss': 1.04, 'learning_rate': 1.0e-04, 'epoch': 1.6}
{'eval_loss': 0.49, 'epoch': 2.0}
{'loss': 0.71, 'learning_rate': 4.2e-05, 'epoch': 2.3}
{'loss': 0.58, 'learning_rate': 3.0e-06, 'epoch': 2.9}
{'eval_loss': 0.47, 'epoch': 3.0}
Two streams are interleaved: the training loss, logged every logging_steps steps, and the per-epoch eval_loss (validation, because we set eval_strategy="epoch"). Read them with Track 1's loss-curve lesson in mind:
- Healthy: training loss falls and flattens; eval_loss also falls and levels off near the training loss, as in the log above.
- Overfitting: training loss keeps dropping but eval_loss bottoms out and then rises. Keep the checkpoint from the epoch with the lowest eval_loss, and consider fewer epochs or more data next time.
- Broken: loss is nan or wildly oscillating → the learning rate is most likely too high. Loss barely moves → the learning rate is likely too low, or there is a data/masking bug.
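In a script you rarely read these lines by eye. The Trainer keeps every logged dict in trainer.state.log_history, so the two streams can be pulled apart and compared. A minimal sketch, assuming entries shaped like the sample above (training entries carry loss, eval entries carry eval_loss); split_streams is a hypothetical helper, not part of transformers:

```python
def split_streams(log_history):
    """Separate Trainer log entries into (epoch, loss) training and eval streams.

    Hypothetical helper. Assumes dicts like those in trainer.state.log_history:
    training entries have a 'loss' key, eval entries an 'eval_loss' key.
    The end-of-training summary entry uses 'train_loss', so it lands in neither.
    """
    train, evals = [], []
    for entry in log_history:
        if "eval_loss" in entry:
            evals.append((entry["epoch"], entry["eval_loss"]))
        elif "loss" in entry:
            train.append((entry["epoch"], entry["loss"]))
    return train, evals


# The sample values from the log above, in the order a real run emits them.
# In a real run, pass trainer.state.log_history instead.
sample_log = [
    {"loss": 2.41, "learning_rate": 1.9e-04, "epoch": 0.3},
    {"loss": 1.62, "learning_rate": 1.6e-04, "epoch": 0.9},
    {"eval_loss": 0.62, "epoch": 1.0},
    {"loss": 1.04, "learning_rate": 1.0e-04, "epoch": 1.6},
    {"eval_loss": 0.49, "epoch": 2.0},
    {"loss": 0.71, "learning_rate": 4.2e-05, "epoch": 2.3},
    {"loss": 0.58, "learning_rate": 3.0e-06, "epoch": 2.9},
    {"eval_loss": 0.47, "epoch": 3.0},
]

train, evals = split_streams(sample_log)
for epoch, loss in evals:
    print(f"epoch {epoch:.0f}: eval_loss={loss:.2f}")
```

Separating the streams first keeps the later checks simple: the training list has many points per epoch, the eval list exactly one.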
Watch the gap
The relationship between loss and eval_loss is the diagnosis, not either number alone. A beautiful training loss with a rising eval loss is a model memorizing your tiny dataset.
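The rules of thumb above can be turned into a rough automated check. This is an illustrative heuristic, not a standard diagnostic: diagnose is a hypothetical helper, the 5% "barely moved" threshold is an arbitrary assumption, and it does not try to detect oscillation.

```python
import math


def diagnose(train_losses, eval_losses):
    """Classify a run as healthy, overfitting, or broken (rough heuristic).

    train_losses: training losses in logged order.
    eval_losses: one eval_loss per epoch, in order.
    The 0.95 threshold below is an assumption chosen for illustration.
    """
    if any(math.isnan(x) for x in list(train_losses) + list(eval_losses)):
        return "broken: nan loss (learning rate likely too high)"
    # "Barely moves": final training loss within 5% of the first one.
    if len(train_losses) >= 2 and train_losses[-1] > 0.95 * train_losses[0]:
        return "broken: loss barely moved (learning rate too low, or a data/masking bug)"
    if not eval_losses:
        return "no eval_loss logged"
    lowest = min(eval_losses)
    # Overfitting: eval_loss has risen above its best value by the last epoch.
    if eval_losses[-1] > lowest:
        best_epoch = eval_losses.index(lowest) + 1
        return f"overfitting: eval_loss bottomed out at epoch {best_epoch}"
    return "healthy"


print(diagnose([2.41, 1.62, 1.04, 0.71, 0.58], [0.62, 0.49, 0.47]))
```

With the sample run's values it reports "healthy"; if the last eval_loss had been 0.55 instead of 0.47, it would point at epoch 2 as the checkpoint to keep. Note that it reads the eval trend, not the training loss alone, which is the point of watching the gap.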
Where checkpoints land
With save_strategy="epoch", the Trainer writes a checkpoint after each epoch under your output_dir:
sft-out/
  checkpoint-XX/   # one per epoch, named by global step (not epoch): adapter + optimizer + trainer state
  checkpoint-YY/
  ...
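To find those folders from code, sort them by their step number. A minimal sketch; list_checkpoints is a hypothetical helper, and it assumes the checkpoint-<step> naming shown above:

```python
from pathlib import Path


def list_checkpoints(output_dir):
    """Return the checkpoint-<step> folders under output_dir, oldest first.

    Hypothetical helper. Sorts by the numeric step suffix, because a plain
    string sort would put checkpoint-100 before checkpoint-20.
    """
    found = []
    for path in Path(output_dir).glob("checkpoint-*"):
        suffix = path.name.split("-", 1)[1]
        if path.is_dir() and suffix.isdigit():
            found.append((int(suffix), path))
    return [path for _, path in sorted(found)]


# Usage (assumes a finished run wrote to sft-out/):
# latest = list_checkpoints("sft-out")[-1]
```

You only need this to inspect or copy checkpoints yourself: passing resume_from_checkpoint=True to trainer.train() makes the Trainer pick up the latest checkpoint in output_dir on its own.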
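Each checkpoint-* folder is a resumable snapshot. To keep the best rather than the last, set load_best_model_at_end=True with metric_for_best_model="eval_loss" in TrainingArguments (this requires save_strategy to match eval_strategy, as the two "epoch" settings here do), and the Trainer will reload the best-scoring checkpoint when training finishes. This is the early-stopping idea from Track 1 applied after the fact: every epoch still runs, but you end up with the weights from the best one.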
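load_best_model_at_end reloads the best weights in memory; to see which folder on disk that was, you can read the trainer_state.json file the Trainer writes into each checkpoint. A sketch, assuming the field names recent transformers versions use (best_model_checkpoint, best_metric, and a log_history whose entries also record the global step); best_checkpoint is a hypothetical helper, and the layout may differ across versions:

```python
import json
from pathlib import Path


def best_checkpoint(checkpoint_dir):
    """Return (path, eval_loss) of the best checkpoint recorded in trainer_state.json.

    Hypothetical helper. Prefers the Trainer's own best_model_checkpoint /
    best_metric fields (filled in when metric_for_best_model is set). Otherwise
    it falls back to the lowest eval_loss in log_history and maps its global
    step to a sibling checkpoint-<step> folder, assuming saves and evals happen
    at the same steps and that folder has not been deleted.
    """
    state = json.loads((Path(checkpoint_dir) / "trainer_state.json").read_text())
    if state.get("best_model_checkpoint"):
        return state["best_model_checkpoint"], state["best_metric"]
    evals = [e for e in state.get("log_history", []) if "eval_loss" in e]
    if not evals:
        return None, None
    best = min(evals, key=lambda e: e["eval_loss"])
    return str(Path(checkpoint_dir).parent / f"checkpoint-{best['step']}"), best["eval_loss"]


# Usage (assumes sft-out/ holds at least one checkpoint):
# print(best_checkpoint("sft-out/checkpoint-XX"))
```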
Did it work?
A falling, flattening pair of curves is necessary but not sufficient. The loss is a proxy; the real question — "is it actually better at the task than the base model?" — is answered by evaluating on the gold set, which is the next lesson.
Key idea
Logs tell you the run's health (is it learning, is it overfitting); the gold-set metric tells you the run's worth. You need both, and they answer different questions.
Key terms
- training logs: The periodic output (loss, learning_rate, epoch) the Trainer prints every logging_steps steps.
- logging_steps: How often, in optimizer steps, training metrics are logged.
- eval_loss: Validation loss, reported according to eval_strategy; the generalization signal.
- checkpoint: A resumable snapshot saved according to save_strategy under output_dir.
- load_best_model_at_end: Reloads the best-scoring checkpoint when training ends (early stopping applied after the fact).