Track 2 · Hands-on · Lesson 6

Run it: read the logs, the loss, the checkpoints

After this lesson you can run the training, read the Trainer's log output, recognize a healthy vs overfitting run, and find your saved checkpoints.

Level: beginner · Read time: ~8 min · Prerequisites: A minimal LoRA fine-tune with the Trainer

Run trainer.train() and the Trainer streams numbers at you. Knowing how to read them is the difference between "it ran" and "it worked." This lesson is about interpretation; you already wrote the code.

Reading the training log

With logging_steps=5 you get a line every five steps. A healthy small run looks roughly like this — the training loss falling and beginning to flatten:

{'loss': 2.41, 'learning_rate': 1.9e-04, 'epoch': 0.3}
{'loss': 1.62, 'learning_rate': 1.6e-04, 'epoch': 0.9}
{'loss': 1.04, 'learning_rate': 1.0e-04, 'epoch': 1.6}
{'loss': 0.71, 'learning_rate': 4.2e-05, 'epoch': 2.3}
{'loss': 0.58, 'learning_rate': 3.0e-06, 'epoch': 2.9}
{'eval_loss': 0.62, 'epoch': 1.0}
{'eval_loss': 0.49, 'epoch': 2.0}
{'eval_loss': 0.47, 'epoch': 3.0}

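The log carries two streams: loss, the training loss reported every logging_steps steps, and eval_loss, the validation loss reported once per epoch because we set eval_strategy="epoch". Read them with Track 1's loss-curve lesson in mind. Here the training loss falls from 2.41 to 0.58 while eval_loss falls from 0.62 to 0.47 and levels off, which is the healthy pattern: both curves going down and flattening together.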

Watch the gap

The relationship between loss and eval_loss is the diagnosis, not either number alone. A beautiful training loss with a rising eval loss is a model memorizing your tiny dataset.
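
To make the gap concrete, here is a minimal sketch that splits a log into its two streams and checks whether eval_loss has started to rise. It assumes the log is a list of dicts shaped like the lines shown earlier (in a real run, trainer.state.log_history holds such a list). The helper names split_streams and eval_is_rising are made up for this illustration, and the sample values are copied from the log in this lesson.

```python
def split_streams(log_history):
    """Separate training-loss entries from eval-loss entries."""
    train = [(e["epoch"], e["loss"]) for e in log_history if "loss" in e]
    evals = [(e["epoch"], e["eval_loss"]) for e in log_history if "eval_loss" in e]
    return train, evals


def eval_is_rising(evals, patience=1):
    """True if eval_loss went up at each of the last `patience` evaluations."""
    losses = [loss for _, loss in evals]
    if len(losses) <= patience:
        return False  # not enough evaluations to judge
    recent = losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Sample entries copied from the log in this lesson.
log_history = [
    {"loss": 2.41, "learning_rate": 1.9e-04, "epoch": 0.3},
    {"loss": 1.62, "learning_rate": 1.6e-04, "epoch": 0.9},
    {"loss": 1.04, "learning_rate": 1.0e-04, "epoch": 1.6},
    {"loss": 0.71, "learning_rate": 4.2e-05, "epoch": 2.3},
    {"loss": 0.58, "learning_rate": 3.0e-06, "epoch": 2.9},
    {"eval_loss": 0.62, "epoch": 1.0},
    {"eval_loss": 0.49, "epoch": 2.0},
    {"eval_loss": 0.47, "epoch": 3.0},
]

train, evals = split_streams(log_history)
print("train:", train)
print("eval: ", evals)
print("eval rising?", eval_is_rising(evals))
```

A one-evaluation uptick can be noise on a small validation set, so raising patience to 2 is a reasonable choice before calling a run overfit. The threshold is a judgment call, not a rule from the Trainer.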

Where checkpoints land

With save_strategy="epoch", the Trainer writes a checkpoint after each epoch under your output_dir. Each folder is named checkpoint- followed by the global step count at the time it was saved:

sft-out/
  checkpoint-XX/        # one per epoch: adapter + optimizer + trainer state
  checkpoint-YY/
  ...
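
A small sketch for finding those folders from Python, assuming the XX and YY suffixes are numeric step counts and that output_dir is sft-out as in the tree above. The function name list_checkpoints is hypothetical; it uses only the standard library and returns an empty list if the directory does not exist yet.

```python
import re
from pathlib import Path


def list_checkpoints(output_dir):
    """Return checkpoint-* folders under output_dir, oldest first."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for path in root.iterdir():
        match = pattern.match(path.name)
        if match and path.is_dir():
            # Sort on the integer suffix so checkpoint-9 comes before checkpoint-10.
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


checkpoints = list_checkpoints("sft-out")
for path in checkpoints:
    print(path)
if checkpoints:
    print("latest:", checkpoints[-1])
```

The path printed as latest is what you would pass to trainer.train(resume_from_checkpoint=...) to continue an interrupted run.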

Each checkpoint-* folder is a resumable snapshot. To keep the best rather than the last, set load_best_model_at_end=True with metric_for_best_model="eval_loss" in TrainingArguments, and the Trainer reloads the best-scoring checkpoint when training finishes. This is the early-stopping idea from Track 1 applied after the fact: the run still completes every epoch, but the weights you end up with come from the epoch with the lowest eval_loss.
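
As a sketch of how those settings fit together, the block below collects them as keyword arguments for TrainingArguments. The values for output_dir, logging_steps, eval_strategy, and save_strategy are the ones used in this lesson. greater_is_better=False and the check_best_model_config helper are additions for illustration, and the helper is a simplified sanity check, not the Trainer's own validation.

```python
# Keyword arguments intended for transformers.TrainingArguments.
training_kwargs = dict(
    output_dir="sft-out",
    logging_steps=5,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,  # assumption made explicit: lower eval_loss is better
)


def check_best_model_config(kwargs):
    """Simplified check: best-model loading needs a saved checkpoint
    for every evaluation, so the two strategies should line up."""
    if kwargs.get("load_best_model_at_end"):
        if kwargs.get("save_strategy") != kwargs.get("eval_strategy"):
            raise ValueError("save_strategy and eval_strategy should match")
    return True


check_best_model_config(training_kwargs)

# With transformers installed, the dict would be used roughly like this:
#   from transformers import TrainingArguments, EarlyStoppingCallback
#   args = TrainingArguments(**training_kwargs)
#   # Optional, not part of this lesson: actually halt the run once
#   # eval_loss stops improving for 2 evaluations in a row.
#   callbacks = [EarlyStoppingCallback(early_stopping_patience=2)]
```

The commented EarlyStoppingCallback line is the piece that stops training early. load_best_model_at_end alone only chooses which checkpoint is loaded at the end.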

Did it work?

A falling, flattening pair of curves is necessary but not sufficient. The loss is a proxy; the real question — "is it actually better at the task than the base model?" — is answered by evaluating on the gold set, which is the next lesson.

Key idea

Logs tell you the run's health (is it learning, is it overfitting); the gold-set metric tells you the run's worth. You need both, and they answer different questions.

Key terms

training logs
Lines the Trainer prints every logging_steps steps (loss, learning_rate, epoch).
logging_steps
How often training metrics are logged.
eval_loss
Validation loss reported per eval_strategy; the generalization signal.
checkpoint
A resumable snapshot saved per save_strategy under output_dir.
load_best_model_at_end
Reloads the best-scoring checkpoint when training finishes (checkpoint selection in the spirit of early stopping; the run itself is not cut short).
