What does load_best_model_at_end=True do?

Restores the best-scoring checkpoint when training finishes (automated checkpoint selection; training itself is not cut short)

A healthy loss curve proves the model is good at the task. True?

False — the gold-set metric is the real test

Track 2 · Hands-on · Lesson 6

Run it: read the logs, the loss, the checkpoints

After this lesson you can run the training, read the Trainer's log output, tell a healthy run from an overfitting one, and find your saved checkpoints.

Level: beginner
Read time: ~8 min
Prerequisites: A minimal LoRA fine-tune with the Trainer

Run trainer.train() and the Trainer streams numbers at you. Knowing how to read them is the difference between "it ran" and "it worked." This lesson is about interpretation; you already wrote the code.

Reading the training log

With logging_steps=5 you get a line every five steps. A healthy small run looks roughly like this — the training loss falling and beginning to flatten:

{'loss': 2.41, 'learning_rate': 1.9e-04, 'epoch': 0.3}
{'loss': 1.62, 'learning_rate': 1.6e-04, 'epoch': 0.9}
{'eval_loss': 0.62, 'epoch': 1.0}
{'loss': 1.04, 'learning_rate': 1.0e-04, 'epoch': 1.6}
{'eval_loss': 0.49, 'epoch': 2.0}
{'loss': 0.71, 'learning_rate': 4.2e-05, 'epoch': 2.3}
{'loss': 0.58, 'learning_rate': 3.0e-06, 'epoch': 2.9}
{'eval_loss': 0.47, 'epoch': 3.0}

Two streams are interleaved: the per-step loss (training) and the per-epoch eval_loss (validation, because we set eval_strategy="epoch"). Read them with Track 1's loss-curve lesson in mind:

Healthy: training loss falls and flattens; eval_loss also falls and levels off near the training loss. (Above.)
Overfitting: training loss keeps dropping but eval_loss bottoms out and then rises. Keep the checkpoint from the epoch with the lowest eval_loss, and consider fewer epochs or more data next time.
Broken: loss is nan or oscillates wildly, which usually means the learning rate is too high. Loss that barely moves usually means the learning rate is too low, or there is a data or masking bug.
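The three patterns above can be checked mechanically. What follows is a minimal sketch, assuming the log entries are available as a list of dicts shaped like the lines above (the Trainer keeps such a list in trainer.state.log_history). The names split_streams and diagnose, the labels they return, and the 5% stall threshold are hypothetical choices for illustration, not part of the library:

```python
import math

def split_streams(log_history):
    """Separate per-step training entries from per-epoch eval entries."""
    train = [(e["epoch"], e["loss"]) for e in log_history if "loss" in e]
    evals = [(e["epoch"], e["eval_loss"]) for e in log_history if "eval_loss" in e]
    return train, evals

def diagnose(log_history, stall_fraction=0.05):
    """Label a run as 'broken', 'stalled', 'overfitting', 'healthy' or 'no eval data'.

    stall_fraction is an illustrative threshold (an assumption): the run counts
    as stalled if training loss dropped by less than this fraction of its
    first logged value.
    """
    train, evals = split_streams(log_history)
    if any(math.isnan(loss) for _, loss in train):
        return "broken"
    if len(train) >= 2:
        first, last = train[0][1], train[-1][1]
        if first - last < stall_fraction * first:
            return "stalled"
    if not evals:
        return "no eval data"
    eval_losses = [loss for _, loss in evals]
    best = eval_losses.index(min(eval_losses))
    # If the lowest eval_loss is not the latest one, eval_loss has risen since.
    if best < len(eval_losses) - 1:
        return "overfitting"
    return "healthy"
```

Applied to the example log above, diagnose returns "healthy": the lowest eval_loss is also the most recent one. A label like this is only a first pass; the curves themselves are still worth reading.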

Watch the gap

The relationship between loss and eval_loss is the diagnosis, not either number alone. A beautiful training loss with a rising eval loss is a model memorizing your tiny dataset.
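One way to make the gap concrete is to pair each eval_loss with the most recent training loss logged at or before the same epoch. This is a small sketch, assuming log entries are dicts shaped like the log lines in this lesson; loss_gaps is a hypothetical helper, not something the Trainer provides:

```python
def loss_gaps(log_history):
    """Return (epoch, train_loss, eval_loss, gap) for each eval point.

    gap = eval_loss - most recent training loss, rounded to 4 decimals.
    Eval entries that appear before any training loss are skipped.
    """
    rows = []
    last_train = None
    # sorted() is stable, so entries that share an epoch keep their log order.
    for entry in sorted(log_history, key=lambda e: e.get("epoch", 0.0)):
        if "loss" in entry:
            last_train = entry["loss"]
        elif "eval_loss" in entry and last_train is not None:
            gap = round(entry["eval_loss"] - last_train, 4)
            rows.append((entry["epoch"], last_train, entry["eval_loss"], gap))
    return rows
```

A gap that grows from one epoch to the next while the training loss keeps falling is the memorization pattern described above.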

Where checkpoints land

With save_strategy="epoch", the Trainer writes a checkpoint after each epoch under your output_dir. Each folder is named checkpoint-<global step>, so the number counts optimizer steps, not epochs:

sft-out/
  checkpoint-XX/        # one per epoch: adapter + optimizer + trainer state
  checkpoint-YY/
  ...
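To find the checkpoints on disk, a short standard-library sketch like the following may help. list_checkpoints is a hypothetical helper; it assumes only the checkpoint-<number> folder naming shown above and sorts numerically, so that checkpoint-100 comes after checkpoint-20:

```python
from pathlib import Path

def list_checkpoints(output_dir):
    """Return checkpoint-* directories under output_dir, sorted by step number."""
    found = []
    for path in Path(output_dir).glob("checkpoint-*"):
        suffix = path.name.split("-", 1)[1]
        # Skip stray files and anything whose suffix is not a plain number.
        if path.is_dir() and suffix.isdigit():
            found.append((int(suffix), path))
    return [path for _, path in sorted(found)]

# Example (assumes the sft-out directory from this lesson exists):
# for ckpt in list_checkpoints("sft-out"):
#     print(ckpt)
```

The last path in the returned list is the most recent snapshot, which is the one you would typically pass to trainer.train(resume_from_checkpoint=...) to continue an interrupted run.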

Each checkpoint-* folder is a resumable snapshot. To keep the best rather than the last, set load_best_model_at_end=True with metric_for_best_model="eval_loss" in TrainingArguments. The run still trains for every epoch; when it finishes, the Trainer reloads the checkpoint with the lowest eval_loss. This automates the checkpoint-selection half of the early-stopping idea from Track 1, but it does not cut training short.
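As a configuration sketch, the relevant settings might look like this. The keyword arguments are kept in a plain dict so the snippet runs without transformers installed. The values for output_dir and logging_steps are taken from this lesson; num_train_epochs=3 is an assumption, and older transformers releases spell eval_strategy as evaluation_strategy:

```python
# Keyword arguments intended for transformers.TrainingArguments.
training_kwargs = dict(
    output_dir="sft-out",
    num_train_epochs=3,                 # assumed value, for illustration only
    logging_steps=5,                    # one training log line every five steps
    eval_strategy="epoch",              # evaluate once per epoch
    save_strategy="epoch",              # save on the same schedule as evaluation
    load_best_model_at_end=True,        # reload the best checkpoint after training
    metric_for_best_model="eval_loss",
    greater_is_better=False,            # lower eval_loss is better
)

# Reloading the best model needs a saved checkpoint for each evaluation,
# so the save and eval schedules should agree.
assert training_kwargs["save_strategy"] == training_kwargs["eval_strategy"]

# With transformers installed (not imported here):
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
# After trainer.train(), trainer.state.best_model_checkpoint holds the path
# of the checkpoint that was reloaded.
```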

Did it work?

A falling, flattening pair of curves is necessary but not sufficient. The loss is a proxy; the real question — "is it actually better at the task than the base model?" — is answered by evaluating on the gold set, which is the next lesson.

Key idea

Logs tell you the run's health (is it learning, is it overfitting); the gold-set metric tells you the run's worth. You need both, and they answer different questions.

Key terms

training logs: Per-step output (loss, learning_rate, epoch) the Trainer prints.
logging_steps: How often training metrics are logged.
eval_loss: Validation loss reported per eval_strategy; the generalization signal.
checkpoint: A resumable snapshot saved per save_strategy under output_dir.
load_best_model_at_end: Reloads the best-scoring checkpoint when training finishes (automated checkpoint selection; training itself is not cut short).
