What is the classic signature of overfitting?

Training loss falls while validation loss flattens then rises

The simplest response to overfitting is…

Early stopping — keep the checkpoint at the validation minimum

Why isn't a low training loss enough to call a run good?

Training loss nearly always drops; the train/val relationship (and gold-set metric) is the real signal

Track 1 · SFT fundamentals · Lesson 17

Overfitting, underfitting, and reading a loss curve

After this lesson you can read training and validation loss curves together, diagnose underfitting vs overfitting vs a healthy run, and choose the right response (more data, fewer epochs, regularization, early stopping).

Level: beginner · Read time: ~9 min · Prerequisites: OOM and how to survive it

Training emits a stream of numbers; the loss curve is how you read the run's health at a glance. The single most informative habit you can build is plotting training loss and validation loss together and watching the gap between them.
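Watching the gap can be made concrete in a few lines of Python. This is a minimal sketch. It assumes training loss has been averaged over the same interval as each validation evaluation, so the two lists line up point for point. The function names, the example numbers, and the three-evaluation window are illustrative choices, not part of any library.

```python
def generalization_gaps(train_losses, val_losses):
    """Return (val - train) at each evaluation point: the generalization gap."""
    if len(train_losses) != len(val_losses):
        raise ValueError("expected one train and one val loss per evaluation point")
    return [val - train for train, val in zip(train_losses, val_losses)]


def gap_is_widening(gaps, window=3):
    """Heuristic: True if the gap grew at each of the last `window` evaluations."""
    if len(gaps) < window + 1:
        return False  # not enough history to judge a trend
    recent = gaps[-(window + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


# Hypothetical losses logged at six evaluation points.
train = [2.0, 1.5, 1.2, 1.0, 0.8, 0.6]
val = [2.1, 1.6, 1.4, 1.35, 1.4, 1.5]
gaps = generalization_gaps(train, val)
print([round(g, 2) for g in gaps])  # [0.1, 0.1, 0.2, 0.35, 0.6, 0.9]
print(gap_is_widening(gaps))        # True
```

Real curves are noisy, so in practice you would smooth both series (for example with a moving average) before judging a trend. A single uptick in the gap is not yet overfitting.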

The three shapes

Healthy: both training and validation loss fall and then flatten, staying close together. The model is learning generalizable patterns. Stop around where validation flattens.
Underfitting: both losses stay high or barely move. The model hasn't learned enough — too few epochs, learning rate too low, too little capacity (e.g. LoRA rank too small), or a data problem.
Overfitting: training loss keeps dropping while validation loss flattens and then rises. The model is memorizing the training set instead of generalizing — the gap between the two curves (the generalization gap) widens.

Figure: solid = training loss; dashed = validation loss. The minimum of the validation curve is the checkpoint to keep; beyond it, the model overfits.
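The three shapes can be turned into a crude rule of thumb. The sketch below is hypothetical: `diagnose`, its thresholds, and the example curves are assumptions to tune per task, not standard values. The thresholds are a 10% drop in training loss as the bar for "learning something" and a 2% rise above the validation minimum as the bar for "turning up".

```python
def diagnose(train_losses, val_losses, min_rel_drop=0.10, rise_tol=0.02):
    """Classify a run as 'underfitting', 'overfitting', or 'healthy'.

    Heuristic thresholds (assumptions, tune per task):
    - min_rel_drop: if training loss fell by less than this fraction of its
      starting value, the model has barely learned anything.
    - rise_tol: if the final validation loss sits more than this fraction
      above its minimum, validation has turned up.
    """
    t_start, t_end = train_losses[0], train_losses[-1]
    v_min, v_end = min(val_losses), val_losses[-1]
    if (t_start - t_end) / t_start < min_rel_drop:
        return "underfitting"
    if v_end > v_min * (1 + rise_tol):
        return "overfitting"
    return "healthy"


# Hypothetical curves, one per shape.
print(diagnose([2.0, 1.4, 1.1, 1.0, 0.95], [2.05, 1.5, 1.2, 1.12, 1.1]))   # healthy
print(diagnose([2.0, 1.98, 1.97, 1.96], [2.02, 2.0, 2.0, 1.99]))           # underfitting
print(diagnose([2.0, 1.5, 1.2, 1.0, 0.8, 0.6],
               [2.1, 1.6, 1.4, 1.35, 1.4, 1.5]))                           # overfitting
```

Underfitting is checked first because a run whose training loss barely moved cannot meaningfully be overfitting. A rule this simple is a prompt to look at the plot, not a replacement for it.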

Responding to overfitting

When validation turns up, you have several moves, roughly in order of preference:

Early stopping — just keep the checkpoint at the validation minimum (you saved several, per Lesson 9). The simplest fix.
Fewer epochs next run — you were training past the sweet spot.
More / more varied data — the real cure; a model overfits most easily on thin data.
More regularization — e.g. a bit more LoRA dropout or weight decay.
Less capacity — a smaller LoRA rank if it's wildly over-parameterized for the data.
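Early stopping can also be applied during the run rather than after it. The sketch below is a generic patience-based version, assuming you evaluate at fixed intervals. It remembers the step with the lowest validation loss and signals a stop once `patience` evaluations pass without improvement. The class name, `patience`, and `min_delta` are illustrative. Frameworks such as Hugging Face Transformers ship their own callback for this (`EarlyStoppingCallback`).

```python
class EarlyStopping:
    """Track validation loss and signal when to stop.

    patience: evaluations allowed without improvement before stopping.
    min_delta: minimum decrease in loss that counts as an improvement.
    """

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_step = None  # step of the checkpoint to keep
        self.bad_evals = 0

    def update(self, step, val_loss):
        """Record one evaluation; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_step = step
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Hypothetical validation losses evaluated every 100 steps.
stopper = EarlyStopping(patience=2)
stopped_at = None
for step, loss in zip(range(100, 800, 100), [2.1, 1.6, 1.4, 1.35, 1.4, 1.5, 1.6]):
    if stopper.update(step, loss):
        stopped_at = step
        break
print(stopped_at, stopper.best_step, stopper.best_loss)  # 600 400 1.35
```

If you have already saved checkpoints, the after-the-fact version is even simpler: pick the saved step whose validation loss is lowest. A little patience guards against stopping on a single noisy evaluation.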

Responding to underfitting

If both losses are stuck high: train longer (more epochs/steps), raise the learning rate (carefully), increase capacity (higher LoRA rank or more target modules), or check that the data and loss mask are actually correct — a broken mask or mis-formatted data shows up as a model that won't learn.
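For the last check, a quick count of supervised tokens catches the most common loss-mask bugs. This sketch assumes Hugging Face-style label lists in which masked positions are set to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. `mask_report` and the toy batch are hypothetical. An example with no supervised tokens contributes nothing to the loss, and a batch where almost nothing is supervised gives the model little signal to learn from.

```python
IGNORE_INDEX = -100  # assumption: masked label positions use PyTorch's default ignore_index


def mask_report(batch_labels):
    """Summarize how many label tokens actually contribute to the loss."""
    total = sum(len(seq) for seq in batch_labels)
    supervised = sum(1 for seq in batch_labels for tok in seq if tok != IGNORE_INDEX)
    # Examples whose labels are entirely masked teach the model nothing.
    no_targets = sum(1 for seq in batch_labels if all(tok == IGNORE_INDEX for tok in seq))
    return {
        "supervised_fraction": supervised / total if total else 0.0,
        "examples_with_no_targets": no_targets,
    }


# Toy batch: the second example has every label masked, a typical mask bug.
labels = [
    [-100, -100, -100, 42, 17, 2],
    [-100, -100, -100, -100],
    [-100, 5, 2],
]
report = mask_report(labels)
print(round(report["supervised_fraction"], 3), report["examples_with_no_targets"])  # 0.385 1
```

A result of zero supervised tokens, or a fraction far below what your prompt and response lengths would predict, points at the masking or formatting code rather than the hyperparameters.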

Key idea

Training loss alone can fool you — it almost always goes down. The relationship between training and validation loss is the diagnosis. And remember: the loss is a proxy; the final verdict is the gold-set metric, which the next lesson makes precise.

Key terms

Overfitting: Training loss falls while validation loss rises — the model memorizes instead of generalizing.
Underfitting: Both losses stay high; the model hasn't learned enough (too few steps, low LR, too little capacity).
Generalization gap: The distance between training and validation loss; a widening gap signals overfitting.
Early stopping: Keeping the checkpoint at the validation-loss minimum rather than the final step.
Validation loss: Loss on held-out data; the signal for generalization and checkpoint selection.

Check yourself
