Track 1 · SFT fundamentals · Lesson 17

Overfitting, underfitting, and reading a loss curve

After this lesson you can read training and validation loss curves together, diagnose underfitting vs overfitting vs a healthy run, and choose the right response (more data, fewer epochs, regularization, early stopping).

Level: beginner · Read time: ~9 min · Prerequisites: OOM and how to survive it

Training emits a stream of numbers; the loss curve is how you read the run's health at a glance. The single most informative habit you can build is plotting training loss and validation loss together and watching the gap between them.
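You can make "watching the gap" concrete with a few lines of Python. This is a minimal sketch. It assumes you logged training and validation loss at the same evaluation steps. The loss values are made up for illustration, and `generalization_gaps` and `gap_is_widening` are hypothetical helpers, not part of any library.

```python
def generalization_gaps(train_losses, val_losses):
    """Return val - train at each evaluation step (the two logs must align)."""
    if len(train_losses) != len(val_losses):
        raise ValueError("train and val logs must have the same length")
    return [v - t for t, v in zip(train_losses, val_losses)]


def gap_is_widening(gaps, window=3):
    """True if the gap grew at each of the last `window` evaluations.

    `window` is an assumed smoothing knob, not a standard value.
    """
    recent = gaps[-(window + 1):]
    if len(recent) < window + 1:
        return False  # not enough history to judge
    return all(b > a for a, b in zip(recent, recent[1:]))


if __name__ == "__main__":
    # Hypothetical losses logged at the same six evaluation steps.
    train = [2.10, 1.60, 1.30, 1.10, 0.90, 0.70]
    val = [2.15, 1.70, 1.45, 1.40, 1.45, 1.55]
    gaps = generalization_gaps(train, val)
    print([round(g, 2) for g in gaps])
    print("widening:", gap_is_widening(gaps))
```

Requiring the gap to grow over several consecutive evaluations, instead of just once, keeps a single noisy validation point from looking like overfitting.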

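The three shapes

Plotted together, the two curves usually take one of three shapes. Underfitting: both losses stay high. Overfitting: training loss keeps falling while validation loss turns up, so the gap between them widens. Healthy: both losses fall, and validation loss levels off without turning up.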

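[Figure: loss vs. step. Training loss keeps falling (train ↓) while validation loss turns upward (val ↑). The best checkpoint is marked at the validation minimum.]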
Solid = training loss; dashed = validation loss. The minimum of the validation curve is the checkpoint to keep; beyond it the model overfits.
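The same two lists are enough to tell the three shapes apart and to find the checkpoint the caption says to keep. This is a rough heuristic sketch. `diagnose` and `best_checkpoint` are hypothetical helpers. The thresholds `min_drop` (a 10% relative drop) and `rise_tol` are assumed values that you would tune for your own task; they are not standard values.

```python
def best_checkpoint(steps, val_losses):
    """Return the step whose validation loss is lowest (the checkpoint to keep)."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return steps[best_index]


def diagnose(train_losses, val_losses, min_drop=0.10, rise_tol=0.02):
    """Label a run 'underfitting', 'overfitting', or 'healthy' (heuristic).

    min_drop: assumed minimum relative drop from the first logged loss to the
        last; a smaller drop on both curves counts as "stuck high".
    rise_tol: assumed margin by which the final val loss must exceed its
        minimum before it counts as a real upturn rather than noise.
    """
    def relative_drop(losses):
        return (losses[0] - losses[-1]) / losses[0]

    if relative_drop(train_losses) < min_drop and relative_drop(val_losses) < min_drop:
        return "underfitting"  # neither curve has come down much
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    val_turned_up = val_losses[-1] > val_losses[best] + rise_tol
    train_kept_falling = train_losses[-1] < train_losses[best]
    if val_turned_up and train_kept_falling:
        return "overfitting"  # the gap widened after the val minimum
    return "healthy"  # both fell and val has not turned up


if __name__ == "__main__":
    steps = [100, 200, 300, 400, 500, 600]
    train = [2.10, 1.60, 1.30, 1.10, 0.90, 0.70]
    val = [2.15, 1.70, 1.45, 1.40, 1.45, 1.55]
    print(diagnose(train, val), "- keep step", best_checkpoint(steps, val))
    # prints: overfitting - keep step 400
```

The sketch uses relative drops instead of an absolute loss threshold because what counts as "high" depends on the task and the tokenizer. Treat its label as a reason to look at the curves, not as a verdict.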

Responding to overfitting

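When validation loss turns up, you have several moves: keep the checkpoint at the validation-loss minimum (early stopping), train for fewer epochs, add more training data, or add regularization such as dropout or weight decay.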
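Early stopping is easy to automate during training. Below is a minimal sketch of a patience-based tracker. The `EarlyStopping` class and its `patience` and `min_delta` knobs are illustrative assumptions, not any library's API. The tracker remembers the step with the lowest validation loss and signals a stop after `patience` evaluations without improvement.

```python
class EarlyStopping:
    """Track validation loss, remember the best step, and signal when to stop.

    patience: evaluations without improvement to tolerate (assumed knob).
    min_delta: smallest decrease in val loss that counts as an improvement.
    """

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_step = None
        self.bad_evals = 0

    def update(self, step, val_loss):
        """Record one evaluation; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_step = step  # in a real run, save this checkpoint here
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=2)
    for step, loss in zip([100, 200, 300, 400, 500, 600],
                          [2.15, 1.70, 1.45, 1.40, 1.45, 1.55]):
        if stopper.update(step, loss):
            break
    print(stopper.best_step, step)  # prints: 400 600
```

Waiting a couple of evaluations past the minimum costs a little compute but avoids stopping on a single noisy bump. The checkpoint you keep is still `best_step`, not the last one. If you train with Hugging Face Transformers, its `EarlyStoppingCallback` serves a similar purpose; check its documentation for the `TrainingArguments` settings it requires.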

Responding to underfitting

If both losses are stuck high: train longer (more epochs/steps), raise the learning rate (carefully), increase capacity (higher LoRA rank or more target modules), or check that the data and loss mask are actually correct — a broken mask or mis-formatted data shows up as a model that won't learn.
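A cheap check for the last item is to count how many label tokens actually contribute to the loss. This is a sketch under a few assumptions. Each tokenized example is a dict with `input_ids` and `labels`. Masked positions use `-100`, the label that PyTorch's `CrossEntropyLoss` ignores by default and that Hugging Face models follow. You intend to train on response tokens only, so an example with nothing masked is flagged too. `mask_report` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_report(examples):
    """Return (fraction of tokens that contribute to the loss, list of problems).

    Assumes each example is a dict with 'input_ids' and 'labels'.
    """
    problems = []
    total_tokens = 0
    supervised_tokens = 0
    for i, ex in enumerate(examples):
        ids, labels = ex["input_ids"], ex["labels"]
        if len(ids) != len(labels):
            problems.append(f"example {i}: {len(labels)} labels for {len(ids)} input tokens")
            continue  # misaligned example: leave it out of the token counts
        n_supervised = sum(1 for label in labels if label != IGNORE_INDEX)
        if n_supervised == 0:
            problems.append(f"example {i}: every label is masked, so it adds no loss")
        elif n_supervised == len(labels):
            problems.append(f"example {i}: nothing is masked, so the prompt is trained on too")
        total_tokens += len(ids)
        supervised_tokens += n_supervised
    fraction = supervised_tokens / total_tokens if total_tokens else 0.0
    return fraction, problems


if __name__ == "__main__":
    batch = [
        {"input_ids": [1, 2, 3, 4], "labels": [-100, -100, 3, 4]},  # prompt masked
        {"input_ids": [5, 6], "labels": [-100, -100]},              # all masked
        {"input_ids": [7, 8, 9], "labels": [7, 8]},                 # misaligned
    ]
    fraction, problems = mask_report(batch)
    print(f"supervised fraction: {fraction:.2f}")
    for p in problems:
        print(p)
```

A supervised fraction near zero, or many flagged examples, suggests the problem lies in the data pipeline rather than the hyperparameters.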

Key idea

Training loss alone can fool you — it almost always goes down. The relationship between training and validation loss is the diagnosis. And remember: the loss is a proxy; the final verdict is the gold-set metric, which the next lesson makes precise.

Key terms

Overfitting
Training loss falls while validation loss rises — the model memorizes instead of generalizing.
Underfitting
Both losses stay high; the model hasn't learned enough (too few steps, low LR, too little capacity).
Generalization gap
The distance between training and validation loss; a widening gap signals overfitting.
Early stopping
Keeping the checkpoint at the validation-loss minimum rather than the final step.
Validation loss
Loss on held-out data; the signal for generalization and checkpoint selection.
