Track 3 · With BrewSLM · Lesson 7

Train: jobs, the bell, and the delta-from-baseline curve

After this lesson you can launch training as a background Job, monitor it via the notification bell and the live loss curve, read the delta-from-baseline view, and recognize the named training failure events.

Level: intermediate · Read time: ~9 min · Prerequisites: Preflight and the trainability forecast

This is the stage you know best — it's trainer.train() from lesson 2.5. The difference is everything around the loop: it runs as a tracked Job, streams progress to a bell, and plots its loss against your baseline.

Launch — one click or one call

From the Training Config page, you launch a run with the chosen recipe in one click. To script it instead, make a single call:

$ curl -X POST localhost:8000/api/projects/1/training/run \
    -H 'Content-Type: application/json' \
    -d '{"autopilot": true, "one_click": true}'

Training takes three inputs: the prepared manifest, the chosen recipe, and the checkpoint cadence. It produces checkpoints, per-step loss and eval traces, and a final adapter-weights blob. These are the same outputs your Trainer wrote to sft-out/, now stored as first-class records on the experiment.
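
For scripts that live in Python rather than the shell, the curl call can be mirrored with the standard library. This is a minimal sketch: the URL, method, header, and JSON body come from the curl command in this lesson, while the helper names (build_training_request, launch_training) are hypothetical and the response is assumed to be JSON, since its schema is not described here.

```python
import json
import urllib.request

# Assumed local BrewSLM server, same host and port as the curl call.
BASE_URL = "http://localhost:8000"


def build_training_request(project_id: int, autopilot: bool = True,
                           one_click: bool = True) -> urllib.request.Request:
    """Build the POST request that mirrors the curl command (hypothetical helper)."""
    body = json.dumps({"autopilot": autopilot, "one_click": one_click}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/projects/{project_id}/training/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def launch_training(project_id: int) -> dict:
    """Send the request and return the parsed body.

    Assumption: the server answers with JSON. The lesson does not specify
    the response schema, so the parsed object is returned unchanged.
    """
    with urllib.request.urlopen(build_training_request(project_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling launch_training(1) against a running server would start training for project 1. Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent.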

It runs as a background Job

Long-running work in BrewSLM is a Job persisted to a table. The top-bar notification bell polls /api/jobs/active every ~4 seconds and surfaces progress and outcome, so you don't sit watching a terminal. A watcher Job mirrors the experiment's progress into the bell, and the Job framework holds a strong reference to the running task so it can't be silently garbage-collected mid-run.
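
The strong-reference detail matters more than it looks. In Python's asyncio, for example, the event loop keeps only weak references to tasks, so a task nobody else references can be collected before it finishes. The sketch below shows the general pattern, not BrewSLM's actual Job framework: JobRegistry, spawn, and fake_training are hypothetical names, and the use of asyncio at all is an assumption made for illustration.

```python
import asyncio


class JobRegistry:
    """Hypothetical registry that keeps running tasks alive by referencing them."""

    def __init__(self) -> None:
        self._running: set[asyncio.Task] = set()

    def spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._running.add(task)                        # strong reference while running
        task.add_done_callback(self._running.discard)  # released once the task is done
        return task

    @property
    def active_count(self) -> int:
        return len(self._running)


async def fake_training(steps: int) -> str:
    """Stand-in for a long training run; yields to the loop once per step."""
    for _ in range(steps):
        await asyncio.sleep(0)
    return "done"


async def demo() -> tuple[int, str, int]:
    registry = JobRegistry()
    task = registry.spawn(fake_training(3))
    during = registry.active_count   # task is registered while it runs
    result = await task
    await asyncio.sleep(0)           # give pending done-callbacks a chance to run
    after = registry.active_count    # registry no longer holds the finished task
    return during, result, after
```

Running asyncio.run(demo()) reports one active task during the run and none afterwards. The registry holds the reference only for the task's lifetime, so finished jobs do not pile up in memory.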

The delta-from-baseline curve

In Track 2 you read raw loss numbers scrolling past. BrewSLM plots the live loss curve as a delta from baseline — the untuned base model's loss on the same data is the zero line, and your run is drawn relative to it. That reframes the question from "is 0.58 good?" (which you can't judge in isolation) to "am I beating the baseline, and by how much?" — exactly the comparison Track 1's evaluation lesson insisted on, now live during training.
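
The arithmetic behind the curve is a single subtraction. The sketch below assumes the sign convention delta = run loss minus baseline loss, so values below zero mean the run is beating the untuned model. The lesson only states that the baseline is the zero line, so the sign convention, the function names, and the baseline value of 0.90 are illustrative assumptions.

```python
def delta_from_baseline(run_losses: list[float], baseline_loss: float) -> list[float]:
    """Shift each logged loss so the baseline sits at zero.

    Assumed convention: delta = run_loss - baseline_loss, so a negative
    delta means the run is below (better than) the baseline.
    """
    return [loss - baseline_loss for loss in run_losses]


def beats_baseline(delta: float) -> bool:
    """True when the run's loss is lower than the untuned base model's."""
    return delta < 0


# Hypothetical numbers: baseline loss 0.90, three logged training losses.
baseline = 0.90
losses = [0.95, 0.70, 0.58]
deltas = delta_from_baseline(losses, baseline)
```

With these made-up numbers the first step sits slightly above the zero line and the last step, at a raw loss of 0.58, sits about 0.32 below it. The same 0.58 would be a poor result against a baseline of 0.50, which is why the raw value alone cannot be judged.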

From Track 2

Everything you learned about reading curves (lesson 2.6) applies here — falling-and-flattening is healthy, a rising eval trace is overfitting. The delta-from-baseline view just makes "better than doing nothing" the explicit y-axis.

Completion and failure are events

When the run finishes, it emits a training (info) event with a payload of experiment_id, backend, and final_train_loss. When it fails, it emits a named event so you know exactly what happened:

training            (info)     # completed: experiment_id, backend, final_train_loss
training_oom                   # ran out of GPU memory
training_runtime_error         # crashed mid-run
training_timeout               # exceeded the time budget
training_cancelled             # you cancelled it
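
Because the outcomes are named, downstream code can branch on the event name instead of parsing log text. The sketch below is one way to do that. The event names and the completion payload fields come from the list above; the event dictionary shape (a "name" key and a "payload" key) and the summarize_event helper are assumptions, since the lesson does not show the wire format.

```python
COMPLETED_EVENT = "training"

# Failure event names from the lesson, mapped to their stated meaning.
FAILURE_EVENTS = {
    "training_oom": "ran out of GPU memory",
    "training_runtime_error": "crashed mid-run",
    "training_timeout": "exceeded the time budget",
    "training_cancelled": "cancelled by the user",
}


def summarize_event(event: dict) -> str:
    """Turn one event into a one-line summary.

    Assumed shape: {"name": str, "payload": dict}. Adjust the keys to
    whatever the real event records use.
    """
    name = event.get("name")
    payload = event.get("payload", {})
    if name == COMPLETED_EVENT:
        return (
            f"experiment {payload.get('experiment_id')} completed on "
            f"{payload.get('backend')} with "
            f"final_train_loss={payload.get('final_train_loss')}"
        )
    if name in FAILURE_EVENTS:
        return f"failed: {FAILURE_EVENTS[name]}"
    return f"unrecognized event: {name!r}"
```

Unknown names fall through to an explicit "unrecognized" message rather than being treated as success, which keeps a newly added failure type from passing silently.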

The failure events map directly onto the failure modes you learned to diagnose by hand, but now they are typed, logged, and visible in the failure-cluster surface rather than buried in a stack trace. With a trained adapter and its traces recorded, the next stage asks the real question: is it good enough to ship? That's evaluation.

Key idea

Stage 08 is your training loop, watched: a background Job with bell progress, a live delta-from-baseline curve that makes 'beating the base' the y-axis, and named RunEvents for completion and every failure mode.

Key terms

training Job
The background, persisted task that runs stage 08; tracked so it can't be GC'd mid-run.
NotificationBell
Top-bar surface polling /api/jobs/active (~every 4s) for progress and outcome.
delta-from-baseline curve
The live loss plotted relative to the untuned base model's loss as the zero line.
training RunEvents
Named events: training (info) on completion; training_oom / _runtime_error / _timeout / _cancelled on failure.
