Track 3 · With BrewSLM · Lesson 6

Preflight and the trainability forecast

After this lesson you can explain what preflight checks, the difference between strict and fail-open modes, and how the trainability forecast catches memory and dependency problems before training starts instead of after it crashes.

Level: intermediate · Read time: ~9 min · Prerequisites: Recipes and task handlers: the training config without the code

Earlier tracks spent whole lessons on problems that bite at runtime: getting the environment right (Track 2, lesson 2.1) and surviving OOM (Track 1, lesson 1.16, with the memory math of lesson 1.15). You found those problems by hitting them. BrewSLM's preflight stage finds them before the run starts.

What preflight checks

Preflight takes the prepared manifest, the base model, the runtime profile, and the hardware, and runs four families of checks: memory-fit, dependency, capability, and gate-policy.

Its output is a pass/fail verdict, a blocker list, and a generated train plan — the concrete, runnable plan assembled from your recipe + manifest.

$ preflight: project 1, recipe lora-r16, base SmolLM2-135M-Instruct

memory-fit ........ PASS   (est 3.1 GB / 24 GB available)
dependency ........ PASS
capability ........ PASS
gate-policy ....... PASS
verdict: GO  →  train plan generated (lora r=16, 3 epochs, eff. batch 16)

# a failing preflight instead surfaces the blockers:
#   memory-fit ..... BLOCK  (est 27 GB > 24 GB) → lower batch or add grad-accum
#   dependency ..... BLOCK  (DPO objective needs the preference-training extra)
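
To make the shape of that output concrete, here is a minimal sketch of a report object that collects check results and derives a verdict and blocker list from them. The class and field names (CheckResult, PreflightReport, render) are hypothetical and are not BrewSLM's API; the "NO-GO" label is also an assumption, and the generated train plan is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    name: str          # check family, e.g. "memory-fit"
    passed: bool
    detail: str = ""   # human-readable note or suggested fix


@dataclass
class PreflightReport:
    """Hypothetical container for preflight results (not BrewSLM's real type)."""
    results: List[CheckResult]

    @property
    def blockers(self) -> List[CheckResult]:
        # Every failed check is a blocker.
        return [r for r in self.results if not r.passed]

    @property
    def verdict(self) -> str:
        # "NO-GO" is an assumed label for the failing case.
        return "GO" if not self.blockers else "NO-GO"

    def render(self) -> str:
        lines = []
        for r in self.results:
            status = "PASS" if r.passed else "BLOCK"
            dots = "." * max(3, 19 - len(r.name))
            suffix = f"  ({r.detail})" if r.detail else ""
            lines.append(f"{r.name} {dots} {status}{suffix}")
        lines.append(f"verdict: {self.verdict}")
        return "\n".join(lines)


report = PreflightReport([
    CheckResult("memory-fit", False, "est 27 GB > 24 GB"),
    CheckResult("dependency", True),
    CheckResult("capability", True),
    CheckResult("gate-policy", True),
])
print(report.render())
```

The verdict is derived from the blocker list rather than stored separately, so the two cannot disagree.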

This is the trainability forecast

Surfaced in the UI, preflight is the trainability forecast: a green light to launch, or a specific list of what to fix first. Instead of "start training, wait, watch it OOM at step 1, lower the batch, repeat," you get the verdict in seconds. It's the Track 1 memory lesson and the Track 2 environment lesson, turned into a gate.
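
The environment half of that gate can be sketched with the standard library alone. The example below assumes a dependency check amounts to asking whether the modules an objective needs are importable; the requirements mapping and the module name preference_training_extra_placeholder are made up for illustration, and BrewSLM's actual check may work differently.

```python
from importlib.util import find_spec
from typing import Dict, List


def missing_modules(objective: str, requirements: Dict[str, List[str]]) -> List[str]:
    """Return the top-level modules required by `objective` that cannot be found.

    find_spec() locates a module without importing it, so the check is cheap.
    Only top-level module names are supported in this sketch.
    """
    return [m for m in requirements.get(objective, []) if find_spec(m) is None]


# Hypothetical mapping from training objective to required modules.
REQUIREMENTS = {
    "sft": ["json"],
    "dpo": ["json", "preference_training_extra_placeholder"],
}

for objective in REQUIREMENTS:
    missing = missing_modules(objective, REQUIREMENTS)
    status = "PASS" if not missing else f"BLOCK (missing: {', '.join(missing)})"
    print(f"{objective}: {status}")
```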

From Track 2

Remember manually estimating whether a run would fit, then adding gradient accumulation or checkpointing when it didn't? Preflight's memory-fit check does that estimate and tells you up front — and a blocker here is far cheaper than an OOM an hour into training.
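
As a reminder of what that estimate looks like by hand, here is a rough sketch. It assumes a common rule of thumb: about 2 bytes per frozen parameter in 16-bit precision, and about 16 bytes per trainable parameter under mixed-precision Adam (16-bit weight and gradient, plus 32-bit master weight and two optimizer moments). Activation memory per sample is treated as a measured input. The function names and the numbers in the usage example are illustrative, not BrewSLM's internals.

```python
GIB = 1024 ** 3


def estimate_gb(frozen_params: int, trainable_params: int,
                micro_batch: int, act_gb_per_sample: float) -> float:
    """Rough memory estimate in GiB under the assumptions stated above."""
    static_bytes = frozen_params * 2 + trainable_params * 16
    return static_bytes / GIB + micro_batch * act_gb_per_sample


def plan_batch(effective_batch: int, budget_gb: float, **model):
    """Find the largest micro-batch that fits, keeping the effective batch fixed.

    Returns (micro_batch, grad_accum_steps), or None if even micro_batch=1
    does not fit. Gradient accumulation makes up the difference:
    effective_batch = micro_batch * grad_accum_steps.
    """
    for micro in range(effective_batch, 0, -1):
        if effective_batch % micro != 0:
            continue  # keep the effective batch exactly as requested
        if estimate_gb(micro_batch=micro, **model) <= budget_gb:
            return micro, effective_batch // micro
    return None


# Illustrative numbers only: a 135M-parameter base with 2M trainable adapter
# parameters and an assumed 2 GiB of activations per sample.
model = dict(frozen_params=135_000_000, trainable_params=2_000_000,
             act_gb_per_sample=2.0)
print(plan_batch(16, 24.0, **model))
```

Note that gradient accumulation does not save memory by itself. The saving comes from the smaller micro-batch, and accumulation restores the effective batch size.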

Strict vs fail-open

Two modes govern what a blocker does. Strict mode blocks: the run won't dispatch until every blocker is cleared, which is right for production and for not wasting a long GPU booking. Fail-open mode emits warnings and lets you proceed anyway, which is useful when you have a concrete reason to think a check is too conservative for your setup. When a blocker stops a run under strict mode, it surfaces as a training / training_dispatch_error RunEvent, so even a refusal to launch is in the audit trail.
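
The difference between the two modes can be sketched as a small dispatch gate. The RunEvent shape below is an assumption: it reads "training / training_dispatch_error" as a stage plus an event kind, which the lesson does not spell out. The decide_dispatch function is hypothetical, and fail-open warnings are modeled with Python's standard warnings module.

```python
import warnings
from dataclasses import dataclass
from typing import List


@dataclass
class RunEvent:
    stage: str    # assumed meaning of "training"
    kind: str     # assumed meaning of "training_dispatch_error"
    detail: str


def decide_dispatch(blockers: List[str], strict: bool,
                    events: List[RunEvent]) -> bool:
    """Return True if the run may dispatch, False if it is blocked."""
    if not blockers:
        return True
    if strict:
        # Strict: refuse to launch, and record the refusal in the audit trail.
        events.append(RunEvent("training", "training_dispatch_error",
                               "; ".join(blockers)))
        return False
    # Fail-open: surface each blocker as a warning and proceed anyway.
    for blocker in blockers:
        warnings.warn(f"preflight blocker ignored (fail-open): {blocker}")
    return True


audit_trail: List[RunEvent] = []
launched = decide_dispatch(["memory-fit: est 27 GB > 24 GB"],
                           strict=True, events=audit_trail)
print(launched, audit_trail)
```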

Key idea

Preflight is the forecast you never ran by hand: dependency, memory-fit, capability, and gate checks that return a verdict + blocker list + train plan before spending compute. Strict mode blocks; fail-open warns. Catch the OOM on paper, not at step one.

Key terms

Preflight
Stage 07: pre-run checks returning pass/fail, a blocker list, and a generated train plan.
trainability forecast
The UI surfacing of preflight — a go/no-go verdict with the specific things to fix.
memory-fit check
Estimates whether model + batch + seq length + precision fit in VRAM (the Track 1 memory math).
strict vs fail-open
Strict blocks on any blocker; fail-open emits warnings and proceeds.
train plan
The concrete runnable plan preflight generates from the recipe + manifest.
