Track 3 · With BrewSLM · Lesson 6

Preflight and the trainability forecast

After this lesson you can explain what preflight checks, the difference between strict and fail-open modes, and how the trainability forecast catches memory and dependency problems before training starts instead of after it crashes.

Level: intermediate · Read time: ~9 min · Prerequisites: Recipes and task handlers: the training config without the code

Earlier tracks spent whole lessons on problems that bite at runtime: getting the environment right (Track 2.1) and surviving OOM (Track 1.16, which built on the memory math of Track 1.15). You found those problems by hitting them. BrewSLM's preflight stage finds them before the run starts.

What preflight checks

Preflight takes the prepared manifest, the base model, the runtime profile, and the hardware, and runs four families of checks:

Dependency — are the required runtimes/libraries present for this recipe and handler? (The DPO objective needs a preference-training dependency; a missing one is a blocker, not a mid-run crash.)
Memory-fit — will this model + batch + sequence length + precision actually fit in the available VRAM? This is the GPU memory math from Track 1.15, computed for you.
Capability — can the chosen backend/handler do what the task needs?
Gate-policy — are the promotion gates well-formed and satisfiable?
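The four families above can be pictured as independent checks whose results are folded into one verdict. The sketch below is not BrewSLM's implementation. The function names, the `ctx` dictionary keys, and the rule used for a "well-formed" gate are all hypothetical. It shows the idea that every family runs, even after an earlier one blocks, so the blocker list is complete rather than stopping at the first failure.

```python
from dataclasses import dataclass, field
from importlib.util import find_spec
from typing import Callable, Optional

# Hypothetical check signature: return None on PASS, or a blocker message on BLOCK.
Check = Callable[[dict], Optional[str]]


@dataclass
class PreflightResult:
    verdict: str = "GO"                            # "GO" or "NO-GO"
    results: dict = field(default_factory=dict)    # family -> "PASS" / "BLOCK"
    blockers: list = field(default_factory=list)   # (family, message) pairs


def memory_fit(ctx: dict) -> Optional[str]:
    est, avail = ctx["est_vram_gb"], ctx["avail_vram_gb"]
    return f"est {est} GB > {avail} GB" if est > avail else None


def dependency(ctx: dict) -> Optional[str]:
    # Every top-level module the recipe/handler needs must be importable here.
    missing = [m for m in ctx["required_modules"] if find_spec(m) is None]
    return f"missing: {', '.join(missing)}" if missing else None


def capability(ctx: dict) -> Optional[str]:
    # The handler must support every feature the task needs.
    lacking = ctx["task_needs"] - ctx["handler_supports"]
    return f"handler lacks: {', '.join(sorted(lacking))}" if lacking else None


def gate_policy(ctx: dict) -> Optional[str]:
    # Assumed rule: a gate is well-formed if it names a metric and has a numeric threshold.
    bad = [g for g in ctx["gates"]
           if not g.get("metric") or not isinstance(g.get("threshold"), (int, float))]
    return f"{len(bad)} malformed gate(s)" if bad else None


CHECKS: dict = {
    "memory-fit": memory_fit,
    "dependency": dependency,
    "capability": capability,
    "gate-policy": gate_policy,
}


def run_preflight(ctx: dict) -> PreflightResult:
    result = PreflightResult()
    for family, check in CHECKS.items():  # run every family; collect all blockers
        message = check(ctx)
        result.results[family] = "PASS" if message is None else "BLOCK"
        if message is not None:
            result.blockers.append((family, message))
    if result.blockers:
        result.verdict = "NO-GO"
    return result
```

Suppose `est_vram_gb=27` and `avail_vram_gb=24` are passed together with an uninstalled module name. `run_preflight` then returns `NO-GO` with memory-fit and dependency blockers, which mirrors the failing example in the preflight output.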

Its output is a pass/fail verdict, a blocker list, and a generated train plan — the concrete, runnable plan assembled from your recipe + manifest.
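A train plan like "lora r=16, 3 epochs, eff. batch 16" is derived arithmetic as much as configuration. The effective batch is the per-step micro-batch times the gradient-accumulation steps. The number of optimizer steps follows from that and the dataset size. The sketch below is hypothetical: the field names, the 4 × 4 split of the batch, the single-device assumption, and counting a final partial step are illustrations, not BrewSLM's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainPlan:
    method: str
    lora_rank: int
    epochs: int
    micro_batch: int     # examples per forward/backward pass (hypothetical field)
    grad_accum: int      # passes accumulated before each optimizer step
    num_examples: int    # training examples, taken from the manifest

    @property
    def effective_batch(self) -> int:
        # Examples contributing to one optimizer step (single device assumed).
        return self.micro_batch * self.grad_accum

    @property
    def optimizer_steps(self) -> int:
        # Ceiling division: a final partial batch still counts as a step.
        steps_per_epoch = -(-self.num_examples // self.effective_batch)
        return steps_per_epoch * self.epochs


def build_train_plan(recipe: dict, manifest: dict) -> TrainPlan:
    # The recipe supplies the "how"; the manifest supplies the "what".
    return TrainPlan(
        method=recipe["method"],
        lora_rank=recipe["lora_rank"],
        epochs=recipe["epochs"],
        micro_batch=recipe["micro_batch"],
        grad_accum=recipe["grad_accum"],
        num_examples=manifest["num_train_examples"],
    )
```

Take a hypothetical recipe with micro-batch 4 and grad-accum 4, and a manifest of 1,000 examples. The plan reports an effective batch of 16 and 63 steps per epoch, or 189 optimizer steps over 3 epochs.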

$ preflight: project 1, recipe lora-r16, base SmolLM2-135M-Instruct

memory-fit ........ PASS   (est 3.1 GB / 24 GB available)
dependency ........ PASS
capability ........ PASS
gate-policy ....... PASS
verdict: GO  →  train plan generated (lora r=16, 3 epochs, eff. batch 16)

# a failing preflight instead surfaces the blockers:
#   memory-fit ..... BLOCK  (est 27 GB > 24 GB) → lower batch or add grad-accum
#   dependency ..... BLOCK  (DPO objective needs the preference-training extra)

This is the trainability forecast

Surfaced in the UI, preflight is the trainability forecast: a green light to launch, or a specific list of what to fix first. Instead of "start training, wait, watch it OOM at step 1, lower the batch, repeat," you get the verdict in seconds. It's the Track 1 memory lesson and the Track 2 environment lesson, turned into a gate.

From Track 2

Remember manually estimating whether a run would fit, then adding gradient accumulation or gradient checkpointing when it didn't? Preflight's memory-fit check does that estimate and tells you up front. A blocker here is far cheaper than an OOM an hour into training.
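The manual version of that estimate, plus the "lower batch or add grad-accum" fix, can be sketched in a few lines. This is a rough model, not BrewSLM's estimator, and every constant is an assumption:

- 16-bit weights.
- Trainable (e.g. LoRA) parameters that each carry a 16-bit gradient and two fp32 Adam moments (10 bytes).
- The common rule of thumb of about 34 · seq · batch · hidden bytes of activations per layer, for 16-bit training without recomputation and ignoring the attention-score term.
- A flat 1 GB of runtime overhead.

```python
GB = 1024 ** 3


def estimate_vram_gb(n_params: int, n_trainable: int, micro_batch: int,
                     seq_len: int, hidden: int, n_layers: int,
                     weight_bytes: int = 2, overhead_gb: float = 1.0) -> float:
    weights = n_params * weight_bytes                 # frozen + trainable weights, 16-bit
    trainable = n_trainable * (2 + 4 + 4)             # grad (16-bit) + Adam m, v (fp32)
    # Rough activation term (assumed rule of thumb, no gradient checkpointing).
    activations = 34 * seq_len * micro_batch * hidden * n_layers
    return (weights + trainable + activations) / GB + overhead_gb


def fit_with_grad_accum(avail_gb: float, micro_batch: int, grad_accum: int, **model):
    """Halve micro-batch and double grad-accum until the estimate fits.

    Keeps micro_batch * grad_accum constant (assumes micro_batch is a power of two).
    Returns (micro_batch, grad_accum, est_gb), or None if micro-batch 1 still won't fit.
    """
    while True:
        est = estimate_vram_gb(micro_batch=micro_batch, **model)
        if est <= avail_gb:
            return micro_batch, grad_accum, est
        if micro_batch == 1:
            return None
        micro_batch //= 2
        grad_accum *= 2
```

As an illustration, consider a hypothetical 1B-parameter model with hidden size 2048, 24 layers, sequence length 2048, and 8M trainable LoRA parameters. At micro-batch 16 it estimates about 54 GB, which is a blocker on a 24 GB card. The loop settles on micro-batch 4 with grad-accum 4 at about 15.7 GB, so the effective batch stays at 16.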

Strict vs fail-open

Two modes govern what a blocker does. Strict mode blocks: the run won't dispatch until every blocker is cleared, which is the right choice for production and for not wasting a long GPU booking. Fail-open mode emits the blockers as warnings and lets you proceed anyway, which is useful when you know better than the check. Under strict mode, a fatal blocker surfaces as a training / training_dispatch_error RunEvent, so even a refusal to launch is recorded in the audit trail.
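The difference between the two modes fits in one small dispatch gate. The sketch below is hypothetical. The lesson names the training / training_dispatch_error RunEvent, but the `RunEvent` fields (`stage`, `kind`, `detail`), the `Mode` enum, and `gate_dispatch` are illustrative assumptions, not BrewSLM's API.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    STRICT = "strict"
    FAIL_OPEN = "fail_open"


@dataclass
class RunEvent:  # hypothetical audit-trail record
    stage: str
    kind: str
    detail: str


@dataclass
class DispatchDecision:
    dispatched: bool
    warnings: list = field(default_factory=list)
    events: list = field(default_factory=list)


def gate_dispatch(blockers: list, mode: Mode) -> DispatchDecision:
    if not blockers:
        return DispatchDecision(dispatched=True)
    if mode is Mode.FAIL_OPEN:
        # Fail-open: surface each blocker as a warning, then launch anyway.
        for b in blockers:
            logging.warning("preflight blocker ignored (fail-open): %s", b)
        return DispatchDecision(dispatched=True, warnings=list(blockers))
    # Strict: refuse to launch, but record the refusal so it appears in the audit trail.
    event = RunEvent(stage="training", kind="training_dispatch_error",
                     detail="; ".join(blockers))
    return DispatchDecision(dispatched=False, events=[event])
```

Under strict mode, `gate_dispatch(["est 27 GB > 24 GB"], Mode.STRICT)` does not dispatch and returns one `training_dispatch_error` event. The same call with `Mode.FAIL_OPEN` dispatches and carries the blocker forward as a warning.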

Key idea

Preflight is the forecast you no longer have to run by hand: dependency, memory-fit, capability, and gate-policy checks that return a verdict, a blocker list, and a train plan before any compute is spent. Strict mode blocks; fail-open warns. Catch the OOM on paper, not at step one.

Key terms

Preflight: Stage 07, the pre-run checks that return pass/fail, a blocker list, and a generated train plan.
trainability forecast: The UI surfacing of preflight — a go/no-go verdict with the specific things to fix.
memory-fit check: Computes whether model + batch + seq length + precision fit in VRAM (the Track 1 memory math).
strict vs fail-open: Strict blocks on any blocker; fail-open emits warnings and proceeds.
train plan: The concrete runnable plan preflight generates from the recipe + manifest.
