Track 3 · With BrewSLM · Lesson 10

Capstone B: export, deploy & Coach Mode end-to-end

After this capstone you can export a model to a deployment format, deploy a versioned endpoint with smoke and drift checks, and describe how Coach Mode guides the full lifecycle — then compare a platform run to your by-hand Track 2 run.

Level: intermediate · Read time: ~11 min · Prerequisites: When fine-tuning isn't the answer: auto-RAG & reroute-to-RAG

You merged and saved a model by hand in lesson 2.8 and noted that BrewSLM automates export to GGUF / ONNX / vLLM. Here's that stage — plus deployment and the assistant that's been implied behind every recommendation in this track.

10 Export — the merged artifact, productized

Export takes the trained model and a target format and produces an artifact bundle with weights, tokenizer, manifest, and a smoke-check trace. This is your merge_and_unload + save_pretrained from Track 2, with more targets and quantization tracked:

formats:        GGUF | safetensors | HF | vLLM-compat
quantization:   Q4_K_M | AWQ | GPTQ        # tracked so the Compression page
                                           # knows which variants exist
RunEvent: export (info)   payload: { format, output_path, file_size_bytes }
  on failure → export_run_failed | export_artifact_missing | export_quantization_failed

The format you pick follows Track 2's logic: safetensors/vLLM for a GPU server, GGUF + a quant like Q4_K_M for CPU/edge.
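If it helps to see that rule of thumb and the export payload side by side, here is a minimal Python sketch. The format and quantization names and the three payload fields come from this lesson; the function name, the target labels ("gpu_server", "cpu", "edge"), the output path, and the file size are made up for illustration and are not BrewSLM's API.

```python
from dataclasses import dataclass, asdict

# Names taken from the lesson's format and quantization lists.
FORMATS = {"GGUF", "safetensors", "HF", "vLLM-compat"}
QUANTIZATIONS = {"Q4_K_M", "AWQ", "GPTQ"}


def pick_export_target(target: str) -> dict:
    """Hypothetical helper: map a deployment target to a format and quant."""
    if target == "gpu_server":
        # GPU server: keep full-precision weights in safetensors.
        return {"format": "safetensors", "quantization": None}
    if target in ("cpu", "edge"):
        # CPU or edge: GGUF plus a 4-bit quant.
        return {"format": "GGUF", "quantization": "Q4_K_M"}
    raise ValueError(f"unknown target: {target!r}")


@dataclass
class ExportEvent:
    """Mirrors the payload fields: format, output_path, file_size_bytes."""
    format: str
    output_path: str
    file_size_bytes: int

    def __post_init__(self):
        if self.format not in FORMATS:
            raise ValueError(f"unsupported format: {self.format!r}")
        if self.file_size_bytes <= 0:
            raise ValueError("file_size_bytes must be positive")


choice = pick_export_target("edge")
# Path and size are illustrative values, not real measurements.
event = ExportEvent(choice["format"], "exports/model-q4_k_m.gguf", 104_857_600)
print(asdict(event))
```

Validating the payload at construction time is one way to make a bad format name fail fast, before anything downstream reads the event.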

11 Deploy (optional) — a versioned, watched endpoint

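Deploy takes an export and a target (a vLLM endpoint, a local runner, a cloud burst) and produces a DeploymentVersion with promote / reject / rollback / drift-check actions. Two safety mechanisms matter: the smoke check on a freshly deployed version, and the drift check, a scheduled re-run of the gold set against the live endpoint to catch production regressions.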

Failures are named events too: deployment_smoke_failed, deployment_drift_detected, deployment_promote_blocked.
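To make the drift check concrete, here is a rough sketch of the idea: re-score the gold set against the live endpoint and raise the named event if accuracy has slipped. Only the event name deployment_drift_detected comes from this lesson. The predict callable, the tolerance value, and the tiny gold set are assumptions standing in for whatever the platform actually does.

```python
from typing import Callable, Iterable, Optional, Tuple

GoldRow = Tuple[str, str]  # (input text, expected label)


def accuracy(predict: Callable[[str], str], gold: Iterable[GoldRow]) -> float:
    """Fraction of gold rows the endpoint labels correctly."""
    rows = list(gold)
    if not rows:
        raise ValueError("gold set is empty")
    correct = sum(1 for text, label in rows if predict(text) == label)
    return correct / len(rows)


def drift_check(
    predict: Callable[[str], str],
    gold: Iterable[GoldRow],
    baseline: float,
    tolerance: float = 0.02,  # assumed threshold, not a platform default
) -> Tuple[Optional[str], float]:
    """Return (event name or None, live accuracy)."""
    live = accuracy(predict, gold)
    if baseline - live > tolerance:
        return "deployment_drift_detected", live
    return None, live


# Toy gold set and two stand-in endpoints, for illustration only.
gold = [
    ("great film", "positive"),
    ("awful plot", "negative"),
    ("loved it", "positive"),
    ("boring", "negative"),
]
answers = dict(gold)


def healthy(text: str) -> str:
    return answers[text]


def degraded(text: str) -> str:
    return "positive"  # always answers positive, so half the rows are wrong


print(drift_check(healthy, gold, baseline=1.0))
print(drift_check(degraded, gold, baseline=1.0))
```

The point of comparing against a stored baseline rather than a fixed bar is that the check then flags a regression from what this version achieved at promotion time.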

Coach Mode — the through-line

Throughout this track, each "what next?" had an answer: review these pending rows, your gate isn't met, augment from this cluster, consider RAG. Coach Mode is the surface that emits those stage suggestions (data / cleaning / gold_set / training / eval) and offers actions — run_playbook, navigate, augment_from_cluster — so you're never staring at a project wondering what to do. It's the eleven-stage lifecycle, narrated.
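One way to picture Coach Mode is as a function from project state to a single suggestion. The sketch below is a toy model only: the stage names and action names are the ones listed above, but the ProjectState fields, the rule order, and the message wording are all hypothetical and say nothing about how BrewSLM decides internally.

```python
from dataclasses import dataclass
from typing import Optional

STAGES = {"data", "cleaning", "gold_set", "training", "eval"}
ACTIONS = {"run_playbook", "navigate", "augment_from_cluster"}


@dataclass
class ProjectState:
    """Hypothetical snapshot of a project; field names are illustrative."""
    pending_review_rows: int = 0
    gate_met: bool = False
    largest_failure_cluster: Optional[str] = None


@dataclass
class Suggestion:
    stage: str    # one of STAGES
    action: str   # one of ACTIONS
    message: str


def suggest(state: ProjectState) -> Optional[Suggestion]:
    """Return the first applicable suggestion, or None if nothing is pending."""
    if state.pending_review_rows > 0:
        n = state.pending_review_rows
        return Suggestion("data", "navigate", f"Review {n} pending rows")
    if not state.gate_met and state.largest_failure_cluster:
        cluster = state.largest_failure_cluster
        return Suggestion(
            "eval", "augment_from_cluster", f"Augment from cluster '{cluster}'"
        )
    if not state.gate_met:
        return Suggestion("data", "run_playbook", "Gate not met: top up the data")
    return None


print(suggest(ProjectState(pending_review_rows=12)))
print(suggest(ProjectState(largest_failure_cluster="negation")))
print(suggest(ProjectState(gate_met=True)))
```

The rules are ordered so that unfinished review work is surfaced before any evaluation-driven advice, which matches the order the lesson's examples appear in.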

Capstone B — run it through the platform

Take the same sentiment task you fine-tuned by hand and run it through BrewSLM end-to-end:

1. Create a project on SmolLM2-135M-Instruct.
2. Ingest your data (or hf:imdb) → introspect → dry-run → commit.
3. Top up with a synthetic playbook; clear the review queue.
4. Prepare → confirm the manifest's task_profile = classification.
5. Pick a LoRA recipe (r=16, lr=2e-4, 3 epochs, bf16).
6. Preflight → clear any blockers (the trainability forecast).
7. Train → watch the delta-from-baseline curve in the bell.
8. Evaluate with an eval pack (gate: accuracy >= 0.90) → read clusters.
9. If knowledge-bound failures dominate → try auto-RAG / reroute.
10. Export (GGUF Q4_K_M) → deploy → confirm the smoke check passes.
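Steps 5, 8, and 9 boil down to a recipe, a gate, and a branch, which can be sketched in a few lines of Python. The recipe values and the 0.90 accuracy gate are from the checklist above. The dictionary keys, the cluster labels, the "more than half" reading of "dominate", and the returned step names are assumptions for illustration.

```python
# Recipe values from step 5; the key names are illustrative.
RECIPE = {
    "method": "lora",
    "r": 16,
    "learning_rate": 2e-4,
    "epochs": 3,
    "precision": "bf16",
}

# Gate from step 8.
GATE = {"metric": "accuracy", "threshold": 0.90}


def next_step(metrics: dict, cluster_counts: dict) -> str:
    """Decide what to do after evaluation (hypothetical step names)."""
    if metrics[GATE["metric"]] >= GATE["threshold"]:
        return "export"
    total = sum(cluster_counts.values())
    knowledge = cluster_counts.get("knowledge_bound", 0)
    # Assumption: "dominate" means more than half of clustered failures.
    if total > 0 and knowledge / total > 0.5:
        return "try_auto_rag"
    return "iterate_on_data"


print(next_step({"accuracy": 0.93}, {}))
print(next_step({"accuracy": 0.81}, {"knowledge_bound": 7, "format_error": 3}))
print(next_step({"accuracy": 0.81}, {"knowledge_bound": 2, "format_error": 8}))
```

Writing the gate down as data rather than as an if-statement buried in a script is what lets the same threshold be checked again later, for instance when the gold set is re-run against a live endpoint.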

What the platform bought you

Same pipeline, same model, same data as your by-hand run — but now with per-row import accountability, a reviewed synthetic top-up, a reproducible manifest, a preflight forecast, a watched training Job, enforceable gates, clustered failures with remediation, a retrieval escape hatch, and a versioned endpoint with drift detection. You did it once by hand to understand it; the platform does it repeatably, auditably, and at scale.

That completes the BrewSLM lifecycle, end to end. You can now do this work by hand and drive the platform with full understanding of what each surface is doing underneath. Track 4 goes deeper — the advanced topics (distillation, preference tuning, quantization, multi-task) that build on this foundation.

Key terms

Export formats
GGUF / safetensors / HF / vLLM-compat — the target the artifact bundle is built for.
quantization variants
Q4_K_M / AWQ / GPTQ — compressed variants tracked so the Compression page knows what's available.
DeploymentVersion
A versioned deployment with promote / reject / rollback / drift-check actions.
drift check
A scheduled re-run of the gold set against the live endpoint to catch production regressions.
Coach Mode
The surface that emits stage suggestions and actions (run_playbook, navigate, augment_from_cluster) across the lifecycle.
