With BrewSLM
Re-run the by-hand pipeline through BrewSLM's eleven-stage lifecycle, mapping each step to its platform surface — import, recipes, the trainability forecast, training jobs, eval packs, auto-RAG, and deployment.
-
1. From script to platform: the BrewSLM lifecycle
You fine-tuned a model by hand in Track 2. This track re-runs that exact pipeline through BrewSLM, mapping each by-hand step to a platform surface. It opens with the eleven-stage lifecycle and why every stage emits a RunEvent.
-
2. Ingest & map your data with per-row accountability
Stages 01–04 of the BrewSLM lifecycle: ingest from a source locator, introspect the data's shape into a ranked mapping proposal, dry-run the mapping to preview accepted and rejected rows, and commit — replacing the hand-built dataset list from Track 2 with an auditable import.
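The dry-run step can be pictured with a small sketch. It is not BrewSLM's implementation: the `dry_run` function, the shape of the mapping, and the `missing_<field>` reason strings are all hypothetical, chosen only to show how a preview can account for every row before anything is committed.

```python
def dry_run(rows, mapping, required):
    """Preview a column mapping without committing anything.

    rows:     list of dicts as read from the source
    mapping:  source column -> target field (hypothetical shape)
    required: target fields that must be non-empty
    """
    accepted, rejected = [], []
    for row_number, row in enumerate(rows, start=1):
        mapped = {target: row.get(source) for source, target in mapping.items()}
        missing = [field for field in required if mapped.get(field) in (None, "")]
        if missing:
            # Reason strings are illustrative, not BrewSLM reason codes.
            rejected.append({"row": row_number, "reason": "missing_" + missing[0]})
        else:
            accepted.append(mapped)
    return accepted, rejected


rows = [
    {"review": "great", "stars": "pos"},
    {"review": "", "stars": "neg"},
    {"review": "meh"},
]
accepted, rejected = dry_run(rows, {"review": "text", "stars": "label"}, ["text", "label"])
```

Every input row ends up in exactly one of the two lists, which is the property that makes the import auditable.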
-
3. Synthetic data and the review queue
When you don't have enough rows, BrewSLM generates them with recipe-aware synthetic playbooks, lands each row with review_status pending, and lets you approve rows or bulk-drop rejected ones, grouped by reason. This is the platform answer to Track 2's 'in practice, hundreds more rows'.
-
4. Clean and prepare: the manifest is the source of truth
Stages 05–06: optional cleaning (PII findings, quality scores, chunks) and Prepare, which builds prepared/manifest.json plus train and eval splits carrying the task_profile and scoring_mode — replacing the by-hand tokenize/collate/split with a single declarative artifact everything downstream reads from.
-
5. Recipes and task handlers: the training config without the code
A recipe captures everything you set in TrainingArguments and LoraConfig — base model, LoRA knobs, learning rate, schedule, precision — as declarative config, while the task-handler dispatcher routes the manifest's task profile to the right handler that knows how to tokenize, mask, and score it.
-
6. Preflight and the trainability forecast
Stage 07 runs before you spend a GPU-hour: dependency, memory-fit, capability, and gate-policy checks that return pass/fail, a blocker list, and a generated train plan — the platform answer to Track 2's environment setup and the OOM you'd otherwise hit at step one.
-
7. Train: jobs, the bell, and the delta-from-baseline curve
Stage 08 runs your recipe as a watched background Job: the notification bell polls active jobs every few seconds, per-step loss and eval traces stream into a live curve plotted as delta from baseline, and named RunEvents capture completion or failure modes like training_oom and training_cancelled.
-
8. Evaluate: eval packs, gates, failure clusters & remediation
Stage 09 scores the trained model with an eval pack whose declared gates decide promotability, produces task-aware metrics from the handler's score(), groups the misses into failure clusters, and proposes remediation — the platform form of Track 2's evaluate-by-hand and Track 1's quality gate.
-
9. When fine-tuning isn't the answer: auto-RAG & reroute-to-RAG
Sometimes the eval says fine-tuning won't get there. BrewSLM's post-eval decision engine can recommend retrieval: auto-RAG builds a BM25 index at training completion and prepends top-K passages at inference, and reroute-to-RAG clones the project into a retrieval-first sibling that uses the base model plus retrieval, no LoRA.
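As a rough illustration of the auto-RAG idea, the sketch below builds a BM25 index over a few passages and prepends the top-K hits to a question. It is a minimal, standard-library version written for this outline, not BrewSLM's code: `BM25Index`, `build_prompt`, the whitespace tokenizer, the prompt layout, and the `k1`/`b` defaults are all assumptions, and the index expects at least one passage.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


class BM25Index:
    def __init__(self, passages, k1=1.5, b=0.75):
        self.passages = passages
        self.k1, self.b = k1, b
        self.term_counts = [Counter(tokenize(p)) for p in passages]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.lengths) / len(passages)
        doc_freq = Counter()
        for counts in self.term_counts:
            doc_freq.update(counts.keys())
        n = len(passages)
        # Non-negative IDF variant: log(1 + (n - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in doc_freq.items()}

    def score(self, query, i):
        counts, length = self.term_counts[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = counts.get(term, 0)
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=2):
        order = sorted(range(len(self.passages)),
                       key=lambda i: self.score(query, i), reverse=True)
        return [self.passages[i] for i in order[:k]]


def build_prompt(index, question, k=2):
    # Prepend the top-K passages to the question (layout is hypothetical).
    context = "\n".join(index.top_k(question, k))
    return f"{context}\n\nQuestion: {question}"


passages = [
    "the espresso machine needs descaling monthly",
    "refunds are processed within five days",
    "descaling uses citric acid solution",
]
index = BM25Index(passages)
prompt = build_prompt(index, "refunds", k=1)
```

Because retrieval happens at inference time, the base model's weights are untouched, which is why the reroute-to-RAG sibling can run without a LoRA adapter.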
-
10. Capstone B: export, deploy & Coach Mode end-to-end
Stages 10–11 close the lifecycle: export to GGUF, safetensors, HF, or vLLM with quantization variants, then deploy a versioned endpoint with smoke checks and scheduled drift detection. Coach Mode runs through the whole flow. The capstone re-runs the Track 2 sentiment task through BrewSLM and compares the result with the by-hand run.
-
11. Training config reference (brewslm.yaml manifest)
Field-by-field reference for the brewslm.yaml manifest — api_version brewslm/v1, kind Project — and its ten spec sections: workflow, blueprint, domain, model, data_sources, adapters, training_plan, eval_pack, export, deployment. Every section's fields, what they control, and safe defaults.
-
12. RunEvent taxonomy & Coach Mode catalogue
The canonical RunEvent schema (nine stages, four severities), the lint-gated reason-code taxonomy by stage, plus the Coach Mode contract: five workflow stages, three severities, and the three action kinds (navigate / run_playbook / augment_from_cluster).
-
13. Eval pack & failure cluster reference
The Evaluation Contract v2 gate schema (gate_id, metric_id, operator, threshold, required) and how gates decide promotability. Plus the FailureCluster row keyed on (project_id, stage, reason_code, signature) and the augment_from_cluster remediation path.
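To make the gate fields concrete, here is a hedged sketch of how a list of gates might decide promotability. Only the five field names come from the schema above. The operator spellings, the rule that a missing metric fails its gate, and the rule that only `required` gates block promotion are assumptions for illustration.

```python
import operator as op
from dataclasses import dataclass

# Operator spellings are assumed; the schema only names the field.
_OPS = {">=": op.ge, ">": op.gt, "<=": op.le, "<": op.lt, "==": op.eq}


@dataclass(frozen=True)
class Gate:
    gate_id: str
    metric_id: str
    operator: str
    threshold: float
    required: bool


def evaluate_gates(gates, metrics):
    """Return (promotable, failed_gate_ids) for a dict of metric values."""
    failed = []
    promotable = True
    for gate in gates:
        value = metrics.get(gate.metric_id)
        # Assumption: a metric that was never produced counts as a failure.
        passed = value is not None and _OPS[gate.operator](value, gate.threshold)
        if not passed:
            failed.append(gate.gate_id)
            if gate.required:
                # Assumption: only required gates block promotion.
                promotable = False
    return promotable, failed


gates = [
    Gate("g_accuracy", "accuracy", ">=", 0.9, True),
    Gate("g_latency", "latency_ms", "<=", 200.0, False),
]
soft_fail = evaluate_gates(gates, {"accuracy": 0.92, "latency_ms": 250.0})
hard_fail = evaluate_gates(gates, {"accuracy": 0.85, "latency_ms": 100.0})
```

In this sketch `soft_fail` stays promotable while still reporting the advisory latency gate, and `hard_fail` is blocked by the required accuracy gate.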