Track 3 · With BrewSLM

With BrewSLM

Re-run the by-hand pipeline through BrewSLM's eleven-stage lifecycle, mapping each step to its platform surface — import, recipes, the trainability forecast, training jobs, eval packs, auto-RAG, and deployment.

  1. From script to platform: the BrewSLM lifecycle

    You fine-tuned a model by hand in Track 2. This track re-runs that exact pipeline through BrewSLM, mapping each by-hand step to a platform surface. It opens with the eleven-stage lifecycle and why every stage emits a RunEvent.

  2. Ingest & map your data with per-row accountability

    Stages 01–04 of the BrewSLM lifecycle: ingest from a source locator, introspect the data's shape into a ranked mapping proposal, dry-run the mapping to preview accepted and rejected rows, and commit — replacing the hand-built dataset list from Track 2 with an auditable import.

  3. Synthetic data and the review queue

    When you don't have enough rows, BrewSLM generates them with recipe-aware synthetic playbooks, lands each row as review_status pending, and lets you approve or bulk-drop rejected rows grouped by reason — the platform answer to Track 2's 'in practice, hundreds more rows'.

  4. Clean and prepare: the manifest is the source of truth

    Stages 05–06: optional cleaning (PII findings, quality scores, chunks) and Prepare, which builds prepared/manifest.json plus train and eval splits carrying the task_profile and scoring_mode — replacing the by-hand tokenize/collate/split with a single declarative artifact everything downstream reads from.

  5. Recipes and task handlers: the training config without the code

    A recipe captures everything you set in TrainingArguments and LoraConfig — base model, LoRA knobs, learning rate, schedule, precision — as declarative config, while the task-handler dispatcher routes the manifest's task profile to the right handler that knows how to tokenize, mask, and score it.

  6. Preflight and the trainability forecast

    Stage 07 runs before you spend a GPU-hour: dependency, memory-fit, capability, and gate-policy checks that return pass/fail, a blocker list, and a generated train plan — the platform answer to Track 2's environment setup and the OOM you'd otherwise hit at step one.

  7. Train: jobs, the bell, and the delta-from-baseline curve

    Stage 08 runs your recipe as a watched background Job: the notification bell polls active jobs every few seconds, per-step loss and eval traces stream into a live curve plotted as delta from baseline, and named RunEvents capture completion or failure modes like training_oom and training_cancelled.

  8. Evaluate: eval packs, gates, failure clusters & remediation

    Stage 09 scores the trained model with an eval pack whose declared gates decide promotability, produces task-aware metrics from the handler's score(), groups the misses into failure clusters, and proposes remediation — the platform form of Track 2's evaluate-by-hand and Track 1's quality gate.

  9. When fine-tuning isn't the answer: auto-RAG & reroute-to-RAG

    Sometimes the eval says fine-tuning won't get there. BrewSLM's post-eval decision engine can recommend retrieval: auto-RAG builds a BM25 index at training completion and prepends top-K passages at inference, and reroute-to-RAG clones the project into a retrieval-first sibling that uses the base model plus retrieval, no LoRA.

  10. Capstone B: export, deploy & Coach Mode end-to-end

    Stages 10–11 close the lifecycle: export to GGUF, safetensors, HF, or vLLM with quantization variants, then deploy a versioned endpoint with smoke checks and scheduled drift detection — while Coach Mode threads the whole flow. The capstone re-runs the Track 2 sentiment task through BrewSLM and compares.

  11. Training config reference (brewslm.yaml manifest)

    Field-by-field reference for the brewslm.yaml manifest — api_version brewslm/v1, kind Project — and its ten spec sections: workflow, blueprint, domain, model, data_sources, adapters, training_plan, eval_pack, export, deployment. Every section's fields, what they control, and safe defaults.

  12. RunEvent taxonomy & Coach Mode catalogue

    The canonical RunEvent schema (nine stages, four severities), the lint-gated reason-code taxonomy by stage, plus the Coach Mode contract: five workflow stages, three severities, and the three action kinds (navigate / run_playbook / augment_from_cluster).

  13. Eval pack & failure cluster reference

    The Evaluation Contract v2 gate schema (gate_id, metric_id, operator, threshold, required) and how gates decide promotability. Plus the FailureCluster row keyed on (project_id, stage, reason_code, signature) and the augment_from_cluster remediation path.
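As a preview of the gate schema named in the last entry, here is a minimal sketch of how gates might decide promotability. Only the five field names (gate_id, metric_id, operator, threshold, required) come from the outline above. Everything else is an assumption made for illustration: the operator spellings, the rule that a model is promotable when every required gate passes, the treatment of non-required gates as advisory, and the example gate and metric names. It is not BrewSLM's implementation.

```python
import operator as op
from dataclasses import dataclass

# Assumed operator tokens; the real contract may spell these differently.
OPERATORS = {">=": op.ge, ">": op.gt, "<=": op.le, "<": op.lt, "==": op.eq}


@dataclass(frozen=True)
class Gate:
    # Field names follow the outline; the types are assumptions.
    gate_id: str
    metric_id: str
    operator: str
    threshold: float
    required: bool


def gate_passes(gate: Gate, metrics: dict[str, float]) -> bool:
    """Compare the gate's metric against its threshold.

    A metric missing from the results counts as a failure (an assumption).
    """
    value = metrics.get(gate.metric_id)
    if value is None:
        return False
    return OPERATORS[gate.operator](value, gate.threshold)


def is_promotable(gates: list[Gate], metrics: dict[str, float]) -> bool:
    """Assumed rule: promotable when every required gate passes."""
    return all(gate_passes(g, metrics) for g in gates if g.required)


# Hypothetical gates and metric values, for illustration only.
gates = [
    Gate("accuracy_floor", "accuracy", ">=", 0.85, required=True),
    Gate("latency_budget", "p95_latency_ms", "<=", 200.0, required=False),
]
metrics = {"accuracy": 0.91, "p95_latency_ms": 340.0}

print(is_promotable(gates, metrics))  # prints True
```

Under these assumptions the latency gate fails but does not block promotion, because it is not marked required. Setting its required flag to True would make the same metrics non-promotable.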