Track 3 · With BrewSLM

Fine-tuning with BrewSLM

This track shows how BrewSLM turns custom datasets and a chosen base model into a platform workflow: ingest, map, prepare, preflight, train, evaluate, export, and deploy, with an audit trail at every stage.

1. From script to platform: the BrewSLM lifecycle
You fine-tuned a model by hand in Track 2. This track re-runs that exact pipeline through BrewSLM, mapping each by-hand step to a platform surface. It opens with the eleven-stage lifecycle and why every stage emits a RunEvent.
2. Ingest & map your data with per-row accountability
Stages 01–04 of the BrewSLM lifecycle: ingest from a source locator, introspect the data's shape into a ranked mapping proposal, dry-run the mapping to preview accepted and rejected rows, and commit. Together these replace the hand-built dataset list from Track 2 with an auditable import.
3. Synthetic data and the review queue
When you don't have enough rows, BrewSLM generates them with recipe-aware synthetic playbooks and lands each row with review_status set to pending. You then approve rows or bulk-drop the rejected ones, grouped by reason. This is the platform answer to Track 2's 'in practice, hundreds more rows'.
4. Clean and prepare: the manifest is the source of truth
Stages 05–06: optional cleaning (PII findings, quality scores, chunks) and Prepare, which builds prepared/manifest.json plus train and eval splits carrying the task_profile and scoring_mode — replacing the by-hand tokenize/collate/split with a single declarative artifact everything downstream reads from.
5. Recipes and task handlers: the training config without the code
A recipe captures everything you set in TrainingArguments and LoraConfig (base model, LoRA knobs, learning rate, schedule, precision) as declarative config, while the task-handler dispatcher routes the manifest's task profile to the handler that knows how to tokenize, mask, and score it.
6. Preflight and the trainability forecast
Stage 07 runs before you spend a GPU-hour: dependency, memory-fit, capability, and gate-policy checks that return pass/fail, a blocker list, and a generated train plan — the platform answer to Track 2's environment setup and the OOM you'd otherwise hit at step one.
7. Train: jobs, the bell, and the delta-from-baseline curve
Stage 08 runs your recipe as a watched background Job: the notification bell polls active jobs every few seconds, per-step loss and eval traces stream into a live curve plotted as delta from baseline, and named RunEvents capture completion or failure modes like training_oom and training_cancelled.
8. Evaluate: eval packs, gates, failure clusters & remediation
Stage 09 scores the trained model with an eval pack whose declared gates decide promotability, produces task-aware metrics from the handler's score(), groups the misses into failure clusters, and proposes remediation — the platform form of Track 2's evaluate-by-hand and Track 1's quality gate.
9. When fine-tuning isn't the answer: auto-RAG & reroute-to-RAG
Sometimes the eval says fine-tuning won't get there. BrewSLM's post-eval decision engine can recommend retrieval: auto-RAG builds a BM25 index at training completion and prepends top-K passages at inference, and reroute-to-RAG clones the project into a retrieval-first sibling that uses the base model plus retrieval, no LoRA.
10. Capstone B: export, deploy & Coach Mode end-to-end
Stages 10–11 close the lifecycle: export to GGUF, safetensors, HF, or vLLM with quantization variants, then deploy a versioned endpoint with smoke checks and scheduled drift detection, while Coach Mode threads through the whole flow. The capstone re-runs the Track 2 sentiment task through BrewSLM and compares the result with the by-hand run.
11. Training config reference (brewslm.yaml manifest)
Field-by-field reference for the brewslm.yaml manifest — api_version brewslm/v1, kind Project — and its ten spec sections: workflow, blueprint, domain, model, data_sources, adapters, training_plan, eval_pack, export, deployment. Every section's fields, what they control, and safe defaults.
12. RunEvent taxonomy & Coach Mode catalogue
The canonical RunEvent schema (nine stages, four severities), the lint-gated reason-code taxonomy by stage, plus the Coach Mode contract: five workflow stages, three severities, and the three action kinds (navigate / run_playbook / augment_from_cluster).
13. Eval pack & failure cluster reference
The Evaluation Contract v2 gate schema (gate_id, metric_id, operator, threshold, required) and how gates decide promotability. Plus the FailureCluster row keyed on (project_id, stage, reason_code, signature) and the augment_from_cluster remediation path.
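
To give a feel for how declared gates could decide promotability before you reach the reference lesson, here is a minimal sketch built on the five gate fields named above. Everything beyond those field names is an assumption made for illustration: the Gate dataclass, the operator tokens (">=", "<=" and so on), the metric names, the choice to treat a missing metric as a failed gate, and the rule that a model is promotable only when every required gate passes. It is not BrewSLM's actual schema or API.

```python
import operator as op
from dataclasses import dataclass

# Hypothetical operator vocabulary; the real contract may use different tokens.
OPERATORS = {
    ">=": op.ge,
    ">": op.gt,
    "<=": op.le,
    "<": op.lt,
    "==": op.eq,
}


@dataclass(frozen=True)
class Gate:
    # Field names follow the lesson summary; the types are assumptions.
    gate_id: str
    metric_id: str
    operator: str
    threshold: float
    required: bool


def evaluate_gates(gates, metrics):
    """Return (promotable, results), where results maps gate_id to pass/fail."""
    results = {}
    for gate in gates:
        value = metrics.get(gate.metric_id)
        # Assumption: a metric that was never reported fails its gate.
        passed = value is not None and OPERATORS[gate.operator](value, gate.threshold)
        results[gate.gate_id] = passed
    # Assumption: only required gates block promotion; optional ones are advisory.
    promotable = all(results[g.gate_id] for g in gates if g.required)
    return promotable, results


# Illustrative gates and metrics (made-up names and numbers).
gates = [
    Gate("min_accuracy", "accuracy", ">=", 0.85, required=True),
    Gate("max_latency", "p95_latency_ms", "<=", 200.0, required=False),
]
promotable, results = evaluate_gates(gates, {"accuracy": 0.9, "p95_latency_ms": 350.0})
print(promotable, results)
```

In this sketch the optional latency gate fails without blocking promotion, which is one plausible reading of the required flag; the reference lesson is the place to confirm how the platform actually treats it.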