BrewSLM Academy · free & open
Supervised Fine-Tuning, from zero to hero.
A technical course that starts at "what is a model?" and takes you all the way to shipping small language models — first by hand in PyTorch and Transformers, then with BrewSLM, then into the advanced techniques: distillation, preference tuning (DPO/ORPO), quantization, multi-task training, serving, and production monitoring. No marketing, just the mechanics, defined as we go.
Progress is stored only in this browser (localStorage) — no account, nothing sent to a server. Use Export to back it up or move it to another device. Finish every track to claim your completion certificate.
The absolute basics
What a model is, how it learns, tokens, the Transformer, and what a language model actually does.
- What is a model?
- How models learn: loss & gradient descent
- Neural networks in one page
- From text to numbers: tokens & embeddings
- Attention & the Transformer, gently
- How language models work: next-token prediction
- Pretraining vs fine-tuning vs prompting vs RAG
- LLMs vs SLMs: scale, cost, latency
- The mental model of an SLM project
- Base vs instruct models
- Picking a base model
- From n-grams to Transformers
- Architecture taxonomy
The why & what of fine-tuning
Datasets, chat templates, tokenization, the training loop, cross-entropy, LoRA, and evaluation.
- What is Supervised Fine-Tuning?
- Choosing the objective: SFT, DPO, ORPO, RLHF
- Anatomy of an SFT example: the loss mask
- Chat templates & special tokens
- Task shapes
- Tokenization in practice
- Data quality I: dedup, balance, splits
- Data quality II: gold sets
- The training loop, step by step
- Cross-entropy loss for token prediction
- Learning rate, schedules, warmup
- Batch size & gradient accumulation
- Full fine-tuning vs LoRA
- LoRA knobs: rank, alpha, QLoRA
- GPU memory math
- OOM and how to survive it
- Overfitting & reading a loss curve
- Evaluation that matches the task
- Decoding controls: temperature, top-p, stop tokens
- Dataset formats in the wild
- Continued pretraining
- Catastrophic forgetting
Fine-tune by hand in PyTorch
Load SmolLM2, build a dataset, write a minimal LoRA training loop, evaluate, and ship an artifact — runnable code, nothing hidden.
- Set up the environment
- Load a base model + tokenizer
- Build a tiny SFT dataset
- Tokenize & collate: model-ready batches
- A minimal LoRA fine-tune with the Trainer
- Run it: read the logs, loss, checkpoints
- Evaluate by hand: run the gold set
- Merge the adapter, run inference, ship
- Capstone A: fine-tune end-to-end by hand
- SFT with TRL's SFTTrainer (20 lines)
- QLoRA hands-on with bitsandbytes
- Real metrics with sklearn & HF evaluate
- Structured outputs with pydantic
- Multi-turn chat SFT
- Project gallery: 6 SLM use cases
- LLM-as-a-judge
- Public benchmarks & lm-eval-harness
- Experiment tracking with MLflow & W&B
The same pipeline, automated
Re-run your by-hand fine-tune through BrewSLM's eleven-stage lifecycle: import, recipes, the trainability forecast, training jobs, eval packs, auto-RAG, and deployment.
- From script to platform: the lifecycle
- Ingest & map with per-row accountability
- Synthetic data & the review queue
- Clean & prepare: the manifest
- Recipes & task handlers
- Preflight & the trainability forecast
- Train: jobs, the bell & the baseline curve
- Eval packs, gates & failure clusters
- When not to fine-tune: auto-RAG & reroute
- Capstone B: export, deploy & Coach Mode
- Reference: training config (brewslm.yaml)
- Reference: RunEvent & Coach Mode catalogue
- Reference: eval pack & failure cluster
Beyond a single run
Distillation, preference tuning (DPO/ORPO), quantization & compression, multi-task & curriculum, serving, and production observability.
- Beyond a single SFT run: the toolkit
- Distillation I: the teacher & capturing logits
- Distillation II: the KD loss
- Distillation III: quality retained
- Preference tuning: DPO & ORPO
- Quantization & compression
- Multi-task training & curriculum
- Serving & inference optimization
- Observability & drift in production
- Capstone C: choosing the technique & graduating
- Production feedback loop
- Tool-use / function-calling fine-tuning
- Structured pruning
- Speculative decoding
- Reasoning training