Track 0 · Foundations · Lesson 9

The mental model of an SLM project

After this lesson you can describe the end-to-end loop of a fine-tuning project, name what each stage produces, explain why the gold set is your single source of truth, and see how the rest of the Academy maps onto this loop.

Level: beginner · Read time: ~8 min · Prerequisites: LLMs vs SLMs

You now understand the parts. This lesson assembles them into the process you'll repeat for the rest of the course. Training a model is not a one-shot event; it's a loop you go around several times, and most of the work is not in the training step at all — it's in the data and the evaluation around it.

The loop

A fine-tuning project has six stages, and the last one usually sends you back to the start:

  1. Data. Gather and prepare examples of your task: inputs paired with the outputs you want. Clean them, format them, split them. This is where most of your time goes and where most of your quality comes from.
  2. Train. Run supervised fine-tuning (usually with LoRA) on a base model. This is the gradient-descent loop from Lesson 2, now over a Transformer.
  3. Evaluate. Measure the trained model against held-out examples it never trained on, using a metric that fits the task. This tells you whether the new model is actually better than the one you started from.
  4. Iterate. Look at where it fails, fix the data (add examples for weak cases, remove bad ones, rebalance), and retrain. Repeat until the metric clears your bar.
  5. Export & deploy. Package the model into a servable artifact and stand it up behind an endpoint, with a version you can roll back.
  6. Monitor. Watch real-world inputs for drift — the day the live data stops resembling your training data, quality slips and you loop back to step 1.
[Diagram: Data → Train → Evaluate → Ship → Monitor, with a dashed arrow labeled "iterate: fix the data, retrain"]
The dashed arrow is the real job. You rarely train once; you train, look at failures, improve the data, and train again.
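
To make the control flow of the loop concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a real training pipeline: `run_project_loop`, `train_fn`, `eval_fn`, and `fix_data_fn` are hypothetical names, and you would plug in your own training, scoring, and data-fixing code.

```python
from typing import Any, Callable, List, Tuple

Example = Tuple[str, str]  # (input, desired output)


def run_project_loop(
    train_data: List[Example],
    gold_set: List[Example],
    train_fn: Callable[[List[Example]], Any],
    eval_fn: Callable[[Any, List[Example]], float],
    fix_data_fn: Callable[[Any, List[Example]], List[Example]],
    target: float,
    max_rounds: int = 5,
) -> Tuple[Any, List[float]]:
    """Train, evaluate, and fix data until the score clears the bar.

    All three callables are hypothetical stand-ins for your own code.
    """
    model = None
    history: List[float] = []
    for _ in range(max_rounds):
        model = train_fn(train_data)      # stage 2: train
        score = eval_fn(model, gold_set)  # stage 3: evaluate on held-out data
        history.append(score)
        if score >= target:               # good enough: move on to deploy
            break
        # stage 4: iterate by changing the training data, never the gold set
        train_data = fix_data_fn(model, train_data)
    return model, history
```

Notice that training is a single line here. The effort in a real project goes into whatever sits behind `eval_fn` and `fix_data_fn`.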

Your north star: the gold set

The most important artifact in the whole loop is the gold set — a curated batch of examples, with known-correct answers, that the model never trains on. It exists solely to measure quality. Every decision ("is this version better?", "did this data change help?", "is it good enough to ship?") is answered against the gold set. If you train on your evaluation data, your numbers become fiction — the model can memorize the test. Keeping the gold set separate and trustworthy is the discipline that makes all your metrics meaningful.
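
Two small checks help keep the gold set honest. The sketch below assumes examples are `(input, output)` string pairs and uses exact match as the metric; both are assumptions for illustration, and many tasks need a different metric. The helper names (`normalize`, `find_leaks`, `exact_match_accuracy`) are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, known-correct output)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_leaks(train_examples: List[Example], gold_examples: List[Example]) -> List[str]:
    """Return gold inputs that also appear (after normalization) in training data.

    This only catches near-verbatim copies; paraphrased duplicates slip through.
    """
    train_inputs = {normalize(x) for x, _ in train_examples}
    return [x for x, _ in gold_examples if normalize(x) in train_inputs]


def exact_match_accuracy(predict: Callable[[str], str], gold_examples: List[Example]) -> float:
    """Fraction of gold examples where the prediction equals the known answer."""
    if not gold_examples:
        raise ValueError("gold set is empty")
    correct = sum(1 for x, y in gold_examples if predict(x).strip() == y.strip())
    return correct / len(gold_examples)
```

Run `find_leaks` before every training run. If it returns anything, remove those examples from the training data rather than from the gold set.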

Key idea

Modern model-building is data-centric: you usually improve results far more by fixing data than by fiddling with the model. The loop's center of gravity is "evaluate → understand failures → improve data," not "tweak hyperparameters."
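
One simple way to "understand failures" is to tag each gold example with a category and see where the misses cluster. The sketch below assumes you already have a list of `(category, is_correct)` pairs from an evaluation run; the category labels and the function name `failure_counts` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def failure_counts(results: Iterable[Tuple[str, bool]]) -> List[Tuple[str, int, int]]:
    """Summarize evaluation results per category.

    Returns (category, failures, total) rows, worst failure rate first,
    so you know which kind of example to add or fix next.
    """
    total: Counter = Counter()
    failed: Counter = Counter()
    for category, is_correct in results:
        total[category] += 1
        if not is_correct:
            failed[category] += 1
    rows = [(cat, failed[cat], total[cat]) for cat in total]
    rows.sort(key=lambda row: row[1] / row[2], reverse=True)
    return rows
```

The category at the top of the list is the best candidate for new or corrected training examples in the next round.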

What "good" looks like at each stage

How the rest of the Academy maps to this loop

Everything ahead is this loop, at increasing depth:

That completes the foundations. You can now talk about what a model is, how it learns, how language models produce text, the levers for steering them, and the shape of a real project. Next, Track 1 makes Supervised Fine-Tuning precise — starting with what SFT actually is, and when not to use it.

Key terms

Project lifecycle
The loop: data → train → evaluate → iterate → export/deploy → monitor.
Gold set
Curated, known-correct examples the model never trains on; the source of truth for quality.
Train/validation/test split
Partitioning data so you train on one part and measure on data the model hasn't seen.
Data-centric iteration
Improving results primarily by fixing data, guided by failure analysis.
Deployment
Packaging and serving the model as a versioned, rollback-able artifact.
Drift
When live inputs stop resembling training data, degrading quality over time.
