Track 4 · Advanced · Lesson 7

Multi-task training and curriculum

After this lesson you can explain when multi-task training helps and when it causes interference, how to balance tasks, what curriculum learning is, and how to test whether a given ordering actually helped.

Level: advanced · Read time: ~10 min · Prerequisites: Quantization & compression: Q4_K_M, AWQ, GPTQ, GGUF, ONNX

So far, one model, one task. But you might want a single small model that classifies and extracts and summarizes — fewer models to serve, shared understanding across tasks. That's multi-task training, and it introduces two new levers: balance and order.

Multi-task: one model, several skills

You train on a mix of tasks at once, each example tagged (often via the prompt/instruction) with what it's asking for. The upside is positive transfer — skills that share structure reinforce each other (sentiment and topic classification both teach the model to read reviews). The risk is task interference (negative transfer): unrelated or imbalanced tasks fight for capacity and all get worse.
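Here is a minimal sketch of what "tagged via the prompt" can look like. It assumes a simple prompt/completion record format and a hypothetical `TASK_INSTRUCTIONS` table, and neither is BrewSLM's actual schema. Each record keeps its task name, and the instruction text tells the model which skill is being asked for:

```python
from dataclasses import dataclass

# Hypothetical instruction text per task; the wording is illustrative, not a BrewSLM format.
TASK_INSTRUCTIONS = {
    "sentiment": "Classify the sentiment of this review as positive or negative.",
    "topic": "Classify the topic of this review.",
    "summarize": "Summarize this review in one sentence.",
}


@dataclass
class Example:
    task: str
    input_text: str
    target: str


def to_record(ex: Example) -> dict:
    """Prefix the input with its task instruction; keep the task name for balancing and per-task eval."""
    instruction = TASK_INSTRUCTIONS[ex.task]
    return {
        "task": ex.task,
        "prompt": f"{instruction}\n\n{ex.input_text}",
        "completion": ex.target,
    }


# Illustrative examples: the same review text can serve several tasks.
review = "The battery died after one day and support never replied."
mixed = [
    Example("sentiment", review, "negative"),
    Example("topic", review, "battery and support"),
    Example("summarize", review, "Short battery life and unresponsive support."),
]
records = [to_record(ex) for ex in mixed]
for r in records:
    print(r["task"], "->", r["completion"])
```

Keeping the `task` field on every record is what later makes it possible to balance the mix and to score each task separately.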

Watch for interference

Multi-task isn't free. If adding task B drops task A's eval score, you have interference: either the tasks share too little structure, or B is drowning out A. The fix is usually balance (below) or, sometimes, just training separate models.
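The interference check is a per-task comparison. You score each task with its own single-task model, score the same tasks with the joint model on the same eval sets, and flag any task that got worse. This is a minimal sketch: `find_interference` is a hypothetical helper, and both the `tolerance` threshold and the scores below are made-up illustrative values, not real results.

```python
def find_interference(
    single_task: dict[str, float],
    multi_task: dict[str, float],
    tolerance: float = 0.01,
) -> dict[str, float]:
    """Return {task: score change} for tasks that dropped by more than `tolerance` under joint training."""
    drops = {}
    for task, baseline in single_task.items():
        change = multi_task[task] - baseline
        if change < -tolerance:  # a small dip within tolerance is treated as noise
            drops[task] = change
    return drops


# Illustrative eval scores (higher is better), same eval set per task for both models.
single = {"sentiment": 0.91, "topic": 0.84}
joint = {"sentiment": 0.92, "topic": 0.79}
print(find_interference(single, joint))  # flags only "topic"
```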

Balancing the mix

If task A has 50,000 rows and task B has 500, naive mixing means the model barely sees B. Balancing — up-weighting or up-sampling the smaller task, or capping the larger — keeps each task influential. This is the class-balance lesson from Track 1, lifted from classes to whole tasks.
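One common way to balance is temperature-based sampling. This technique is swapped in here for illustration and is not necessarily what BrewSLM does. You sample task i with probability proportional to n_i^(1/T), where n_i is its row count. T = 1 reproduces naive proportional mixing, and larger T flattens the mix toward uniform. An optional cap truncates the big task first. The function names are hypothetical. The sketch below uses the 50,000 / 500 example above:

```python
import random
from typing import Optional


def mixing_weights(
    sizes: dict[str, int],
    temperature: float = 1.0,
    cap: Optional[int] = None,
) -> dict[str, float]:
    """Per-task sampling probability, proportional to min(n, cap) ** (1 / temperature)."""
    capped = {t: (min(n, cap) if cap is not None else n) for t, n in sizes.items()}
    scaled = {t: n ** (1.0 / temperature) for t, n in capped.items()}
    total = sum(scaled.values())
    return {t: s / total for t, s in scaled.items()}


def sample_tasks(weights: dict[str, float], k: int, seed: int = 0) -> list[str]:
    """Draw the task for each of k training examples according to the mixing weights."""
    rng = random.Random(seed)
    return rng.choices(list(weights), weights=list(weights.values()), k=k)


sizes = {"A": 50_000, "B": 500}
print(mixing_weights(sizes))                   # naive: B gets ~1% of draws
print(mixing_weights(sizes, temperature=2.0))  # sqrt scaling: B gets ~9%
print(mixing_weights(sizes, cap=5_000))        # capping A also gives B ~9%
```

Which temperature or cap works best is itself a hypothesis, to be checked against per-task eval scores like any other change to the mix.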

Curriculum: order matters

Curriculum learning orders the training data, typically easy → hard, like a course syllabus. The intuition: early easy examples establish stable, general features; harder examples then refine them. It can speed convergence and improve the final model, especially when difficulty varies a lot across your data.
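Here is a minimal easy→hard ordering sketch. It needs a difficulty score per example. The sketch uses word count, a crude proxy chosen only for illustration; a reference model's per-example loss is another common choice. Instead of a strict sort, it splits the ranked data into a few stages and shuffles within each stage. The order therefore runs easy→hard between stages without being rigidly fixed inside one. `curriculum_order`, `word_count`, and `num_stages` are hypothetical names, not a BrewSLM API.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def curriculum_order(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    num_stages: int = 3,
    seed: int = 0,
) -> list[T]:
    """Sort easy -> hard, cut into num_stages stages, and shuffle inside each stage."""
    if not examples:
        return []
    ranked = sorted(examples, key=difficulty)
    stage_size = -(-len(ranked) // num_stages)  # ceiling division
    rng = random.Random(seed)
    ordered: list[T] = []
    for start in range(0, len(ranked), stage_size):
        stage = ranked[start:start + stage_size]  # slice is a new list
        rng.shuffle(stage)  # randomize order only within this difficulty band
        ordered.extend(stage)
    return ordered


def word_count(text: str) -> int:
    """Illustrative difficulty proxy: longer reviews are assumed harder."""
    return len(text.split())


reviews = [
    "the hinge broke after a week",
    "ok",
    "great sound, weak bass",
    "arrived on time but the box was dented",
    "bad screen",
    "battery lasts long",
]
for text in curriculum_order(reviews, word_count):
    print(word_count(text), text)
```

For a fair comparison, the baseline run should use the same examples fully shuffled, so the ordering is the only thing that differs between the two runs.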

Don't guess — A/B it

Curriculum effects are real but not guaranteed; an ordering that helps one dataset does nothing for another. So treat it as a hypothesis and test it. BrewSLM has a curriculum A/B harness for exactly this — train with and without the ordering, evaluate both on the same gold set, and keep the winner:

$ python -m scripts.curriculum_ab --project 1
# trains baseline (shuffled) vs curriculum (easy->hard) on the same data,
# evaluates both on the gold set, and reports the metric delta
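The harness's internals aren't shown here, but the comparison it reports can be sketched in a few lines. The sketch assumes that each run produces one predicted label per gold item and that accuracy is the metric; both are assumptions, and the function names are hypothetical. It also adds a paired bootstrap, which resamples gold items with replacement to estimate how often curriculum still comes out ahead. This helps separate a real delta from the luck of a small gold set. The predictions below are made up for illustration.

```python
import random
from typing import Sequence


def accuracy(preds: Sequence[str], gold: Sequence[str]) -> float:
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def ab_compare(
    baseline: Sequence[str],
    curriculum: Sequence[str],
    gold: Sequence[str],
    n_boot: int = 1000,
    seed: int = 0,
) -> dict[str, float]:
    """Metric delta (curriculum - baseline) on one gold set, plus a paired-bootstrap win rate."""
    if not (len(baseline) == len(curriculum) == len(gold)):
        raise ValueError("both runs must be scored on the same gold set")
    delta = accuracy(curriculum, gold) - accuracy(baseline, gold)
    # Per gold item: +1 if only curriculum is right, -1 if only baseline is right, else 0.
    diffs = [int(c == g) - int(b == g) for b, c, g in zip(baseline, curriculum, gold)]
    rng = random.Random(seed)
    n = len(gold)
    wins = 0
    for _ in range(n_boot):
        # Resample gold items with replacement, keeping each item's paired diff.
        if sum(diffs[rng.randrange(n)] for _ in range(n)) > 0:
            wins += 1  # ties count as non-wins
    return {"delta": delta, "curriculum_win_rate": wins / n_boot}


gold = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
baseline = ["neg", "neg", "pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]  # 7/10 correct
curriculum = ["pos", "neg", "pos", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]  # 8/10 correct
print(ab_compare(baseline, curriculum, gold))
```

Here curriculum is ahead by 0.1, but that lead rests on a single gold item (the first one). The win rate therefore lands well short of 1, and a delta that thin is weak evidence on its own.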

From Track 3

This is the same A/B discipline as the auto-RAG comparison: never adopt a technique on faith — run it against the alternative on a fixed gold set and let the delta decide. Curriculum and multi-task are hypotheses you test, not defaults you assume.

Key idea

Multi-task trains one model on several tasks — positive transfer if they share structure, interference if they don't; fix with balancing. Curriculum orders data easy→hard. Both are hypotheses: A/B them on a fixed gold set and keep the winner.

Key terms

multi-task training
Training one model on several tasks at once, tagged per example.
positive transfer
When tasks sharing structure improve each other during joint training.
task interference
Negative transfer — unrelated or imbalanced tasks degrading each other.
balancing
Up-/down-weighting tasks so a large one doesn't drown a small one.
curriculum learning
Ordering training data (often easy→hard) to improve convergence and final quality.
curriculum A/B
Training with vs without an ordering and comparing on a fixed gold set to decide.
