Multi-task training and curriculum
After this lesson you can explain when multi-task training helps versus causes interference, how to balance tasks, what curriculum learning is, and how to test whether an ordering actually helped.
So far, one model, one task. But you might want a single small model that classifies and extracts and summarizes — fewer models to serve, shared understanding across tasks. That's multi-task training, and it introduces two new levers: balance and order.
Multi-task: one model, several skills
You train on a mix of tasks at once, each example tagged (often via the prompt or instruction) with what it is asking for. The upside is positive transfer: skills that share structure reinforce each other (sentiment and topic classification both teach the model to read reviews). The risk is task interference (negative transfer): unrelated or imbalanced tasks compete for capacity, and one or more of them end up worse than they would be if trained alone.
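A minimal sketch of what "tagged via the prompt" can look like. The task names, instruction templates, and review rows here are hypothetical, chosen only for illustration; they are not BrewSLM's actual format.

```python
import random

# Hypothetical task names and instruction templates (illustration only).
TEMPLATES = {
    "classify": "Classify the sentiment of this review: {text}",
    "extract": "Extract the product name from this review: {text}",
    "summarize": "Summarize this review in one sentence: {text}",
}

def tag_example(task, text, target):
    """Wrap a raw example in its task's instruction so the model knows what is asked."""
    return {"task": task, "prompt": TEMPLATES[task].format(text=text), "target": target}

def build_mix(datasets, seed=0):
    """Tag every example, then shuffle all tasks into one training list."""
    mixed = [
        tag_example(task, text, target)
        for task, rows in datasets.items()
        for text, target in rows
    ]
    random.Random(seed).shuffle(mixed)
    return mixed

# Made-up rows, one per task.
datasets = {
    "classify": [("Great grinder, love it.", "positive")],
    "extract": [("The AeroPress Go leaks a little.", "AeroPress Go")],
    "summarize": [("Arrived late, but the kettle heats fast.", "Late delivery, good kettle.")],
}
mix = build_mix(datasets)
```

Every row keeps its `task` field alongside the prompt, which makes it easy to report eval scores per task later.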
Watch for interference
Multi-task isn't free. If adding task B drops task A's eval score, you have interference — too little shared structure, or B is drowning out A. The fix is usually balance (below) or, sometimes, just training separate models.
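One way to make that check concrete is to compare each task's score from a single-task run with its score from the joint run. This is a sketch only: the scores are made-up placeholders, and the `tolerance` threshold is an assumption you would tune to your metric's run-to-run noise.

```python
def interference_report(single_task_scores, multi_task_scores, tolerance=0.01):
    """Per-task delta (multi minus single); flag drops larger than `tolerance`."""
    report = {}
    for task, base in single_task_scores.items():
        delta = multi_task_scores[task] - base
        report[task] = {
            "delta": round(delta, 4),
            "interference": delta < -tolerance,
        }
    return report

# Placeholder eval scores, not real results.
single = {"sentiment": 0.91, "extraction": 0.84}
multi = {"sentiment": 0.92, "extraction": 0.79}

report = interference_report(single, multi)
for task, row in report.items():
    print(task, row)
```

In this toy data, sentiment holds steady while extraction drops by about five points, so extraction is the task to rebalance or split out.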
Balancing the mix
If task A has 50,000 rows and task B has 500, naive mixing makes B about 1% of what the model sees. Balancing (up-weighting or up-sampling the smaller task, or capping the larger) keeps each task influential. This is the class-balance lesson from Track 1, lifted from classes to whole tasks.
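A small sketch of one common balancing scheme: sample each task in proportion to its size raised to a power `alpha`, with an optional cap. The function name and parameters are illustrative assumptions, not a BrewSLM API. With `alpha=1.0` you get naive proportional mixing; with `alpha=0.0`, a uniform mix.

```python
def mixing_weights(sizes, alpha=0.5, cap=None):
    """Sampling probability per task, proportional to min(size, cap) ** alpha."""
    capped = {task: (min(n, cap) if cap else n) for task, n in sizes.items()}
    scaled = {task: n ** alpha for task, n in capped.items()}
    total = sum(scaled.values())
    return {task: s / total for task, s in scaled.items()}

sizes = {"A": 50_000, "B": 500}

naive = mixing_weights(sizes, alpha=1.0)     # B gets about 1% of samples
balanced = mixing_weights(sizes, alpha=0.5)  # B gets about 9% of samples
uniform = mixing_weights(sizes, alpha=0.0)   # B gets 50% of samples

print(naive, balanced, uniform)
```

Pushing `alpha` toward 0 helps the small task but repeats its 500 rows many times, so watch B's eval score for overfitting rather than assuming more weight is always better.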
Curriculum: order matters
Curriculum learning orders the training data, typically easy → hard, like a course syllabus. The intuition: early easy examples establish stable, general features; harder examples then refine them. It can speed convergence and improve the final model, especially when difficulty varies a lot across your data.
- Easy-to-hard — short/clear examples first, long/ambiguous later.
- Coarse-to-fine — general examples first, edge cases later.
- Skill stacking — prerequisite sub-skills before composite ones.
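As a sketch of the easy-to-hard variant, the code below ranks examples by a difficulty proxy and splits them into stages to train on in sequence. Using word count as the proxy is an assumption made for illustration; a real project might use model loss, label agreement, or a hand-assigned tier instead.

```python
def difficulty(example):
    """Hypothetical proxy: longer inputs are treated as harder."""
    return len(example["text"].split())

def curriculum_order(examples, n_stages=3):
    """Sort easy -> hard and split into stages that are trained in sequence."""
    ranked = sorted(examples, key=difficulty)
    stage_size = max(1, -(-len(ranked) // n_stages))  # ceiling division
    return [ranked[i:i + stage_size] for i in range(0, len(ranked), stage_size)]

# Made-up examples of varying length.
examples = [
    {"text": "Good."},
    {"text": "The lid cracked after a week but support replaced it quickly."},
    {"text": "Tastes fine."},
    {"text": "Not sure whether the bitterness is the beans or the grind setting, honestly."},
    {"text": "Arrived late and cold."},
    {"text": "Would buy again for the price."},
]

stages = curriculum_order(examples, n_stages=3)
for i, stage in enumerate(stages, start=1):
    print(f"stage {i}:", [difficulty(e) for e in stage])
```

Shuffling within each stage keeps the coarse ordering while avoiding a strict sort by length inside a batch.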
Don't guess — A/B it
Curriculum effects are real but not guaranteed; an ordering that helps one dataset may do nothing for another. So treat it as a hypothesis and test it. BrewSLM has a curriculum A/B harness for exactly this: train with and without the ordering, evaluate both on the same gold set, and keep the winner:
$ python -m scripts.curriculum_ab --project 1
# trains baseline (shuffled) vs curriculum (easy->hard) on the same data,
# evaluates both on the gold set, and reports the metric delta
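A single delta can be noise, especially on a small gold set. The sketch below shows one generic way to sanity-check it with a paired bootstrap over per-example scores. It is not the BrewSLM harness or its output format; the function name and the 0/1 correctness lists are hypothetical placeholders.

```python
import random

def paired_bootstrap(baseline, curriculum, n_resamples=2000, seed=0):
    """Return (mean delta, share of resamples in which curriculum beats baseline)."""
    assert len(baseline) == len(curriculum), "both runs must score the same gold examples"
    rng = random.Random(seed)
    n = len(baseline)
    # Per-example difference: positive means curriculum did better on that example.
    diffs = [c - b for b, c in zip(baseline, curriculum)]
    observed = sum(diffs) / n
    wins = 0
    for _ in range(n_resamples):
        # Resample gold examples with replacement, keeping the pairing intact.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) > 0:
            wins += 1
    return observed, wins / n_resamples

# Placeholder per-example correctness (1 = correct) on the same ten gold examples.
baseline_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
curriculum_scores = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]

delta, win_rate = paired_bootstrap(baseline_scores, curriculum_scores)
print(f"delta={delta:+.2f}, curriculum wins in {win_rate:.0%} of resamples")
```

If the win rate hovers near 50%, the ordering has not shown a reliable benefit on this gold set, and the simpler shuffled baseline is the safer choice.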
From Track 3
This is the same A/B discipline as the auto-RAG comparison: never adopt a technique on faith — run it against the alternative on a fixed gold set and let the delta decide. Curriculum and multi-task are hypotheses you test, not defaults you assume.
Key idea
Multi-task trains one model on several tasks — positive transfer if they share structure, interference if they don't; fix with balancing. Curriculum orders data easy→hard. Both are hypotheses: A/B them on a fixed gold set and keep the winner.
Key terms
- multi-task training: Training one model on several tasks at once, tagged per example.
- positive transfer: When tasks sharing structure improve each other during joint training.
- task interference: Negative transfer, where unrelated or imbalanced tasks degrade each other.
- balancing: Up-/down-weighting tasks so a large one doesn't drown a small one.
- curriculum learning: Ordering training data (often easy→hard) to improve convergence and final quality.
- curriculum A/B: Training with vs without an ordering and comparing on a fixed gold set to decide.