Track 1 · SFT fundamentals · Lesson 5

Task shapes: classification, QA, extraction, summarization, chat

After this lesson you can identify the common SFT task shapes, describe how each frames its data and completion, and pick an evaluation metric that actually fits the shape.

Level: beginner · Read time: ~9 min · Prerequisites: Chat templates & special tokens

Before you collect a single example, decide your task shape. The shape determines three things at once: how the data is formatted, what the completion looks like, and which metric tells you whether the model is any good. BrewSLM organizes its whole pipeline around shapes for exactly this reason.

The common shapes

(There's also the preference shape — (prompt, chosen, rejected) — used by DPO, which we covered as an objective and revisit in Track 4.)

The shape determines the completion

The completion differs by shape: a single word for classification, a structured JSON object for extraction, a paragraph for summarization. That shape flows straight into the loss mask (what you train the model to produce) and the chat template (how it's framed). Picking the shape is the first design decision of a fine-tuning project, not an afterthought.
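As a rough illustration of how the shape fixes the completion, and through it the loss mask, the sketch below pairs one made-up example per shape with a completion-only mask. The example texts, the whitespace toy_tokenize helper, and the build_completion_mask function are assumptions for illustration, not BrewSLM's actual format or API.

```python
import json

# Hypothetical examples, one per shape; the texts are invented for illustration.
EXAMPLES = {
    "classification": {
        "prompt": "Review: The espresso was burnt. Sentiment?",
        "completion": "negative",
    },
    "extraction": {
        "prompt": "Extract the order: two lattes for Ana at 9am.",
        "completion": json.dumps({"item": "latte", "quantity": 2, "customer": "Ana"}),
    },
    "summarization": {
        "prompt": "Summarize: <long article text>",
        "completion": "The roastery is moving to a larger site and will reopen in spring.",
    },
}


def toy_tokenize(text):
    """Whitespace split standing in for a real tokenizer (assumption)."""
    return text.split()


def build_completion_mask(prompt, completion):
    """Return (tokens, mask) where mask is 1 only on completion tokens."""
    prompt_tokens = toy_tokenize(prompt)
    completion_tokens = toy_tokenize(completion)
    tokens = prompt_tokens + completion_tokens
    # Prompt positions get 0 (no loss); completion positions get 1 (trained on).
    mask = [0] * len(prompt_tokens) + [1] * len(completion_tokens)
    return tokens, mask


for shape, example in EXAMPLES.items():
    tokens, mask = build_completion_mask(example["prompt"], example["completion"])
    print(f"{shape}: {sum(mask)} of {len(tokens)} tokens carry loss")
```

The classification example trains on a single token while the summarization example trains on a full sentence, which is why the same masking code yields very different training signals per shape.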

Key idea

The metric must match the shape. Scoring NER with classification F1, or a generated summary with exact-match, produces numbers that look precise and mean nothing. "Your F1 is 4% because the reference was one word" is a measurement bug, not a model failure.
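To make the mismatch concrete, here is a minimal sketch of two general-purpose metrics applied to shapes they do and do not fit. The exact_match and token_f1 helpers and the example strings are hypothetical. The last case shows one way a number like 4% can arise: a one-word reference scored against a 49-word answer that contains it gives precision 1/49, recall 1, and F1 = 2/50.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 if the case-normalized, stripped strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction, reference):
    """Bag-of-words F1 over whitespace tokens (simplified, no stemming)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Classification: exact match fits, because the completion is one fixed label.
print(exact_match("Negative", "negative"))  # 1.0

# Summarization: a reasonable paraphrase still scores zero under exact match.
reference_summary = "The roastery reopens in spring at a larger site."
model_summary = "In spring the roastery will reopen at a bigger location."
print(exact_match(model_summary, reference_summary))  # 0.0

# One-word reference vs. a 49-word answer that contains the right word.
verbose_answer = "negative " + " ".join(["filler"] * 48)
print(round(token_f1(verbose_answer, "negative"), 2))  # 0.04
```

In the last case the model's answer contains the correct label, so the low score says more about the metric and the reference than about the model.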

Choosing your shape

Map your real problem onto the closest shape. If it doesn't fit cleanly, reframe it until it does: a well-chosen shape makes data collection, training, and evaluation straightforward, while a forced one fights you at every stage. Most business tasks reduce to classification, extraction, or QA. Once the shape is fixed, the next thing that determines success is the quality of the data itself, which is the subject of the next two lessons.

Key terms

Task shape
The structural form of a task (classification, QA, extraction, summarization, chat) that sets the data format, the completion (and with it the loss mask), and the metric.
Classification
Input → one of N fixed labels.
Extraction / NER
Input → a set of typed spans; scored by span-set matching.
Structured output
Input → a JSON object with specific fields; scored by valid-JSON rate + field correctness.
Summarization
Long input → short faithful summary; scored by overlap (ROUGE) + faithfulness.
Metric-shape fit
Choosing an evaluation metric that matches the task shape so the numbers are meaningful.
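
The scoring rules named in the extraction and structured-output terms can be sketched in a few lines. This is a simplified illustration under stated assumptions: spans are hypothetical (start, end, type) tuples that count only on an exact match, the field list is a made-up schema, and field correctness is plain equality. Real evaluations often use looser matching.

```python
import json


def span_set_f1(predicted, gold):
    """F1 over sets of (start, end, type) spans; only exact matches count."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def score_structured(outputs, references, required_fields):
    """Return (valid-JSON rate, field accuracy); assumes non-empty inputs."""
    valid = 0
    correct_fields = 0
    for raw, reference in zip(outputs, references):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output earns no field credit
        if not isinstance(parsed, dict):
            continue  # valid JSON, but not the object the shape requires
        valid += 1
        correct_fields += sum(
            parsed.get(field) == reference[field] for field in required_fields
        )
    total_fields = len(references) * len(required_fields)
    return valid / len(outputs), correct_fields / total_fields


gold_spans = [(0, 3, "PERSON"), (17, 23, "CITY")]
pred_spans = [(0, 3, "PERSON"), (17, 22, "CITY")]  # second span ends one character early
print(span_set_f1(pred_spans, gold_spans))  # 0.5

outputs = ['{"item": "latte", "quantity": 2}', '{"item": "mocha", "quantity"']
references = [{"item": "latte", "quantity": 2}, {"item": "mocha", "quantity": 1}]
print(score_structured(outputs, references, ["item", "quantity"]))  # (0.5, 0.5)
```

Reporting the valid-JSON rate separately from field accuracy keeps formatting failures distinguishable from content errors.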
