What is a model?
After this lesson you can explain what a machine-learning model is, what its parameters are, and why training and inference are two different things — using a concrete example you could compute by hand.
People say "AI model" and "language model" constantly, usually as if a model were a mysterious brain. It isn't. A model is something much more boring and much more useful: a function with adjustable numbers inside it, where the numbers were chosen automatically to fit a pile of examples. If you understand that one sentence, everything else in this course is detail.
A model is a function with knobs
A function takes an input and returns an output. f(x) = x + 2 is a function:
give it 3, it returns 5. A model is a function too — but it has
extra numbers built into it that we get to choose. Those numbers are called
parameters (also called weights).
Here is the smallest useful model. Suppose you want to predict a house's price from its size. You guess the relationship is a straight line:
price = w × size + b
Here w and b are the parameters — the knobs. If w = 200
and b = 50,000, then a 1,000 sq ft house is predicted at
200 × 1000 + 50,000 = 250,000. Change the knobs and you get a different
prediction. This two-knob model is real machine learning; it's called linear regression.
A modern language model is the same idea with millions to hundreds of billions of knobs
instead of two.
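To make the two-knob house-price model concrete, here is a minimal Python sketch. The function name predict_price is made up for illustration; the knob values in the first call are the ones from the text, and the second pair is an arbitrary alternative.

```python
def predict_price(size_sqft, w, b):
    """Straight-line model: price = w * size + b."""
    return w * size_sqft + b

# The knob values used in the text.
print(predict_price(1000, w=200, b=50_000))   # 250000

# Same structure, different knobs, different prediction.
print(predict_price(1000, w=180, b=60_000))   # 240000
```

The structure (multiply, then add) never changes. Only w and b do, and those are the numbers training will be responsible for.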
Key idea
The structure of the model (here, "a straight line") is chosen by a human. The values of the parameters are not typed in by a human — they are found automatically from data. That automatic search is what "machine learning" refers to.
Where the knobs come from: training
We don't know good values for w and b in advance. So we collect
examples — real houses where we know both the size and the actual price —
and let the computer find knob values that make the model's predictions close to the truth.
The recipe, which we'll make precise in the next lesson, is:
- Start with random knob values.
- Run some examples through the model and compare predictions to the real answers.
- Measure how wrong you are with a single number called the loss.
- Nudge every knob a little in the direction that makes the loss smaller.
- Repeat thousands of times.
That loop — predict, measure error, nudge the knobs — is training. The output of training is just a set of numbers: the learned parameters. A trained model is nothing more than its structure plus those numbers saved to a file.
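The loop above can be sketched in a few lines of Python for the house-price model. Several things here are assumptions, not part of the lesson: the training data is invented (generated from w = 200, b = 50 so we know what the loop should find), sizes and prices are expressed in thousands to keep the numbers small, the loss is taken to be mean squared error, and the "nudge" is plain gradient descent with a hand-picked step size. The next lesson covers the real details.

```python
import random

# Hypothetical examples: (size in thousands of sq ft, price in thousands of dollars),
# generated from price = 200 * size + 50.
examples = [(s, 200.0 * s + 50.0) for s in (0.8, 1.0, 1.2, 1.5, 2.0, 2.5)]

def loss(w, b):
    """Mean squared error: one number for how wrong the model is."""
    return sum((w * s + b - p) ** 2 for s, p in examples) / len(examples)

def train(steps=5000, lr=0.05, seed=0):
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)   # start with random knobs
    n = len(examples)
    for _ in range(steps):
        # How the loss changes as each knob changes (the gradient).
        grad_w = sum(2 * (w * s + b - p) * s for s, p in examples) / n
        grad_b = sum(2 * (w * s + b - p) for s, p in examples) / n
        # Nudge each knob a little in the direction that lowers the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train()
print(round(w, 3), round(b, 3))   # should land very close to 200 and 50
```

Notice that what train returns is just two numbers. That pair, together with the straight-line structure, is the whole trained model.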
Using the model: inference
Once training is done, you freeze the knobs. Now you can feed the model a brand-new house size it has never seen and read off a predicted price. Running a trained, frozen model on new inputs is called inference (or "prediction").
The distinction matters for everything that follows:
- Training changes the parameters. It needs labelled examples, lots of compute, and a GPU for anything large.
- Inference keeps the parameters fixed. It just runs the function forward — far cheaper.
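A small sketch of the inference side, under stated assumptions: the parameter values are the ones from the earlier house-price example (not the result of a real training run), and a JSON string stands in for the file a trained model would be saved to. The name infer is hypothetical.

```python
import json

# Pretend training already happened and its output was saved.
saved = json.dumps({"w": 200.0, "b": 50_000.0})

# Load the frozen knobs.
params = json.loads(saved)

def infer(size_sqft):
    # Forward pass only: reads the parameters, never modifies them.
    return params["w"] * size_sqft + params["b"]

before = dict(params)
print(infer(1350))        # 320000.0, for a size that was never a training example
assert params == before   # inference left the knobs untouched
```

There is no loss, no gradient, and no loop here, which is why inference is so much cheaper than training.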
Heads up
When people say a model "knows" something, they mean its frozen parameters encode a pattern that produces useful outputs. There is no lookup table and no reasoning engine hiding inside — just numbers and arithmetic. Useful arithmetic, but arithmetic.
From numbers to language
A language model is the same machine, scaled up and pointed at text. Two things change:
- The input and output are tokens, not house sizes. Text is first chopped into small chunks called tokens (we cover this in a later lesson). The model reads a sequence of tokens and outputs a probability for every possible next token.
- There are far more knobs. Instead of two parameters, a small language model like SmolLM2-135M has about 135 million. A large one can have hundreds of billions. The structure is more elaborate (the Transformer, also a later lesson), but it is still a function with parameters learned by the predict-measure-nudge loop above.
So "the model generates text" unpacks to: predict the next token, append it, feed the new sequence back in, predict the next token, and repeat. Every one of those predictions is the same frozen function doing arithmetic with its learned parameters.
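The predict, append, repeat loop can be sketched with a toy stand-in for a language model. Everything in it is invented for illustration: the five-token vocabulary, the table of scores playing the role of learned parameters, and the choice to always pick the most probable token (real systems often sample instead). Unlike a real language model, this toy looks only at the last token rather than the whole sequence.

```python
import math

# Hypothetical vocabulary and made-up "learned" scores.
VOCAB = ["the", "cat", "sat", "down", "."]
# SCORES[i][j]: score for token j coming right after token i.
SCORES = [
    [0.0, 3.0, 0.5, 0.2, 0.1],   # after "the"
    [0.1, 0.0, 3.0, 0.5, 0.2],   # after "cat"
    [0.3, 0.1, 0.0, 3.0, 0.8],   # after "sat"
    [0.2, 0.1, 0.1, 0.0, 3.0],   # after "down"
    [1.0, 0.2, 0.2, 0.2, 0.0],   # after "."
]

def next_token_probs(tokens):
    """Return one probability per vocabulary entry (softmax over the scores)."""
    row = SCORES[VOCAB.index(tokens[-1])]
    exps = [math.exp(s) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):
        probs = next_token_probs(tokens)                        # predict
        best = max(range(len(VOCAB)), key=probs.__getitem__)    # most probable token
        tokens.append(VOCAB[best])                              # append, then feed back in
    return tokens

print(" ".join(generate(["the"], 4)))   # the cat sat down .
```

The scores never change while generate runs. Each new token comes from the same frozen function applied to a sequence that is one token longer.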
So what is a small language model?
A small language model (SLM) is simply one with relatively few parameters — millions to a few billion, rather than the tens or hundreds of billions of a frontier LLM. Fewer parameters means it is cheaper to run, faster to respond, and can live on a single GPU or even a laptop. The trade-off is that out of the box it knows less.
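A back-of-the-envelope calculation shows why parameter count decides where a model can live. This sketch assumes each parameter is stored as a 16-bit number (2 bytes) and counts only the stored parameters, ignoring the extra memory needed while the model runs. The 70-billion figure is a hypothetical large model chosen for contrast, and weight_memory_gb is a made-up helper name.

```python
def weight_memory_gb(n_params, bytes_per_param=2):
    """Rough size of the stored parameters alone, in gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(135_000_000))      # 0.27  (a 135-million-parameter model)
print(weight_memory_gb(70_000_000_000))   # 140.0 (a hypothetical 70-billion-parameter model)
```

Under these assumptions the small model's parameters fit comfortably in a laptop's memory, while the large one's would not fit on a single typical GPU.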
That trade-off is the entire premise of this course. The rest of the Academy is about a specific, powerful move: take a small base model and fine-tune it on a modest amount of your own data so that, for your one task, it rivals a model hundreds of times its size — at a fraction of the cost and latency. To do that well, you first have to understand the predict-measure-nudge loop in detail. That's the next lesson.
Key terms
- Model: A function with adjustable parameters that maps inputs to outputs.
- Parameter (weight): A number inside the model, set by training rather than by hand.
- Training: The process of adjusting parameters from example data to reduce a loss.
- Loss: A single number measuring how wrong the model's predictions are.
- Inference: Running a trained, frozen model on new inputs to get outputs.
- Language model: A model whose inputs/outputs are tokens; it predicts the next token.
- Small language model (SLM): A language model with relatively few parameters, so it is cheaper and faster. It is the focus of this course.