LoRA knobs: rank, alpha, dropout, target modules, QLoRA
After this lesson you can configure a LoRA fine-tune: choose the rank and alpha, pick target modules, set dropout, and decide when QLoRA is worth it.
LoRA has a handful of settings. You can get a good result with defaults, but understanding each knob lets you trade capacity, regularization, and memory deliberately.
Rank (r): the adapter's capacity
The rank r is the inner dimension of LoRA's two small matrices: the weight update is factored as ΔW = B·A, where A is r × d_in and B is d_out × r. Think of r as how much "room" the adapter has to express a change. Small r (e.g. 8) is cheap and often plenty for a narrow task. Larger r (16, 32, 64) adds capacity for harder or broader tasks, at the cost of more adapter parameters and memory: each adapted matrix gains r × (d_in + d_out) parameters, so the cost grows linearly with r. More is not always better. Too much rank on a small dataset just gives the model more room to overfit. A common starting point is r = 16.
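To make the cost concrete, here is a minimal sketch of the parameter arithmetic. It assumes a square 4096 × 4096 projection (a typical hidden size for 7B-class models; substitute your model's shapes). The helper names are illustrative, not from any library.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix itself."""
    return d_in * d_out


d = 4096  # assumed hidden size (typical of 7B-class models); use your model's real shape
full = full_param_count(d, d)
for r in (8, 16, 32, 64):
    added = lora_param_count(d, d, r)
    print(f"r={r:>2}: {added:>9,} adapter params = {added / full:.2%} of the frozen matrix")
```

The count doubles each time r doubles. Even r = 64 adds only about 3% of the frozen matrix's parameters at this size.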
Alpha: how strongly the adapter speaks
alpha scales the adapter's contribution before it's added to the frozen weights: the adapter's output is multiplied by alpha / r. A frequent convention is to set alpha = 2 × r (e.g. r=16, alpha=32), which fixes that multiplier at 2. If you change r, people often scale alpha with it so that alpha / r, and with it the adapter's effective strength, stays the same. Treat the pair together rather than tuning each blindly.
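A minimal sketch of the multiplier under both habits, assuming the standard LoRA formulation in which the adapter's output is multiplied by exactly alpha / r. `effective_scale` is an illustrative helper, not a library function.

```python
def effective_scale(alpha: float, r: int) -> float:
    """Multiplier applied to the adapter's output before it joins the frozen path."""
    return alpha / r


for r in (8, 16, 32, 64):
    tied = effective_scale(2 * r, r)  # convention: alpha = 2 * r
    fixed = effective_scale(32, r)    # alpha held at 32 while r changes
    print(f"r={r:>2}: alpha=2r -> {tied:.2f}   alpha=32 -> {fixed:.2f}")
```

With alpha = 2 × r the multiplier stays at 2 at every rank. Holding alpha at 32 while raising r shrinks it from 4 down to 0.5, so a rank sweep with fixed alpha also changes the adapter's strength.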
Target modules: where to put adapters
You choose which weight matrices get an adapter — the target modules. The attention projections (the query/key/value/output matrices from Track 0) are the usual targets; adapting the feed-forward layers too increases capacity and cost. A minimal, common choice is the query and value projections; "all linear layers" is the more thorough, heavier option. Start minimal and expand only if quality demands it.
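A minimal sketch of how the choice of targets changes the adapter's size at r = 16. It assumes shapes loosely modeled on a 7B-class decoder (hidden size 4096, feed-forward size 11008, 32 layers, full multi-head attention) and hypothetical Llama-style module names; read the real shapes and names from your model's config.

```python
# Assumed shapes for a 7B-class decoder (hypothetical; check your model's config).
HIDDEN = 4096
FFN = 11008
LAYERS = 32

# (d_in, d_out) of each linear layer in one block; Llama-style names are an assumption.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, FFN),
    "up_proj": (HIDDEN, FFN),
    "down_proj": (FFN, HIDDEN),
}


def adapter_params(targets, r, layers=LAYERS):
    """Total LoRA parameters when every listed module in every layer gets an adapter."""
    per_layer = sum(r * (LINEAR_SHAPES[name][0] + LINEAR_SHAPES[name][1]) for name in targets)
    return per_layer * layers


for label, targets in [
    ("query + value", ["q_proj", "v_proj"]),
    ("all attention", ["q_proj", "k_proj", "v_proj", "o_proj"]),
    ("all linear", list(LINEAR_SHAPES)),
]:
    print(f"{label:>13}: {adapter_params(targets, r=16):,} trainable params")
```

Under these assumptions, moving from query + value to all linear layers multiplies the trainable parameters by roughly 4.8×, though even the larger set is well under 1% of a 7-billion-parameter model.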
Dropout: light regularization
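LoRA dropout randomly zeroes a fraction of the inputs to the adapter branch during training; the frozen base path is unaffected, and dropout is switched off at inference. It is a mild regularizer that helps avoid overfitting on small datasets. A small value like 0.05 is typical. Raise it slightly if you see overfitting (validation loss rising while training loss keeps falling), and lower it toward 0 for larger datasets.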
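A minimal pure-Python sketch of where the dropout sits in a LoRA layer. It assumes the common arrangement (used, for example, by Hugging Face PEFT) of applying inverted dropout only to the input of the adapter branch. The function names and the tiny 2 × 2 example are hypothetical.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def inverted_dropout(x, p, rng):
    """Zero each entry with probability p and scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return [xi / (1.0 - p) if rng.random() >= p else 0.0 for xi in x]


def lora_forward(W, A, B, x, alpha, r, p=0.05, training=True, rng=None):
    """h = W x + (alpha / r) * B (A x'), where x' is x after dropout in training mode."""
    if rng is None:
        rng = random.Random(0)
    base = matvec(W, x)  # frozen path: never sees dropout
    x_adapter = inverted_dropout(x, p, rng) if training else x  # adapter path only
    delta = matvec(B, matvec(A, x_adapter))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


# Tiny hypothetical example: 2x2 frozen weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # r x d_in
B = [[1.0], [0.0]]  # d_out x r
x = [1.0, 2.0]

print(lora_forward(W, A, B, x, alpha=2, r=1, training=False))  # [7.0, 2.0]
print(lora_forward(W, A, B, x, alpha=2, r=1, p=0.5, training=True, rng=random.Random(1)))
```

In eval mode the adapter sees the full input, so the output is deterministic. During training the same call varies from step to step, and that noise on the adapter's input is the regularization.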
QLoRA: LoRA on a quantized base
QLoRA shrinks memory further by quantizing the frozen base model, storing its weights in 4 bits instead of 16, while still training a normal LoRA adapter on top. Since the base is frozen anyway, quantizing it costs little quality but cuts the memory its weights occupy to roughly a quarter, letting you fine-tune a model that wouldn't otherwise fit. The trade is slightly slower steps, because the 4-bit weights are dequantized to 16-bit on the fly for each computation, and a small quality risk. Reach for QLoRA when memory is the binding constraint.
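The memory saving is plain bits-per-weight arithmetic. A minimal sketch, assuming a 7-billion-parameter base and counting only the stored base weights (no adapter, optimizer state, or activations). Real 4-bit formats also store small per-block scaling constants, so the true footprint lands a little above a quarter.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Storage for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7e9  # assumed 7-billion-parameter base model
base_16bit = weight_memory_gb(n_params, 16)
base_4bit = weight_memory_gb(n_params, 4)  # ignores per-block scaling constants
print(f"16-bit base: {base_16bit:.1f} GB")
print(f" 4-bit base: {base_4bit:.1f} GB ({base_4bit / base_16bit:.0%} of the 16-bit size)")
```

For a 7B base that is about 14 GB of weights at 16 bits versus 3.5 GB at 4 bits.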
Sensible starting point
r = 16, alpha = 32, dropout 0.05, target the attention projections, and a 16-bit base if it fits (QLoRA if not). Change one thing at a time and measure on the gold set.
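One way to keep "one thing at a time" honest is to derive every experiment from a single baseline. A minimal sketch under stated assumptions: `LoraSettings` and `one_change_variants` are hypothetical helpers, the module names are Llama-style placeholders, and rank and alpha are moved together as one knob (alpha = 2 × r).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LoraSettings:
    """Illustrative container for the knobs in this lesson (not a library class)."""
    r: int = 16
    alpha: int = 32
    dropout: float = 0.05
    # Hypothetical Llama-style names; use the layer names your model actually has.
    target_modules: tuple = ("q_proj", "v_proj")
    use_qlora: bool = False  # True = 4-bit base, False = 16-bit base


def one_change_variants(base):
    """Return (label, settings) pairs that each move a single knob away from base."""
    variants = []
    for r in (8, 32):
        # Rank and alpha move as one knob: alpha = 2 * r keeps alpha / r fixed.
        variants.append((f"r={r}", replace(base, r=r, alpha=2 * r)))
    variants.append(("dropout=0.1", replace(base, dropout=0.1)))
    variants.append(("all attention projections",
                     replace(base, target_modules=("q_proj", "k_proj", "v_proj", "o_proj"))))
    variants.append(("QLoRA base", replace(base, use_qlora=True)))
    return variants


baseline = LoraSettings()
for label, settings in one_change_variants(baseline):
    print(f"{label:<26} {settings}")
```

If you use Hugging Face PEFT, r, alpha, dropout and target_modules correspond to the `r`, `lora_alpha`, `lora_dropout` and `target_modules` arguments of `LoraConfig`; the 4-bit base is configured when the model is loaded rather than in `LoraConfig`.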
These knobs spend memory. To set them — and the batch size — without crashing, you need to estimate the GPU memory a run will use, which is the next lesson.
Key terms
- LoRA rank (r): the adapter's inner dimension, i.e. its capacity; higher means more room (and more overfitting risk). Common: 16.
- LoRA alpha: scales the adapter's contribution by alpha / r; often set to 2 × r.
- Target modules: which weight matrices get adapters (commonly the attention query/value projections).
- LoRA dropout: mild regularization that zeroes a random fraction of the adapter branch's inputs during training (e.g. 0.05).
- QLoRA: LoRA on a 4-bit quantized frozen base, to fit fine-tuning into far less memory.
- Quantization: storing weights in fewer bits (e.g. 4) to save memory.