Track 1 · SFT fundamentals · Lesson 14

LoRA knobs: rank, alpha, dropout, target modules, QLoRA

After this lesson you can configure a LoRA fine-tune: choose the rank and alpha, pick target modules, set dropout, and decide when QLoRA is worth it.

Level: beginner · Read time: ~9 min · Prerequisites: Full fine-tuning vs LoRA

LoRA has a handful of settings. You can get a good result with defaults, but understanding each knob lets you trade capacity, regularization, and memory deliberately.

Rank (r): the adapter's capacity

The rank r is the inner dimension of LoRA's two small matrices — effectively how much "room" the adapter has to express a change. Small r (e.g. 8) is cheap and often plenty for a narrow task; larger r (16, 32, 64) adds capacity for harder or broader tasks at the cost of more adapter parameters and memory. More is not always better — too much rank on a small dataset just gives the model more room to overfit. A common starting point is r = 16.
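To make the cost of rank concrete, here is a minimal sketch of the parameter arithmetic. It assumes the usual LoRA layout of two matrices, one of shape r × d_in and one of shape d_out × r; the helper name lora_param_count and the 4096 × 4096 matrix size are illustrative assumptions, not values from this lesson.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters in one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Hypothetical square projection matrix; the size is chosen only for illustration.
d = 4096
full_matrix_params = d * d

for r in (8, 16, 32, 64):
    n = lora_param_count(d, d, r)
    share = 100 * n / full_matrix_params
    print(f"r={r:>2}: {n:>9,} adapter params ({share:.2f}% of the full matrix)")
```

Adapter size grows linearly with r, so doubling the rank doubles the adapter's parameters for every targeted matrix.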

Alpha: how strongly the adapter speaks

alpha scales the adapter's contribution before it's added to the frozen weights: in the standard formulation the adapter's update is multiplied by alpha / r. A frequent convention is to set alpha = 2 × r (e.g. r=16, alpha=32). Because the scale depends on the ratio, raising r while leaving alpha fixed weakens the adapter, so people often scale alpha with r to keep the effective strength similar. Treat the pair together rather than tuning each blindly.
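The interaction between alpha and r can be sketched in a few lines of plain Python. This is a toy forward pass under the standard alpha / r scaling, not any library's implementation; the function names and the tiny matrices are assumptions made up for the example.

```python
def lora_scale(alpha: float, r: int) -> float:
    """Multiplier applied to the adapter's output in the standard formulation."""
    return alpha / r


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Frozen output W x plus the scaled adapter output (alpha / r) * B (A x)."""
    r = len(A)  # A has r rows, so its row count is the rank
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    s = lora_scale(alpha, r)
    return [b + s * d for b, d in zip(base, delta)]


# Keeping alpha = 2 * r holds the scale constant as the rank changes.
print(lora_scale(32, 16), lora_scale(64, 32))  # both 2.0
# Raising r without touching alpha halves the scale.
print(lora_scale(32, 32))  # 1.0

# Toy rank-1 adapter on a 2 x 2 identity "frozen" weight.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]        # shape r x d_in, with r = 1
B = [[0.5], [0.0]]      # shape d_out x r
print(lora_forward(W, A, B, [1.0, 2.0], alpha=2.0))  # [4.0, 2.0]
```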

Target modules: where to put adapters

You choose which weight matrices get an adapter — the target modules. The attention projections (the query/key/value/output matrices from Track 0) are the usual targets; adapting the feed-forward layers too increases capacity and cost. A minimal, common choice is the query and value projections; "all linear layers" is the more thorough, heavier option. Start minimal and expand only if quality demands it.
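In practice, target modules are usually named by the last part of each layer's name. The sketch below shows how the minimal and thorough choices differ in the number of adapters they create. The module names (q_proj, gate_proj, and so on) follow a naming style used by some open models and are an assumption here, as are the helper select_target_modules and the two-layer toy model; check the names in your own model.

```python
def select_target_modules(module_names, targets):
    """Return the module names whose last dotted component is in `targets`."""
    wanted = set(targets)
    return [name for name in module_names if name.split(".")[-1] in wanted]


# Hypothetical linear layers inside one transformer block (names are assumptions).
LAYER_MODULES = [
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj",
    "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
]
# A toy model with two blocks.
module_names = [f"layers.{i}.{m}" for i in range(2) for m in LAYER_MODULES]

minimal = select_target_modules(module_names, ["q_proj", "v_proj"])
attention = select_target_modules(module_names, ["q_proj", "k_proj", "v_proj", "o_proj"])
all_linear = select_target_modules(module_names, [m.split(".")[-1] for m in LAYER_MODULES])

print(len(minimal), len(attention), len(all_linear))  # 4 8 14
```

Each selected module gets its own adapter, so the adapter count (and with it memory and compute) scales with both the number of targets per block and the number of blocks.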

Dropout: light regularization

LoRA dropout randomly zeroes some of the adapter's activations during training, a mild regularizer that helps avoid overfitting on small datasets. A small value like 0.05 is typical; raise it slightly if you see overfitting, lower it toward 0 for larger datasets.
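A minimal sketch of what dropout does to a vector on the adapter path, written in plain Python. It uses the common "inverted dropout" convention, where surviving values are rescaled by 1 / (1 - p) so the expected value is unchanged. Where exactly dropout sits in the adapter varies by implementation, so the placement and the function name lora_dropout are assumptions.

```python
import random


def lora_dropout(x, p, training=True, rng=random):
    """Zero each element with probability p during training; rescale survivors."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)  # dropout is a no-op at inference time
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


rng = random.Random(0)  # seeded only to make the demo repeatable
activations = [1.0] * 1000
dropped = lora_dropout(activations, p=0.05, rng=rng)
print(f"zeroed {dropped.count(0.0)} of {len(dropped)} values")

# At inference the values pass through untouched.
print(lora_dropout([1.0, 2.0], p=0.05, training=False))  # [1.0, 2.0]
```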

QLoRA: LoRA on a quantized base

QLoRA shrinks memory further by quantizing the frozen base model — storing its weights in 4 bits instead of 16 — while still training a normal LoRA adapter on top. Since the base is frozen anyway, quantizing it costs little quality but roughly quarters the memory the weights occupy, letting you fine-tune a model that wouldn't otherwise fit. The trade is slightly slower steps and a small quality risk. Reach for QLoRA when memory is the binding constraint.
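The "roughly quarters" figure follows from simple arithmetic on bits per weight. The sketch below estimates weight storage only; it ignores the small overhead of quantization metadata as well as activations, the adapter, and optimizer state. The 7-billion-parameter model size and the helper name weight_memory_gb are assumptions chosen for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 7e9  # hypothetical 7B-parameter base model

mem_16bit = weight_memory_gb(n_params, 16)
mem_4bit = weight_memory_gb(n_params, 4)

print(f"16-bit base: {mem_16bit:.1f} GB")  # 14.0 GB
print(f" 4-bit base: {mem_4bit:.1f} GB")   # 3.5 GB
print(f"reduction:   {mem_16bit / mem_4bit:.0f}x")
```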

Sensible starting point

r = 16, alpha = 32, dropout 0.05, target the attention projections, full-precision base if it fits (QLoRA if not). Change one thing at a time and measure on the gold set.
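This lesson does not prescribe a library. As one possible illustration, the starting point could be written with the Hugging Face peft package roughly as follows. This is a configuration fragment under stated assumptions, not the course's own setup: it assumes peft is installed, that the task is causal language modeling, and that the model names its attention projections q_proj, k_proj, v_proj, and o_proj, which you should verify for your model.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # adapter rank
    lora_alpha=32,      # alpha = 2 x r, giving a scale of alpha / r = 2
    lora_dropout=0.05,  # light regularization
    # Attention projections; these module names are an assumption and vary by model.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

Keeping the values in one config object makes it easier to change a single knob per run and compare results on the gold set.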

These knobs spend memory. To set them — and the batch size — without crashing, you need to estimate the GPU memory a run will use, which is the next lesson.

Key terms

LoRA rank (r): The adapter's inner dimension / capacity; higher = more room (and more overfit risk). Common: 16.
LoRA alpha: Scales the adapter's contribution (~alpha/r); often set to 2×r.
Target modules: Which weight matrices get adapters (commonly the attention query/value projections).
LoRA dropout: Mild regularization zeroing some adapter activations during training (e.g. 0.05).
QLoRA: LoRA on a 4-bit quantized frozen base, to fit fine-tuning into far less memory.
Quantization: Storing weights in fewer bits (e.g. 4) to save memory.
