LoRA and memory fit
Understand why LoRA is usually the starting point for local fine-tuning and how to reason about VRAM.
FAQs for ML and LLM engineers
These are the questions engineers actually ask before they adapt a base model to custom data: dataset mapping, GPU needs, LoRA-friendly hardware, base-model choice, evaluation, deployment, and how the audit trail works.
Getting started
Python 3.12 + Node 18+ + Git, plus a single consumer GPU (8 GB VRAM is enough for
the LoRA flow on the 1–3B models the autopilot recommends out of the box). Clone
the repo, run pip install -r backend/requirements.txt, run
npm install in the frontend, start the two dev servers. The README
has the canonical command list.
Ingest, eval, export, and the wizard work fine CPU-only. Training is the only
stage that effectively needs CUDA. The autopilot will refuse to start a real
training run on CPU; you can override with the simulation backend
(TRAINING_BACKEND=simulate) to walk the pipeline end-to-end without a
real model run.
Only for synthetic data generation and the optional LLM-assisted mapping mode.
Both are opt-in. Point TEACHER_MODEL_API_URL at any OpenAI-compatible
endpoint — local Ollama works, hosted gateways work, anything that speaks the
chat-completions API.
Data & mapping
Two paths. CLI: pass --mapper <id> + --map-json '{…}'
instead of --auto. UI: in the wizard's Map step, the mapper dropdown
lists every registered mapper alongside the proposal; pick a different one and
edit the field-map JSON before previewing. The deterministic sniffer is one
input to the proposal, not the final word.
Write a plugin mapper. One Python module with
register_dataset_mappers(register) exporting a callable that returns
an object implementing the TargetMapper protocol. List your module
under DATASET_MAPPER_PLUGIN_MODULES in settings. BrewSLM loads it at
boot alongside the built-ins; one broken plugin doesn't block the rest.
Because silent auto-mapping is the single most damaging failure mode for a tool
like this — a bad mapper picks the wrong fields, the trainer learns the wrong
thing, and you discover the issue at eval time after burning a GPU-hour. The 80%
threshold lets you opt in to bypass (click "I've reviewed the proposal" / pass
--force), but the default is to make you look.
They're grouped by stable reason code (missing_text,
label_not_allowed, length_mismatch, etc.). The wizard
renders the breakdown; you tick which categories to bulk-drop. Dropped categories
still appear in the audit row's rejection_counts — the data
accounting stays honest.
Training & eval
Anything HuggingFace can load via transformers in the size range
BrewSLM targets — Llama 3, Qwen 2 / 2.5, Mistral, Phi, Gemma, etc. The autopilot
picks from a curated shortlist by task; you can pin a specific model via the
Training Config page.
You probably evaluated with the wrong metric shape. Check
output_schema.scoring_mode on the prepared manifest — for NER / PII /
span extraction, it should be "span_set", not the classification
default. Span-set scoring requires exact (type, start, end) matches;
off-by-one boundaries count as miss + hallucination. Most pre-BrewSLM eval
libraries do not score this way.
Yes. The eval pack carries the gate policy. Required gates block; optional gates
allow degradation within tolerance. See
app.services.evaluation_service for the gate evaluation and
app.models.experiment for what's stored.
The RAG handler reports answer EM/F1 plus faithfulness (fraction of prediction tokens grounded in the retrieved context) plus context recall (fraction of gold-answer tokens present in the context). The combination lets you tell retriever bugs from generation bugs without leaving the page.
Governance & ops
The run_events table is the audit spine. Every workflow stage emits
an event with a stable reason code; the support bundle exports them as JSON. The
training manifest captures every input that produced the run, so you can replay
an experiment from a frozen artifact.
Stored under the Project → Secrets surface with encrypted-at-rest values. The audit log records that a secret was used, not the value. CLI / API flows read from env vars; the wizard surfaces an inline auth banner when a source needs credentials that aren't present.
Yes — the support bundle includes a run_events.json dump scoped to
the project, plus the prepared manifest, the recent failure clusters, and any
ingest reports. One zip, point it at the legal/compliance team.
Open an issue at
github.com/TensorGreed/__SLM__/issues. Include the failing RunEvent's run_id + reason_code
if you have them — the audit trail is the fastest debug input we can read.
Learn The Concept
If one of these FAQ answers is a little too short, these lessons go deeper on the choices behind LoRA, GPU sizing, base-model selection, and evaluation.
Understand why LoRA is usually the starting point for local fine-tuning and how to reason about VRAM.
Learn how to pick a base model that matches your domain, budget, tokenizer family, and deployment constraints.
Know how to score a fine-tuned model honestly before you export it or promote it into production.
Still curious
The codebase isn't large. Every capability listed in these FAQs maps to one or two files you can read end-to-end in an afternoon.
$ git clone https://github.com/TensorGreed/__SLM__ brewslm
$ cd brewslm/backend/app/services/dataset_import
$ ls
introspector.py mappers/ plugin_loader.py sources/ service.py …