FAQs for ML and LLM engineers

Practical questions before you fine-tune a small language model.

These are the questions engineers actually ask before they adapt a base model to custom data: dataset mapping, GPU needs, LoRA-friendly hardware, base-model choice, evaluation, deployment, and how the audit trail works.

Getting started

Setup & first run

What do I install?

Python 3.12 + Node 18+ + Git, plus a single consumer GPU (8 GB VRAM is enough for the LoRA flow on the 1–3B models the autopilot recommends out of the box). Clone the repo, run pip install -r backend/requirements.txt, run npm install in the frontend, start the two dev servers. The README has the canonical command list.

Can I run BrewSLM without a GPU?

Ingest, eval, export, and the wizard work fine CPU-only. Training is the only stage that effectively needs CUDA. The autopilot will refuse to start a real training run on CPU; you can override with the simulation backend (TRAINING_BACKEND=simulate) to walk the pipeline end-to-end without a real model run.

Do I need a teacher model?

Only for synthetic data generation and the optional LLM-assisted mapping mode. Both are opt-in. Point TEACHER_MODEL_API_URL at any OpenAI-compatible endpoint — local Ollama works, hosted gateways work, anything that speaks the chat-completions API.

Data & mapping

How the import pipeline behaves

The introspector picked the wrong mapper. How do I override?

Two paths. CLI: pass --mapper <id> + --map-json '{…}' instead of --auto. UI: in the wizard's Map step, the mapper dropdown lists every registered mapper alongside the proposal; pick a different one and edit the field-map JSON before previewing. The deterministic sniffer is one input to the proposal, not the final word.

I have a dataset shape the built-in mappers don't cover.

Write a plugin mapper. One Python module with register_dataset_mappers(register) exporting a callable that returns an object implementing the TargetMapper protocol. List your module under DATASET_MAPPER_PLUGIN_MODULES in settings. BrewSLM loads it at boot alongside the built-ins; one broken plugin doesn't block the rest.

Why does the wizard block me below 80% confidence?

Because silent auto-mapping is the single most damaging failure mode for a tool like this — a bad mapper picks the wrong fields, the trainer learns the wrong thing, and you discover the issue at eval time after burning a GPU-hour. The 80% threshold lets you opt in to bypass (click "I've reviewed the proposal" / pass --force), but the default is to make you look.

What happens to rejected rows?

They're grouped by stable reason code (missing_text, label_not_allowed, length_mismatch, etc.). The wizard renders the breakdown; you tick which categories to bulk-drop. Dropped categories still appear in the audit row's rejection_counts — the data accounting stays honest.

Training & eval

How the model side behaves

Which base models does BrewSLM support?

Anything HuggingFace can load via transformers in the size range BrewSLM targets — Llama 3, Qwen 2 / 2.5, Mistral, Phi, Gemma, etc. The autopilot picks from a curated shortlist by task; you can pin a specific model via the Training Config page.

Why are my NER F1 scores so low?

You probably evaluated with the wrong metric shape. Check output_schema.scoring_mode on the prepared manifest — for NER / PII / span extraction, it should be "span_set", not the classification default. Span-set scoring requires exact (type, start, end) matches; off-by-one boundaries count as miss + hallucination. Most pre-BrewSLM eval libraries do not score this way.

Can I set custom promotion gates?

Yes. The eval pack carries the gate policy. Required gates block; optional gates allow degradation within tolerance. See app.services.evaluation_service for the gate evaluation and app.models.experiment for what's stored.

How does RAG eval differ from QA eval?

The RAG handler reports answer EM/F1 plus faithfulness (fraction of prediction tokens grounded in the retrieved context) plus context recall (fraction of gold-answer tokens present in the context). The combination lets you tell retriever bugs from generation bugs without leaving the page.

Governance & ops

Audit, reproducibility, support

How do I prove what happened in a training run?

The run_events table is the audit spine. Every workflow stage emits an event with a stable reason code; the support bundle exports them as JSON. The training manifest captures every input that produced the run, so you can replay an experiment from a frozen artifact.

What about secrets — HF tokens, Kaggle creds, teacher API keys?

Stored under the Project → Secrets surface with encrypted-at-rest values. The audit log records that a secret was used, not the value. CLI / API flows read from env vars; the wizard surfaces an inline auth banner when a source needs credentials that aren't present.

Can I export the audit log?

Yes — the support bundle includes a run_events.json dump scoped to the project, plus the prepared manifest, the recent failure clusters, and any ingest reports. One zip, point it at the legal/compliance team.

How do I get help / report a bug?

Open an issue at github.com/TensorGreed/__SLM__/issues. Include the failing RunEvent's run_id + reason_code if you have them — the audit trail is the fastest debug input we can read.

Learn The Concept

The Academy lessons that answer the longer version

If one of these FAQ answers is a little too short, these lessons go deeper on the choices behind LoRA, GPU sizing, base-model selection, and evaluation.

Still curious

Read the source

The codebase isn't large. Every capability listed in these FAQs maps to one or two files you can read end-to-end in an afternoon.

$ git clone https://github.com/TensorGreed/__SLM__ brewslm

$ cd brewslm/backend/app/services/dataset_import

$ ls

introspector.py mappers/ plugin_loader.py sources/ service.py …