Open-source small language model fine-tuning platform
Fine-tune small language models from a base model on your own data.
BrewSLM helps machine learning and LLM engineers turn custom datasets into domain-specific small language models. Ingest JSONL, CSV, HuggingFace, or Kaggle data, map it to the right task shape, run local training, evaluate honestly, and export an artifact you can actually inspect.
Academy
Learn the concepts that power the product
BrewSLM provides the workflow; the Academy is the learning surface. Use it to learn supervised fine-tuning, LoRA, base-model choice, custom dataset shapes, and evaluation before you run the platform on your own domain data.
Start with SFT
Learn what supervised fine-tuning changes in a model, when it works, and when you should use RAG or continued pretraining instead.
Pick the right base model
The fastest path to a strong domain-specific model is usually the right instruct or base checkpoint plus the smallest effective adaptation.
Map custom datasets correctly
Most training problems are really data-shape problems. Get the task shape, dataset format, and evaluation metric aligned before you spend a GPU-hour.
Follow the engineer roadmap
New to small language models, or upskilling from broader ML or LLM work? The roadmap page links the exact Academy lessons in the right order.
What actually shipped
The pipeline, broken into the parts that are interesting
01 Schema introspector
Sniff columns, propose a mapping
Reads ~20 sample rows from any source, classifies each column by content shape
(text-like, categorical, BIO tags, chat messages, etc.), and ranks mapping hypotheses,
each with a confidence score and a rationale. It never auto-picks silently: the user
confirms with --auto when the top hypothesis is above the threshold, or with --force when it is under it.
You control: the threshold, the override, the field map.
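A minimal sketch of the idea, not BrewSLM's actual introspector: classify each column from sample rows, then rank mapper hypotheses by confidence. The shape labels, the confidence numbers, the 0.8 default threshold, and the names `classify_column` and `propose` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    mapper: str
    confidence: float
    rationale: str


def classify_column(values):
    """Return a coarse content-shape label for one column's sample values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "empty"
    if all(isinstance(v, list) for v in non_null):
        flat = [item for v in non_null for item in v]
        if flat and all(isinstance(t, dict) and "role" in t for t in flat):
            return "chat_messages"
        if flat and all(
            isinstance(t, str) and (t == "O" or t[:2] in ("B-", "I-")) for t in flat
        ):
            return "bio_tags"
        return "token_list"
    if all(isinstance(v, str) for v in non_null):
        # Crude heuristic: few distinct values relative to the sample size.
        if len(set(non_null)) <= max(2, len(non_null) // 4):
            return "categorical"
        return "text"
    return "other"


def propose(rows, threshold=0.8):
    """Rank mapper hypotheses; report whether the top one would need --force."""
    shapes = {classify_column([r.get(k) for r in rows]) for k in rows[0]}
    hypotheses = []
    # Confidence values below are illustrative placeholders.
    if {"bio_tags", "token_list"} <= shapes:
        hypotheses.append(
            Hypothesis("bio_to_spans", 0.95, "token column paired with BIO tags")
        )
    if "chat_messages" in shapes:
        hypotheses.append(
            Hypothesis("chat_messages_passthrough", 0.90, "list of role dicts")
        )
    if {"text", "categorical"} <= shapes:
        hypotheses.append(
            Hypothesis("label_to_classification", 0.85, "text plus label column")
        )
    if "text" in shapes:
        hypotheses.append(Hypothesis("text_only", 0.50, "free text present"))
    hypotheses.sort(key=lambda h: h.confidence, reverse=True)
    needs_force = not hypotheses or hypotheses[0].confidence < threshold
    return hypotheses, needs_force
```

The point of returning `needs_force` instead of picking a mapper is that the decision stays with the user either way.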
02 Eight target mappers
Domain-agnostic, registry-driven
bio_to_spans, label_to_classification, qa_pair_passthrough,
chat_messages_passthrough, preference_pair,
rag_passthrough, kv_to_structured, text_only.
Each declares the task profile it feeds.
You control: custom mappers via plugin modules.
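To make one of these concrete, here is a small sketch of what a BIO-to-spans conversion can look like. It is an illustration under assumptions, not the shipped `bio_to_spans` mapper: spans use token indices with an exclusive end, and a stray `I-` tag with no matching `B-` is treated leniently as the start of a new span.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence to (type, start, end) spans.

    Indices are token positions; `end` is exclusive.
    """
    spans = []
    current = None  # entity type of the open span, if any
    start = None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only when the types match.
        if tag.startswith("I-") and current == tag[2:]:
            continue
        if current is not None:
            spans.append((current, start, i))
            current, start = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # Lenient choice: a stray I- opens a span just like B- would.
            current, start = tag[2:], i
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans
```

For example, `bio_to_spans(["B-PER", "I-PER", "O", "B-LOC"])` returns `[("PER", 0, 2), ("LOC", 3, 4)]`.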
03 Bulk-drop UX
Group rejects by reason, drop in one click
Per-row accountability is the contract — every raw row becomes either a TransformedRow or a RejectedRow with a stable reason code. Counts stay in the audit log even when you bulk-drop a category from the import.
You control: which rejection categories to silence.
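The contract can be sketched in a few lines. The class names `TransformedRow` and `RejectedRow` and the `length_mismatch` reason code appear on this page; the fields, the `missing_field` code, and the helper names below are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TransformedRow:
    index: int
    data: dict


@dataclass
class RejectedRow:
    index: int
    reason: str  # stable reason code


def transform_all(rows):
    """Every raw row yields exactly one TransformedRow or RejectedRow."""
    results = []
    for i, row in enumerate(rows):
        tokens, tags = row.get("tokens"), row.get("tags")
        if tokens is None or tags is None:
            results.append(RejectedRow(i, "missing_field"))  # hypothetical code
        elif len(tokens) != len(tags):
            results.append(RejectedRow(i, "length_mismatch"))
        else:
            results.append(TransformedRow(i, {"tokens": tokens, "tags": tags}))
    return results


def summarize(results, silenced=frozenset()):
    """Hide silenced categories from the visible list but keep their counts."""
    rejected = [r for r in results if isinstance(r, RejectedRow)]
    counts = dict(Counter(r.reason for r in rejected))
    visible = [r for r in rejected if r.reason not in silenced]
    accepted = [r for r in results if isinstance(r, TransformedRow)]
    return accepted, visible, counts
```

Silencing a category only changes what is listed; the counts returned by `summarize` still cover every rejected row.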
04 Task-aware eval
The metric matches the shape
NER / PII get span-set scoring (strict (type, start, end) matching).
RAG gets faithfulness + context-recall alongside EM/F1. DPO gets similarity-to-chosen
vs similarity-to-rejected. Classification gets per-class P/R/F1 + confusion matrix.
No more "your F1 is 4% because the reference is one word."
You control: the gate policy + the metric registry.
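As a rough illustration of strict span-set scoring (a sketch, not the platform's eval handler), a predicted span counts only when its type, start, and end all match a reference span exactly. The function name and the returned dict keys are assumptions.

```python
def span_set_scores(predicted, reference):
    """Precision, recall, and F1 over sets of (type, start, end) spans."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)  # exact triple match only
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With reference spans `{("PER", 0, 2), ("LOC", 3, 4)}` and predictions `{("PER", 0, 2), ("LOC", 3, 5)}`, the off-by-one boundary on the second span earns no credit, so precision, recall, and F1 are all 0.5.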
05 Saved mappings + audit log
Reproducible imports, no re-introspecting
Save a mapping after first import; re-run against a refreshed source with one click
(or one POST /configs/{id}/run). Every import — fresh or re-run — emits
a RunEvent with source, locator, mapper, row counts, and the config_id link.
You control: when to refresh, when to retire.
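If you want to trigger a re-run from a script (a weekly cron job, say), the request could be built roughly like this. The route is copied as written above; the base URL, whether the route sits under a prefix such as /api, and the empty request body are assumptions, so check the API docs of your install before relying on it.

```python
import urllib.request


def build_rerun_request(base_url, config_id):
    """Build (but do not send) a POST to the saved-mapping re-run route."""
    # Assumption: the route is mounted directly under base_url.
    url = f"{base_url.rstrip('/')}/configs/{config_id}/run"
    return urllib.request.Request(url, data=b"", method="POST")
```

Pass the result to `urllib.request.urlopen` to actually send it.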
06 Local-first runtime
Your data stays on your machine
FastAPI backend + SQLite + a React workspace. Everything runs locally; the teacher model is optional (Ollama, OpenAI-compatible endpoint, anything that speaks the chat-completions API). Cloud burst is opt-in, not required.
You control: the inference backend, the storage path, the boundary.
Operator view
What you see during an actual session
Import run audit (excerpt)
POST /api/projects/1/dataset-import/introspect
→ proposal: bio_to_spans conf=0.95 needs_force=false
POST /api/projects/1/dataset-import/preview
→ accepted=4982 rejected=18 (length_mismatch=18)
POST /api/projects/1/dataset-import/run
→ written_path=…/projects/1/synthetic/synthetic.jsonl
RunEvent stage=ingestion reason=dataset_import_run
RunEvent stage=training summary="Training completed: exp-7"
RunEvent stage=eval payload.pass_rate=0.92
RunEvent stage=export payload.format=gguf-q4_k_m
What lands in your project
source: hf:ai4privacy/pii-masking-200k:train
mapper: bio_to_spans
target_task_profile: structured_extraction
scoring_mode: span_set
written_rows: 4982
audit_log_entries: 12 (RunEvent stream)
saved_mapping: pii-weekly-refresh
lab_journal: +120 XP · L2 reached
| Source | Locator | Cache |
|---|---|---|
| JSONL / CSV | jsonl:/path | local fs |
| HuggingFace | hf:org/dataset:split | ~/.cache/huggingface |
| Kaggle | kaggle:competition:slug | ~/.cache/brewslm/kaggle |
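The locator strings in the table follow a scheme:rest pattern. A hedged sketch of parsing them follows; the `csv:` scheme is inferred by analogy with `jsonl:`, and the `Locator` type and `parse_locator` name are hypothetical, not part of BrewSLM's API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Locator:
    scheme: str
    parts: Tuple[str, ...]


def parse_locator(locator):
    """Split a locator into its scheme and scheme-specific parts."""
    scheme, sep, rest = locator.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected '<scheme>:<target>', got {locator!r}")
    if scheme in ("jsonl", "csv"):
        # Keep the path whole so drive letters or colons in it survive.
        return Locator(scheme, (rest,))
    if scheme in ("hf", "kaggle"):
        return Locator(scheme, tuple(rest.split(":")))
    raise ValueError(f"unknown source scheme {scheme!r}")
```

For instance, `parse_locator("hf:ai4privacy/pii-masking-200k:train")` yields the parts `("ai4privacy/pii-masking-200k", "train")`.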
Why this matters
What you stop doing once BrewSLM is in your loop
Stop writing per-dataset converters
The Kaggle PII converter was 175 lines of careful BIO-to-spans logic. The introspector now infers the same mapping from sample rows — and it works on any BIO-tagged dataset, not just one.
Stop guessing which metric to trust
Task-aware eval handlers route to the metric shape that fits. NER doesn't get scored with classification F1 anymore. RAG faithfulness shows up next to answer F1, so you can tell retriever bugs from generation bugs.
Stop re-introspecting weekly
Save the mapping the first time. The Saved mappings panel shows last-run row counts; one-click re-runs hit the same RunEvent audit stream as a fresh import.
Stop losing rows silently
Per-row rejections carry stable reason codes. Bulk-drop happens only on categories you explicitly select; counts stay in the result regardless.
Architecture
Three planes, all running on your machine
Sources (jsonl / csv / hf / kaggle / plugin) → Introspector → Target mapper → Project's synthetic dataset (JSONL on disk).
Prepared manifest → Task-handler dispatcher (classification / qa / span_set / rag / dpo / chat_sft / language_modeling / structured_extraction) → Local trainer or optional cloud-burst backend.
Export targets (GGUF / safetensors / HF-compatible / vLLM) → Deployment versions with promote / rollback / drift checks.
Audit spine: every action above emits a RunEvent with stage, reason code, and structured payload. Same stream feeds the Observability page, the Lab Journal progression layer, and the support bundle.
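One way to picture the audit spine is an append-only stream of small structured records. The `stage`, `reason`, and `payload` fields mirror the excerpt in the operator view; the `AuditLog` class and its in-memory JSON-lines storage are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunEvent:
    stage: str
    reason: str
    payload: dict = field(default_factory=dict)


class AuditLog:
    """Append-only event stream kept as JSON lines (in memory here)."""

    def __init__(self):
        self._lines = []

    def emit(self, event):
        self._lines.append(json.dumps(asdict(event), sort_keys=True))

    def events(self, stage=None):
        decoded = [json.loads(line) for line in self._lines]
        return [e for e in decoded if stage is None or e["stage"] == stage]
```

Because every consumer reads the same stream, a filter such as `log.events(stage="eval")` is enough to build a per-stage view.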
Common first questions
Quick answers; the long version lives on the FAQs page
Is BrewSLM open source?
Yes. Source lives at github.com/TensorGreed/__SLM__. Run it locally; no account or hosted service required.
Do I need a GPU?
For training, yes — a single consumer GPU is plenty for the SLM size range BrewSLM targets (1B–10B parameters with LoRA). For ingest / eval / export, no.
Where does my data go?
Into the project's local directory on your machine. SQLite metadata, JSONL row storage, file artifacts. The optional teacher model (for synthetic generation) is the only external call, and you point it at whatever endpoint you want — local Ollama works fine.
Can I plug in a custom mapper for my weird dataset?
Yes. Drop a Python module under DATASET_MAPPER_PLUGIN_MODULES; it
registers via register_dataset_mappers(register) alongside the built-ins.
Same pattern as data-adapter and training-runtime plugins.
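A plugin module might look something like the sketch below. Only the hook name `register_dataset_mappers(register)` comes from this page; the arguments that `register` accepts, the mapper's row format, and the column names are assumptions, so treat this as a shape to adapt rather than the real signature.

```python
def ticket_to_classification(row):
    """Hypothetical mapper: support-ticket row to a text/label example."""
    text = f"{row['subject']}\n{row['body']}"
    return {"text": text, "label": row["queue"]}


def register_dataset_mappers(register):
    # Assumed call shape: register(name, mapper_fn, task_profile=...).
    register(
        "ticket_to_classification",
        ticket_to_classification,
        task_profile="classification",
    )


def collect_mappers(plugin_hook):
    """Stand-in for the host's registry, useful for testing a plugin."""
    registry = {}

    def register(name, fn, task_profile):
        registry[name] = (fn, task_profile)

    plugin_hook(register)
    return registry
```

Calling `collect_mappers(register_dataset_mappers)` lets you check the plugin in isolation before pointing DATASET_MAPPER_PLUGIN_MODULES at it.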
Quickstart
Clone, install, import, train
Five blocks. The first three boot the platform; the last two are the demo run-through. If anything here differs from the README, the README is the canonical version.
# 1. Clone
$ git clone https://github.com/TensorGreed/__SLM__ brewslm && cd brewslm
# 2. Backend
$ cd backend && python -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt
$ uvicorn app.main:app --reload --port 8000
# 3. Frontend (new shell)
$ cd frontend && npm install && npm run dev
# 4. Open http://localhost:5173 and create a project
# 5. Import via the wizard, or from the CLI:
$ cd backend && python -m app.cli.dataset_import run \
--locator hf:imdb:train --project 1 --auto --limit 5000