Open-source small language model fine-tuning platform

Fine-tune small language models from a base model on your own data.

BrewSLM helps machine learning and LLM engineers turn custom datasets into domain-specific small language models. Ingest JSONL, CSV, HuggingFace, or Kaggle data, map it to the right task shape, run local training, evaluate honestly, and export an artifact you can actually inspect.

4 data sources JSONL, CSV, HuggingFace, Kaggle
8 task mappers classification, QA, chat, DPO, RAG, NER, k-v, LM
local + auditable run locally, inspect every step

Academy

Learn the concepts that power the product

BrewSLM promotes the workflow; the Academy is the learning surface. Use it to learn supervised fine-tuning, LoRA, base-model choice, custom dataset shapes, and evaluation before you run the platform on your own domain data.

What actually shipped

The pipeline, broken into the parts that are interesting

01 Schema introspector

Sniff columns, propose a mapping

Reads ~20 sample rows from any source, classifies each column by content shape (text-like, categorical, BIO tags, chat messages, etc.), ranks hypotheses with confidence + rationale. Never silently auto-picks — the user confirms via --auto above the threshold or --force under it.

You control: the threshold, the override, the field map.

02 Eight target mappers

Domain-agnostic, registry-driven

bio_to_spans, label_to_classification, qa_pair_passthrough, chat_messages_passthrough, preference_pair, rag_passthrough, kv_to_structured, text_only. Each declares the task profile it feeds.

You control: custom mappers via plugin modules.

03 Bulk-drop UX

Group rejects by reason, drop in one click

Per-row accountability is the contract — every raw row becomes either a TransformedRow or a RejectedRow with a stable reason code. Counts stay in the audit log even when you bulk-drop a category from the import.

You control: which rejection categories to silence.

04 Task-aware eval

The metric matches the shape

NER / PII get span-set scoring (strict (type, start, end) matching). RAG gets faithfulness + context-recall alongside EM/F1. DPO gets similarity-to-chosen vs similarity-to-rejected. Classification gets per-class P/R/F1 + confusion matrix. No more "your F1 is 4% because the reference is one word."

You control: the gate policy + the metric registry.

05 Saved mappings + audit log

Reproducible imports, no re-introspecting

Save a mapping after first import; re-run against a refreshed source with one click (or one POST /configs/{id}/run). Every import — fresh or re-run — emits a RunEvent with source, locator, mapper, row counts, and the config_id link.

You control: when to refresh, when to retire.

06 Local-first runtime

Your data stays on your machine

FastAPI backend + SQLite + a React workspace. Everything runs locally; the teacher model is optional (Ollama, OpenAI-compatible endpoint, anything that speaks the chat-completions API). Cloud burst is opt-in, not required.

You control: the inference backend, the storage path, the boundary.

Operator view

What you see during an actual session

Import run audit (excerpt)

POST /api/projects/1/dataset-import/introspect

→ proposal: bio_to_spans conf=0.95 needs_force=false

POST /api/projects/1/dataset-import/preview

→ accepted=4982 rejected=18 (length_mismatch=18)

POST /api/projects/1/dataset-import/run

→ written_path=…/projects/1/synthetic/synthetic.jsonl

RunEvent stage=ingestion reason=dataset_import_run

RunEvent stage=training summary="Training completed: exp-7"

RunEvent stage=eval payload.pass_rate=0.92

RunEvent stage=export payload.format=gguf-q4_k_m

What lands in your project

source: hf:ai4privacy/pii-masking-200k:train

mapper: bio_to_spans

target_task_profile: structured_extraction

scoring_mode: span_set

written_rows: 4982

audit_log_entries: 12 (RunEvent stream)

saved_mapping: pii-weekly-refresh

lab_journal: +120 XP · L2 reached

Source Locator Cache
JSONL / CSV jsonl:/path local fs
HuggingFace hf:org/dataset:split ~/.cache/huggingface
Kaggle kaggle:competition:slug ~/.cache/brewslm/kaggle

Why this matters

What you stop doing once BrewSLM is in your loop

Stop writing per-dataset converters

The Kaggle PII converter was 175 lines of careful BIO-to-spans logic. The introspector now infers the same mapping from sample rows — and it works on any BIO-tagged dataset, not just one.

Stop guessing which metric to trust

Task-aware eval handlers route to the metric shape that fits. NER doesn't get scored with classification F1 anymore. RAG faithfulness shows up next to answer F1, so you can tell retriever bugs from generation bugs.

Stop re-introspecting weekly

Save the mapping the first time. The Saved mappings panel shows last-run row counts; one-click re-runs hit the same RunEvent audit stream as a fresh import.

Stop losing rows silently

Per-row rejections carry stable reason codes. Bulk-drop happens only on categories you explicitly select; counts stay in the result regardless.

Architecture

Three planes, all running on your machine

Data plane

Sources (jsonl / csv / hf / kaggle / plugin) → Introspector → Target mapper → Project's synthetic dataset (JSONL on disk).

Training plane

Prepared manifest → Task-handler dispatcher (classification / qa / span_set / rag / dpo / chat_sft / language_modeling / structured_extraction) → Local trainer or optional cloud-burst backend.

Delivery plane

Export targets (GGUF / safetensors / HF-compatible / vLLM) → Deployment versions with promote / rollback / drift checks.

Audit spine: every action above emits a RunEvent with stage, reason code, and structured payload. Same stream feeds the Observability page, the Lab Journal progression layer, and the support bundle.

Common first questions

Quick answers; the long version lives on the FAQs page

Is BrewSLM open source?

Yes. Source lives at github.com/TensorGreed/__SLM__. Run it locally; no account or hosted service required.

Do I need a GPU?

For training, yes — a single consumer GPU is plenty for the SLM size range BrewSLM targets (1B–10B parameters with LoRA). For ingest / eval / export, no.

Where does my data go?

Into the project's local directory on your machine. SQLite metadata, JSONL row storage, file artifacts. The optional teacher model (for synthetic generation) is the only external call, and you point it at whatever endpoint you want — local Ollama works fine.

Can I plug in a custom mapper for my weird dataset?

Yes. Drop a Python module under DATASET_MAPPER_PLUGIN_MODULES; it registers via register_dataset_mappers(register) alongside the built-ins. Same pattern as data-adapter and training-runtime plugins.

Quickstart

Clone, install, import, train

Five blocks. The first three boot the platform; the last two are the demo run-through. Match this against the README for the canonical version.

# 1. Clone

$ git clone https://github.com/TensorGreed/__SLM__ brewslm && cd brewslm

 

# 2. Backend

$ cd backend && python -m venv .venv && source .venv/bin/activate

$ pip install -r requirements.txt

$ uvicorn app.main:app --reload --port 8000

 

# 3. Frontend (new shell)

$ cd frontend && npm install && npm run dev

 

# 4. Open http://localhost:5173 and create a project

 

# 5. Import via the wizard, or from the CLI:

$ cd backend && python -m app.cli.dataset_import run \

    --locator hf:imdb:train --project 1 --auto --limit 5000