Classification
- Per-class precision / recall / F1 + confusion matrix.
- Macro + micro aggregates.
- Legacy exact_match / f1 aliases preserved for gate compat.
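To make the aggregates concrete, here is a minimal plain-Python sketch of per-class precision, recall and F1, with macro and micro averages derived from a confusion matrix. It is an illustration of the standard definitions, not BrewSLM's eval handler; `per_class_report` and `_prf` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

PRF = Tuple[float, float, float]  # (precision, recall, F1)


def _prf(tp: int, fp: int, fn: int) -> PRF:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def per_class_report(gold: List[str], pred: List[str]):
    """Per-class P/R/F1, macro and micro aggregates, and the confusion matrix."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    labels = sorted(set(gold) | set(pred))
    confusion = Counter(zip(gold, pred))  # (gold_label, predicted_label) -> count

    per_class: Dict[str, PRF] = {}
    tp_sum = fp_sum = fn_sum = 0
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(g, label)] for g in labels if g != label)
        fn = sum(confusion[(label, p)] for p in labels if p != label)
        per_class[label] = _prf(tp, fp, fn)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn

    n = len(labels) or 1  # avoid dividing by zero on empty input
    # Macro: unweighted mean of the per-class scores (macro F1 = mean of per-class F1).
    macro = tuple(sum(m[i] for m in per_class.values()) / n for i in range(3))
    # Micro: pool TP/FP/FN across all classes before taking the ratios.
    micro = _prf(tp_sum, fp_sum, fn_sum)
    return per_class, macro, micro, confusion
```

For single-label multiclass data, micro precision, recall and F1 all equal accuracy, which is why the macro numbers are usually the more informative ones under class imbalance.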
Capabilities for ML engineers
BrewSLM is built for engineers fine-tuning a base model on real data. Source connectors, task mappers, task-aware evaluation, and plugin contracts all exist so you can move from messy domain-specific data to a small model you trust.
Source connectors
| Source id | Locator format | What it does |
|---|---|---|
| `jsonl` | `jsonl:/path/to/file.jsonl` | One JSON object per line. Unparseable lines surface as sentinel rows. |
| `csv` | `csv:/path/to/file.csv` | First row = header. Every cell is a string; mappers handle coercion. |
| `hf` | `hf:<id>[:split[:revision]]` | Wraps `datasets.load_dataset` with `streaming=True`. Auth via `HF_TOKEN` for gated datasets. |
| `kaggle` | `kaggle:competition:<slug>` | Downloads + extracts via the Kaggle API. Cache under `~/.cache/brewslm/kaggle/`. Multi-file disambiguation via `?file=…`. |
| your plugin | `yourthing:<ref>` | Register via `register_source(id, factory)` at module load. Same contract as built-ins. |
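The jsonl connector's choice to turn unparseable lines into sentinel rows, rather than abort the import, can be sketched as below. The sentinel keys (`__parse_error__`, `line_no`, `raw`), the function name `read_jsonl`, and skipping blank lines are all assumptions for illustration; BrewSLM's actual sentinel shape isn't documented here.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per JSONL line; bad lines become sentinel rows (hypothetical keys)."""
    for line_no, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # assumed choice: blank lines are skipped, not flagged
        try:
            obj = json.loads(text)
        except json.JSONDecodeError as exc:
            yield {"__parse_error__": str(exc), "line_no": line_no, "raw": raw}
            continue
        if not isinstance(obj, dict):
            # Valid JSON that isn't an object also becomes a sentinel row.
            yield {"__parse_error__": "not a JSON object", "line_no": line_no, "raw": raw}
            continue
        yield obj
```

Because an open file is an iterable of lines, `read_jsonl(open(path, encoding="utf-8"))` works directly. Keeping failures as rows means one corrupt line doesn't lose the rest of the file, and downstream code can count them.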
Target mappers
| Mapper id | Task profile | Use when the row carries… |
|---|---|---|
| `bio_to_spans` | `structured_extraction` · `span_set` | BIO-tagged tokens + labels (NER / PII) |
| `label_to_classification` | `classification` | `{text, label}` |
| `text_only` | `language_modeling` | a single text column, no labels |
| `qa_pair_passthrough` | `qa` | `{question, answer}` |
| `chat_messages_passthrough` | `chat_sft` | a `messages` list of `{role, content}` dicts |
| `preference_pair` | `dpo` | `{prompt, chosen, rejected}` (RLHF / DPO / ORPO) |
| `rag_passthrough` | `rag_qa` | `{question, context, answer}` (grounded QA) |
| `kv_to_structured` | `structured_extraction` · `field_match` | flat key-value extractions (invoices, forms, receipts) |
| your plugin | your choice | Register via `register_dataset_mappers(register)`; declare any task profile that has a registered handler. |
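`bio_to_spans` turns BIO-tagged tokens into (type, start, end) spans, and span_set scoring compares those tuples exactly, so an off-by-one boundary counts as both a miss and a hallucination. The sketch below shows both steps under stated assumptions: spans use token indices with an exclusive end, stray `I-` tags open a new span, and the function names are hypothetical rather than BrewSLM's implementation.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, int, int]  # (type, start, end): token indices, end exclusive (assumed)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collapse BIO tags such as "B-PER", "I-PER", "O" into typed spans."""
    spans: List[Span] = []
    current: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == current
        if current is not None and not continues:
            spans.append((current, start, i))
            current = None
        if prefix == "B" or (prefix == "I" and not continues):
            # Lenient choice: an I- tag with no matching open span starts a new one.
            current, start = label, i
    return spans


def score_span_set(gold: List[Span], pred: List[Span]) -> Dict[str, float]:
    """Exact-match set scoring: type, start and end must all agree."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    misses = len(gold_set - pred_set)          # gold spans never predicted
    hallucinations = len(pred_set - gold_set)  # predicted spans absent from gold
    # Convention choice: an empty set scores 0.0 rather than 1.0.
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "misses": misses, "hallucinations": hallucinations}
```

For gold tags `["B-PER", "I-PER", "O", "B-LOC"]` and a prediction of `("PER", 0, 1), ("LOC", 3, 4)`, the truncated PER span yields one miss and one hallucination, so precision and recall are both 0.5.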
Task-aware eval
- … exact_match / f1 aliases preserved for gate compat.
- … (type, start, end) match; off-by-one boundaries count as miss + hallucination.
- … output_schema.scoring_mode: "span_set" on the manifest.
- … chosen/rejected are absent.
- … length_ratio reported alongside content metrics.
- … subtask field.

Reproducibility
Saved mappings
The wizard's "Save this mapping" button persists the
(locator, mapper_id, field_map, drop_reasons) tuple under a name. The
"Saved mappings" panel on the Data tab shows last-run row counts and offers
one-click "Re-run".
```
name: weekly-pii-refresh
locator: kaggle:competition:pii-detection-…
mapper_id: bio_to_spans
last_run_at: 2026-05-13T10:00:00Z
last_run_accepted: 5,012
```
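A saved mapping is the (locator, mapper_id, field_map, drop_reasons) tuple stored under a name, plus last-run bookkeeping. The dataclass below is a hypothetical shape for that record; the types chosen for `field_map` and `drop_reasons`, and the `record_run` helper, are assumptions rather than BrewSLM's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class SavedMapping:
    name: str
    locator: str
    mapper_id: str
    field_map: Dict[str, str] = field(default_factory=dict)  # assumed: target field -> source column
    drop_reasons: List[str] = field(default_factory=list)     # assumed: rejection reason codes
    last_run_at: Optional[datetime] = None
    last_run_accepted: Optional[int] = None

    def record_run(self, accepted: int) -> None:
        """Update the last-run stats a panel like Saved mappings would display."""
        self.last_run_at = datetime.now(timezone.utc)
        self.last_run_accepted = accepted
```

In this sketch, a re-run is just the same locator, mapper and field map applied again, which is what makes the import repeatable.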
Audit log
Every import, fresh or re-run, emits a RunEvent with stage
"ingestion" and reason code "dataset_import_run" (or
"dataset_import_failed" on errors). The payload includes the
config_id link when the run came from a saved mapping.
```
RunEvent { stage: "ingestion",
           reason_code: "dataset_import_run",
           payload: { source_id: "kaggle",
                      mapper_id: "bio_to_spans",
                      accepted_count: 5012,
                      rejection_counts: {length_mismatch: 18},
                      config_id: 12 } }
```
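The audit rule (stage "ingestion", reason code dataset_import_run or dataset_import_failed, config_id only for saved-mapping runs) fits in a small builder. `RunEvent` here is a stand-in dataclass, `build_import_event` is a hypothetical name, and the `error` payload key is an assumption; the real event model may carry more fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RunEvent:  # stand-in for BrewSLM's event model (assumed shape)
    stage: str
    reason_code: str
    payload: Dict[str, Any] = field(default_factory=dict)


def build_import_event(source_id: str, mapper_id: str, accepted_count: int,
                       rejection_counts: Dict[str, int],
                       error: Optional[str] = None,
                       config_id: Optional[int] = None) -> RunEvent:
    payload: Dict[str, Any] = {
        "source_id": source_id,
        "mapper_id": mapper_id,
        "accepted_count": accepted_count,
        "rejection_counts": dict(rejection_counts),
    }
    if error is not None:
        payload["error"] = error  # hypothetical key for the failure message
    if config_id is not None:
        # Only runs launched from a saved mapping link back to it.
        payload["config_id"] = config_id
    reason = "dataset_import_failed" if error is not None else "dataset_import_run"
    return RunEvent(stage="ingestion", reason_code=reason, payload=payload)
```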
Extensibility
Register a new connector with register_source(id, factory); the connector is
then addressed by its locator prefix.
# DATA_ADAPTER_PLUGIN_MODULES, KIND_HAS_MODULE_LOADER=true
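Dispatch by locator prefix can be pictured as a registry keyed on the text before the first colon. This is a toy re-implementation for illustration only: the registry dict, `resolve_locator`, the duplicate-id check, and the factory signature (taking the rest of the locator and returning rows) are assumptions, not BrewSLM's `register_source`.

```python
from typing import Callable, Dict, Iterable

SourceFactory = Callable[[str], Iterable[dict]]  # assumed: factory(ref) -> rows
_SOURCES: Dict[str, SourceFactory] = {}


def register_source(source_id: str, factory: SourceFactory) -> None:
    """Toy registry: map a locator prefix to a connector factory."""
    if source_id in _SOURCES:
        raise ValueError(f"source {source_id!r} is already registered")  # sketch refuses duplicates
    _SOURCES[source_id] = factory


def resolve_locator(locator: str) -> Iterable[dict]:
    """Split 'prefix:ref' on the first colon and hand ref to the matching factory."""
    prefix, sep, ref = locator.partition(":")
    if not sep:
        raise ValueError(f"locator {locator!r} has no '<source>:' prefix")
    try:
        factory = _SOURCES[prefix]
    except KeyError:
        raise ValueError(f"unknown source {prefix!r}") from None
    return factory(ref)
```

Splitting only on the first colon keeps the rest of the locator intact, so formats like `hf:<id>[:split[:revision]]` can parse their own remainder.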
Drop a module under DATASET_MAPPER_PLUGIN_MODULES and export
register_dataset_mappers(register) from it. Failure isolation is built in: one
broken plugin doesn't block the rest of the load.
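```python
from my_plugin.mappers import MyMapper  # hypothetical: wherever your mapper class lives

def register_dataset_mappers(register):
    register("my_mapper", MyMapper)
```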
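The failure isolation described above amounts to importing each plugin module and calling its register_dataset_mappers hook inside its own try block. A minimal sketch, assuming DATASET_MAPPER_PLUGIN_MODULES holds a comma-separated list of importable module names; `load_mapper_plugins` and its return value are hypothetical, and the real loader's parsing and error reporting may differ.

```python
import importlib
import os
from typing import Callable, Dict, List, Optional


def load_mapper_plugins(register: Callable[[str, type], None],
                        modules: Optional[List[str]] = None) -> Dict[str, str]:
    """Import each plugin and call its register_dataset_mappers hook.

    Returns {module_name: error} for plugins that failed; the rest still load.
    """
    if modules is None:
        # Assumed format: comma-separated module names.
        raw = os.environ.get("DATASET_MAPPER_PLUGIN_MODULES", "")
        modules = [m.strip() for m in raw.split(",") if m.strip()]
    failures: Dict[str, str] = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
            module.register_dataset_mappers(register)
        except Exception as exc:  # isolate: one broken plugin never aborts the loop
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```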
When the deterministic sniffer can't reach 0.8 confidence, the introspector can consult your teacher model for a mapping suggestion. The LLM's proposal goes through the same confidence gate; hallucinated mapper ids are rejected.
```shell
DATASET_IMPORT_LLM_ASSIST_ENABLED=true \
python -m app.cli.dataset_import introspect --locator … --llm-assist
```
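The gate an LLM proposal passes through can be sketched as two checks: the suggested mapper id must already be registered, and the confidence must clear the same 0.8 threshold the deterministic sniffer uses. `MappingSuggestion`, `accept_suggestion`, and the assumption that a suggestion carries a numeric confidence score are illustrative, not BrewSLM's API.

```python
from dataclasses import dataclass
from typing import Collection, Optional

CONFIDENCE_THRESHOLD = 0.8  # same bar as the deterministic sniffer


@dataclass(frozen=True)
class MappingSuggestion:  # hypothetical shape of an LLM proposal
    mapper_id: str
    confidence: float


def accept_suggestion(s: MappingSuggestion, known_mappers: Collection[str]) -> Optional[str]:
    """Return the mapper id if the suggestion passes the gate, else None."""
    if s.mapper_id not in known_mappers:
        return None  # hallucinated id: never trust a name the registry doesn't know
    if s.confidence < CONFIDENCE_THRESHOLD:
        return None  # below the bar: fall back to asking the user
    return s.mapper_id
```

Checking the id against the registry first means a confident but invented mapper name is rejected regardless of its score.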
Learn the concept
If you want the theory behind custom dataset mapping, base-model selection, and task-aware evaluation, these are the lessons that explain why the product is built this way.
Know which rows belong to classification, extraction, QA, chat, or preference tuning before you map them.
Use the smallest capable base model, then decide whether LoRA or full fine-tuning is the right adaptation path.
Metrics only help if they match the task shape and the held-out gold set you actually care about.
See the surface end-to-end
Every capability above is one file you can read. Start with
backend/app/services/dataset_import/ for the pipeline, or
backend/app/services/eval_task_handler_service.py for the task
handlers.
```shell
$ git clone https://github.com/TensorGreed/__SLM__ brewslm
$ cd brewslm/backend
$ ls app/services/dataset_import/
introspector.py mappers/ plugin_loader.py sources/ service.py …
```