Capabilities for ML engineers

Turn custom datasets into domain-specific small language models.

BrewSLM is built for engineers fine-tuning a base model on real data. Source connectors, task mappers, task-aware evaluation, and plugin contracts all exist so you can move from messy domain-specific data to a small model you trust.

Source connectors

Four built-in, plus a plugin slot

Source id | Locator format | What it does
jsonl | jsonl:/path/to/file.jsonl | One JSON object per line. Unparseable lines surface as sentinel rows.
csv | csv:/path/to/file.csv | The first row is the header. Every cell is a string; mappers handle coercion.
hf | hf:<id>[:split[:revision]] | Wraps datasets.load_dataset with streaming=True. Gated datasets authenticate via HF_TOKEN.
kaggle | kaggle:competition:<slug> | Downloads and extracts via the Kaggle API, cached under ~/.cache/brewslm/kaggle/. Use ?file=… to pick one file from a multi-file download.
your plugin | yourthing:<ref> | Register via register_source(id, factory) at module load. Same contract as the built-ins.
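The jsonl row says unparseable lines surface as sentinel rows instead of aborting the import. A minimal sketch of that behavior follows; the read_jsonl name and the sentinel shape (a dict with hypothetical __error__ and __line__ keys) are assumptions, not BrewSLM's actual format.

```python
import io
import json
from typing import Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per line; bad lines become sentinel rows instead of raising."""
    for line_no, raw in enumerate(stream, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped entirely
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            # Hypothetical sentinel: keeps the line number and error for later rejection counts.
            yield {"__error__": f"invalid_json: {exc.msg}", "__line__": line_no}
            continue
        if not isinstance(obj, dict):
            # Valid JSON that is not an object (e.g. a bare list) is not a row either.
            yield {"__error__": "not_an_object", "__line__": line_no}
            continue
        yield obj


sample = io.StringIO('{"text": "hello"}\nnot json\n[1, 2]\n')
rows = list(read_jsonl(sample))
```

Yielding sentinels keeps the stream flowing, so a downstream mapper can count bad lines as rejections rather than losing the whole file.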

Target mappers

Eight built-in, plus a plugin slot

Mapper id | Task profile | Use when the row carries…
bio_to_spans | structured_extraction · span_set | BIO-tagged tokens + labels (NER / PII)
label_to_classification | classification | {text, label}
text_only | language_modeling | a single text column, no labels
qa_pair_passthrough | qa | {question, answer}
chat_messages_passthrough | chat_sft | a messages list of {role, content} dicts
preference_pair | dpo | {prompt, chosen, rejected} (RLHF / DPO / ORPO)
rag_passthrough | rag_qa | {question, context, answer} (grounded QA)
kv_to_structured | structured_extraction · field_match | flat key-value extractions (invoices, forms, receipts)
your plugin | your choice | Register via register_dataset_mappers(register); declare any task profile that has a registered handler.
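What bio_to_spans has to do is standard BIO decoding. A sketch, assuming token-index spans with an exclusive end and lenient handling of a stray I- tag (treated as opening a new span); the function name matches the mapper id, but its signature and return shape are illustrative, not the real mapper's. Rows whose token and label counts differ are rejected with length_mismatch, the reason code that appears in the audit-log example on this page.

```python
from typing import List, Optional, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tokens: List[str], labels: List[str]) -> Tuple[Optional[List[Span]], Optional[str]]:
    """Decode BIO labels into spans. Returns (spans, None) or (None, rejection_reason)."""
    if len(tokens) != len(labels):
        return None, "length_mismatch"  # one label per token is required
    spans: List[Span] = []
    open_type: Optional[str] = None  # entity type of the span being built
    open_start = 0
    for i, label in enumerate(labels):
        prefix, _, ent_type = label.partition("-")
        continues = prefix == "I" and ent_type == open_type
        if open_type is not None and not continues:
            spans.append((open_type, open_start, i))  # close the running span
            open_type = None
        if prefix in ("B", "I") and not continues:
            # B- always opens a span; a stray I- with no matching open span
            # is leniently treated as opening one too.
            open_type, open_start = ent_type, i
    if open_type is not None:
        spans.append((open_type, open_start, len(labels)))
    return spans, None


spans, reason = bio_to_spans(["Jane", "Doe", "lives", "in", "Paris"], ["B-PER", "I-PER", "O", "O", "B-LOC"])
# spans == [("PER", 0, 2), ("LOC", 4, 5)], reason is None
```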

Task-aware eval

Metrics that match the task, not the average of every task

Classification

  • Per-class precision / recall / F1 + confusion matrix.
  • Macro + micro aggregates.
  • Legacy exact_match / f1 aliases are preserved so existing gates keep working.
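The classification metrics above follow textbook definitions. A self-contained sketch of per-class P/R/F1, macro and micro aggregates, and the confusion matrix; the function name and return layout are illustrative, not BrewSLM's handler.

```python
from collections import Counter
from typing import Dict, List


def classification_report(gold: List[str], pred: List[str]) -> Dict[str, object]:
    """Per-class P/R/F1, macro and micro aggregates, and a confusion matrix."""
    labels = sorted(set(gold) | set(pred))
    pairs = Counter(zip(gold, pred))  # (gold label, predicted label) -> count
    per_class: Dict[str, Dict[str, float]] = {}
    tp_sum = fp_sum = fn_sum = 0
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(n for (g, p), n in pairs.items() if p == c and g != c)
        fn = sum(n for (g, p), n in pairs.items() if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = {"precision": prec, "recall": rec, "f1": f1}
        tp_sum += tp
        fp_sum += fp
        fn_sum += fn
    micro_p = tp_sum / (tp_sum + fp_sum) if tp_sum + fp_sum else 0.0
    micro_r = tp_sum / (tp_sum + fn_sum) if tp_sum + fn_sum else 0.0
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r) if micro_p + micro_r else 0.0
    return {
        "per_class": per_class,
        # macro: unweighted mean over classes, so rare classes weigh as much as common ones
        "macro_f1": sum(m["f1"] for m in per_class.values()) / len(labels) if labels else 0.0,
        # micro: pool counts first; for single-label data this equals accuracy
        "micro_f1": micro_f1,
        "confusion": {g: {p: pairs[(g, p)] for p in labels} for g in labels},
    }
```

The gap between macro and micro F1 is the quickest signal that a minority class is being ignored.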

Span-set scoring (NER / PII)

  • Strict (type, start, end) match: a boundary off by one counts as both a miss and a hallucination.
  • Per-class P/R/F1 next to micro/macro.
  • Triggered by output_schema.scoring_mode: "span_set" on the manifest.
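Strict span-set scoring reduces to set arithmetic on (type, start, end) tuples. This sketch (names are illustrative) shows why an off-by-one boundary is penalized twice: the predicted span is a false positive and the gold span is a false negative.

```python
from typing import Dict, Set, Tuple

Span = Tuple[str, int, int]  # (type, start, end)


def prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": p, "recall": r, "f1": 2 * p * r / (p + r) if p + r else 0.0}


def span_set_scores(gold: Set[Span], pred: Set[Span]) -> Dict[str, object]:
    """Only exact (type, start, end) tuples count as hits."""
    micro = prf(len(gold & pred), len(pred - gold), len(gold - pred))
    per_class = {}
    for t in sorted({s[0] for s in gold | pred}):
        g = {s for s in gold if s[0] == t}
        p = {s for s in pred if s[0] == t}
        per_class[t] = prf(len(g & p), len(p - g), len(g - p))
    macro_f1 = sum(v["f1"] for v in per_class.values()) / len(per_class) if per_class else 0.0
    return {"micro": micro, "macro_f1": macro_f1, "per_class": per_class}


gold = {("PER", 0, 2), ("LOC", 4, 5)}
pred = {("PER", 0, 3), ("LOC", 4, 5)}  # PER end is off by one
scores = span_set_scores(gold, pred)
```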

RAG

  • Answer EM/F1 plus faithfulness (the fraction of the prediction grounded in the context).
  • Context recall, which separates retriever bugs from generation bugs.
  • Per-row badge in Predictions Preview.
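The page doesn't spell out how faithfulness and context recall are computed. One plausible token-overlap reading, as a sketch: faithfulness is the share of prediction tokens found in the context, and context recall is the share of gold-answer tokens the retrieved context contains. The tokenizer, the triage helper, and its 0.5 threshold are all assumptions.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumed tokenization: lowercase word characters only.
    return re.findall(r"\w+", text.lower())


def overlap_fraction(source: str, reference: str) -> float:
    """Share of tokens in source that also appear somewhere in reference."""
    src = tokenize(source)
    if not src:
        return 0.0
    ref = set(tokenize(reference))
    return sum(tok in ref for tok in src) / len(src)


def faithfulness(prediction: str, context: str) -> float:
    return overlap_fraction(prediction, context)  # how much of the prediction is grounded


def context_recall(gold_answer: str, context: str) -> float:
    return overlap_fraction(gold_answer, context)  # did retrieval fetch the answer at all?


def triage(gold_answer: str, prediction: str, context: str, threshold: float = 0.5) -> str:
    """Hypothetical triage rule; the threshold is arbitrary."""
    if context_recall(gold_answer, context) < threshold:
        return "retrieval"   # the answer was never in the retrieved context
    if faithfulness(prediction, context) < threshold:
        return "generation"  # the context was fine; the model drifted away from it
    return "ok"
```

The ordering is the point: if the answer never reached the context, no amount of generation tuning will fix the row.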

DPO / ORPO alignment

  • Similarity-to-chosen vs similarity-to-rejected (token-level F1 proxy).
  • Mean alignment margin across rows.
  • Falls back to plain EM/F1 when chosen/rejected are absent.
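The alignment margin can be sketched as token-level F1 to the chosen response minus token-level F1 to the rejected one, averaged over rows. The bag-of-tokens F1, the row keys (prediction, chosen, rejected), and the function names are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Dict, List


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between two strings."""
    pred = re.findall(r"\w+", prediction.lower())
    ref = re.findall(r"\w+", reference.lower())
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def alignment_report(rows: List[Dict[str, str]]) -> dict:
    margins = []
    for row in rows:
        if not row.get("chosen") or not row.get("rejected"):
            continue  # no preference pair: this sketch leaves such rows to plain EM/F1
        to_chosen = token_f1(row["prediction"], row["chosen"])
        to_rejected = token_f1(row["prediction"], row["rejected"])
        margins.append(to_chosen - to_rejected)  # > 0 means closer to the preferred answer
    return {
        "rows_scored": len(margins),
        "mean_margin": sum(margins) / len(margins) if margins else 0.0,
    }
```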

Seq2Seq

  • BLEU + chrF for translation; ROUGE for summarization; both for paraphrase.
  • length_ratio reported alongside content metrics.
  • Sub-task pulled from the prepared manifest's subtask field.
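For a feel of what chrF and length_ratio measure, here is a simplified chrF following the original definition (character n-gram precision and recall averaged over orders 1 to 6, combined with beta = 2, whitespace ignored), plus a whitespace-token length ratio. Both are sketches; library implementations such as sacrebleu may treat edge cases differently, and this is not BrewSLM's code. BLEU and ROUGE are omitted.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    s = text.replace(" ", "")  # chrF ignores whitespace by default
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # a string is too short for this order; skip it
        matched = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matched / sum(hyp.values()))
        recalls.append(matched / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2  # beta > 1 weights recall more than precision
    return (1 + b2) * p * r / (b2 * p + r)


def length_ratio(hypothesis: str, reference: str) -> float:
    # Assumed definition: hypothesis tokens / reference tokens.
    ref_len = len(reference.split())
    return len(hypothesis.split()) / ref_len if ref_len else 0.0
```

Reporting length_ratio next to content metrics catches outputs that score well on overlap only because they are much longer or shorter than the reference.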

Vision + audio

  • Image captioning / VQA → vision-language handler.
  • Speech-to-text → audio-transcript handler (WER / CER).
  • Multimodal media flag enforced at preflight.
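WER and CER are both Levenshtein edit distance normalized by reference length, over words and characters respectively. A sketch using the standard two-row dynamic program; the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + insertions + deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("WER is undefined for an empty reference")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return edit_distance(reference, hypothesis) / len(reference)
```

Because insertions count against the reference length, WER can exceed 1.0 when a model hallucinates extra words.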

Reproducibility

Save once, re-run forever, audit every time

Saved mappings

The wizard's "Save this mapping" button persists (locator, mapper_id, field_map, drop_reasons) under a name. The Saved mappings panel on the Data tab shows last-run row counts and offers one-click Re-run.

name: weekly-pii-refresh
locator: kaggle:competition:pii-detection-…
mapper_id: bio_to_spans
last_run_at: 2026-05-13T10:00:00Z
last_run_accepted: 5,012
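The saved-mapping card is a named import configuration plus last-run stats. A sketch of that record as a dataclass: the field names come from this page, but the types, the field_map and drop_reasons shapes, the record_run helper, and the placeholder locator are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class SavedMapping:
    # Field names follow the page; types and shapes are assumptions.
    name: str
    locator: str
    mapper_id: str
    field_map: Dict[str, str] = field(default_factory=dict)  # assumed: source column -> mapper field
    drop_reasons: List[str] = field(default_factory=list)    # assumed: reason codes
    last_run_at: Optional[str] = None
    last_run_accepted: Optional[int] = None

    def record_run(self, accepted: int, now: Optional[datetime] = None) -> None:
        """Hypothetical helper: stamp the stats the Saved mappings panel shows."""
        now = now or datetime.now(timezone.utc)
        self.last_run_at = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        self.last_run_accepted = accepted


mapping = SavedMapping(
    name="weekly-pii-refresh",
    locator="kaggle:competition:my-competition",  # placeholder slug
    mapper_id="bio_to_spans",
)
mapping.record_run(accepted=5012, now=datetime(2026, 5, 13, 10, 0, tzinfo=timezone.utc))
```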

Audit log

Every import, whether fresh or a re-run, emits a RunEvent with stage ingestion and reason code dataset_import_run (or dataset_import_failed on errors). When the run came from a saved mapping, the payload also carries its config_id.

RunEvent { stage: "ingestion",
  reason_code: "dataset_import_run",
  payload: { source_id: "kaggle",
    mapper_id: "bio_to_spans",
    accepted_count: 5012,
    rejection_counts: {length_mismatch: 18},
    config_id: 12 } }
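The RunEvent example can be reproduced with a small builder. In this sketch the stage, reason codes, and payload keys come from the example; the RunEvent dataclass, the import_run_event helper, and the error payload field are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RunEvent:
    stage: str
    reason_code: str
    payload: Dict[str, Any] = field(default_factory=dict)


def import_run_event(
    source_id: str,
    mapper_id: str,
    accepted_count: int,
    rejection_counts: Dict[str, int],
    config_id: Optional[int] = None,
    error: Optional[str] = None,
) -> RunEvent:
    """Hypothetical helper: build the ingestion event for one import run."""
    payload: Dict[str, Any] = {
        "source_id": source_id,
        "mapper_id": mapper_id,
        "accepted_count": accepted_count,
        "rejection_counts": dict(rejection_counts),
    }
    if config_id is not None:
        payload["config_id"] = config_id  # only re-runs of a saved mapping carry this link
    if error is not None:
        payload["error"] = error  # assumed field name
    reason = "dataset_import_failed" if error is not None else "dataset_import_run"
    return RunEvent(stage="ingestion", reason_code=reason, payload=payload)


event = import_run_event("kaggle", "bio_to_spans", 5012, {"length_mismatch": 18}, config_id=12)
```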

Extensibility

Plugin contracts everywhere it matters

Source plugins

Register a new connector via register_source(id, factory); addressed by locator prefix.

# DATA_ADAPTER_PLUGIN_MODULES, KIND_HAS_MODULE_LOADER=true
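Addressing connectors by locator prefix boils down to a registry keyed by source id. A minimal sketch, assuming a factory receives everything after the first colon and returns an iterable of row dicts; this register_source is a stand-in written for illustration, and BrewSLM's real factory signature may differ. The echo source is a toy plugin.

```python
from typing import Callable, Dict, Iterable, Iterator

SourceFactory = Callable[[str], Iterable[dict]]
_SOURCES: Dict[str, SourceFactory] = {}


def register_source(source_id: str, factory: SourceFactory) -> None:
    if source_id in _SOURCES:
        raise ValueError(f"source id already registered: {source_id}")
    _SOURCES[source_id] = factory


def open_source(locator: str) -> Iterable[dict]:
    # Split on the first colon only, so refs like "<id>:split:revision" reach the factory intact.
    source_id, sep, ref = locator.partition(":")
    if not sep or source_id not in _SOURCES:
        raise KeyError(f"no source registered for locator {locator!r}")
    return _SOURCES[source_id](ref)


def echo_source(ref: str) -> Iterator[dict]:
    """Toy plugin: 'echo:<text>' yields one row containing the ref."""
    yield {"text": ref}


register_source("echo", echo_source)
rows = list(open_source("echo:hello"))
```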

Mapper plugins

Drop a module under DATASET_MAPPER_PLUGIN_MODULES and export register_dataset_mappers(register). Failure isolation is built in: one broken plugin doesn't block the rest from loading.

def register_dataset_mappers(register):
    # MyMapper is your mapper class; "my_mapper" is the id it registers under.
    register("my_mapper", MyMapper)
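Failure isolation of this kind is usually a per-module try/except around import and registration. A sketch, assuming plugins are loaded by module name; load_mapper_plugins is hypothetical, and staging each plugin's registrations so a half-loaded plugin contributes nothing is a design choice of this sketch, not documented behavior. The demo plugin is built in memory.

```python
import importlib
import logging
import sys
import types
from typing import Dict, List, Tuple

logger = logging.getLogger("mapper_plugins")


def load_mapper_plugins(module_names: List[str]) -> Tuple[Dict[str, type], Dict[str, str]]:
    """Import each plugin and collect its mappers; a broken plugin is skipped, not fatal."""
    registry: Dict[str, type] = {}
    failures: Dict[str, str] = {}
    for name in module_names:
        staged: Dict[str, type] = {}  # commit a plugin's mappers only if it loads cleanly

        def register(mapper_id: str, cls: type) -> None:
            staged[mapper_id] = cls

        try:
            module = importlib.import_module(name)
            module.register_dataset_mappers(register)
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
            logger.warning("mapper plugin %s failed to load: %s", name, exc)
            continue
        registry.update(staged)
    return registry, failures


# Demo: one in-memory plugin that works, one module name that doesn't exist.
class MyMapper:
    pass


good = types.ModuleType("good_plugin")
good.register_dataset_mappers = lambda register: register("my_mapper", MyMapper)
sys.modules["good_plugin"] = good
registry, failures = load_mapper_plugins(["good_plugin", "missing_plugin_xyz"])
```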

LLM-assist (opt-in)

When the deterministic sniffer can't reach 0.8 confidence, the introspector can consult your teacher model for a mapping suggestion. The LLM's proposal goes through the same confidence gate; hallucinated mapper ids are rejected.

DATASET_IMPORT_LLM_ASSIST_ENABLED=true \
python -m app.cli.dataset_import introspect --locator … --llm-assist
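The LLM-assist flow above is a fallback behind one shared gate. A sketch of that control flow: the 0.8 threshold and the reject-unknown-ids rule come from this page, while the Suggestion type, the function names, and ask_teacher (a stand-in for the teacher-model call) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

CONFIDENCE_GATE = 0.8  # threshold stated on this page


@dataclass
class Suggestion:
    mapper_id: str
    confidence: float


def passes_gate(s: Optional[Suggestion], known_mappers: Set[str]) -> bool:
    return (
        s is not None
        and s.mapper_id in known_mappers      # reject hallucinated mapper ids
        and s.confidence >= CONFIDENCE_GATE   # same bar for sniffer and LLM
    )


def choose_mapping(
    sniffed: Optional[Suggestion],
    known_mappers: Set[str],
    llm_assist_enabled: bool,
    ask_teacher: Callable[[], Optional[Suggestion]],
) -> Optional[Suggestion]:
    """Deterministic first; consult the teacher only if opted in and the sniffer fell short."""
    if passes_gate(sniffed, known_mappers):
        return sniffed
    if not llm_assist_enabled:
        return None
    proposal = ask_teacher()  # hypothetical stand-in for the teacher-model call
    return proposal if passes_gate(proposal, known_mappers) else None
```

Running the LLM's proposal through the identical gate means opting in can only add mappings the deterministic path would also have accepted at that confidence.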

Learn the concept

The Academy lessons behind these capabilities

If you want the theory behind custom dataset mapping, base-model selection, and task-aware evaluation, these are the lessons that explain why the product is built this way.

See the surface end-to-end

The fastest read is the source

Every capability above is one file you can read. Start with backend/app/services/dataset_import/ for the pipeline, or backend/app/services/eval_task_handler_service.py for the task handlers.

$ git clone https://github.com/TensorGreed/__SLM__ brewslm
$ cd brewslm/backend
$ ls app/services/dataset_import/
introspector.py mappers/ plugin_loader.py sources/ service.py …