Workflow for small language model fine-tuning

From base model to domain-specific small language model, stage by stage.

This is the product path from custom dataset ingestion to training, evaluation, export, and deployment. Every stage has explicit inputs, outputs, and an audit-log entry, so ML engineers can see exactly what happened to their base model and their data.

Lifecycle map

The pipeline, stage by stage

01

Ingest

Inputs: source locator (jsonl:/path, hf:id:split, kaggle:competition:slug, etc.)

Outputs: rows on disk in the project's data dir.

Contract: source connectors implement load() + describe(). Unparseable rows surface as sentinel rows, not silent drops.

RunEvent: none at this stage. The introspector and the downstream run each emit their own.
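The connector contract above can be sketched roughly as follows. This is an illustration only, not BrewSLM's code: `JsonlConnector`, `SentinelRow`, and every field name are hypothetical, and only the `load()` and `describe()` method names come from the contract.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class SentinelRow:
    """Hypothetical placeholder for a row that could not be parsed."""
    line_number: int
    raw: str
    error: str


class JsonlConnector:
    """Hypothetical connector implementing load() + describe()."""

    def __init__(self, lines: list[str]) -> None:
        self._lines = lines

    def describe(self) -> dict:
        return {"kind": "jsonl", "line_count": len(self._lines)}

    def load(self) -> Iterator[Union[dict, SentinelRow]]:
        for number, line in enumerate(self._lines, start=1):
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                # Surface the bad row instead of silently dropping it.
                yield SentinelRow(number, line, str(exc))
                continue
            if not isinstance(row, dict):
                yield SentinelRow(number, line, "row is not a JSON object")
                continue
            yield row


connector = JsonlConnector(['{"text": "ok"}', '{broken', '{"text": "also ok"}'])
rows = list(connector.load())
```

Here `rows` has three entries: the second is a `SentinelRow` carrying the line number and parse error, so the count of rows read always matches the count of rows seen downstream.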

02

Introspect

Inputs: ~20 sample rows from the source.

Outputs: ranked ShapeHypothesis list + a top ProposedMapping with confidence + rationale.

Contract: never silently auto-picks a mapping. A top mapping with confidence below 0.80 is applied only with --force.

RunEvent: none — introspection is a read.
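The confidence gate might look like the sketch below. The `ProposedMapping` fields and the `may_apply` helper are assumptions made for illustration; only the 0.80 threshold and the force override come from the contract.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80  # threshold stated in the contract


@dataclass
class ProposedMapping:
    mapper: str        # hypothetical field
    confidence: float
    rationale: str


def may_apply(mapping: ProposedMapping, force: bool = False) -> bool:
    """Allow the mapping only if it clears the threshold or the user forces it."""
    return force or mapping.confidence >= CONFIDENCE_THRESHOLD


strong = ProposedMapping("classification", 0.93, "label column has 2 distinct values")
weak = ProposedMapping("qa", 0.55, "question-like strings in 11 of 20 samples")
```

With these values, `may_apply(strong)` is true, `may_apply(weak)` is false, and `may_apply(weak, force=True)` is true.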

03

Map (dry-run)

Inputs: locator + mapper + field map + optional drop reasons.

Outputs: sample of accepted rows + full rejection breakdown grouped by reason.

Contract: per-row accountability. Every row is a TransformedRow or a RejectedRow with a stable reason code.

RunEvent: none — dry-run.
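Per-row accountability can be pictured as a mapper that returns exactly one result per input row. The sketch below is hypothetical: the field names (`review`, `sentiment`) and the reason codes are invented for the example, and only the `TransformedRow` / `RejectedRow` names come from the contract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class TransformedRow:
    text: str
    label: str


@dataclass
class RejectedRow:
    reason: str  # stable reason code; the codes used here are made up
    raw: dict


def map_row(raw: dict) -> Union[TransformedRow, RejectedRow]:
    text = raw.get("review")
    label = raw.get("sentiment")
    if not isinstance(text, str) or not text.strip():
        return RejectedRow("empty_text", raw)
    if label not in ("pos", "neg"):
        return RejectedRow("unknown_label", raw)
    return TransformedRow(text=text.strip(), label=label)


def dry_run(rows: list[dict], sample_size: int = 2):
    """Return a sample of accepted rows plus rejection counts grouped by reason."""
    results = [map_row(r) for r in rows]
    accepted = [r for r in results if isinstance(r, TransformedRow)]
    rejected = Counter(r.reason for r in results if isinstance(r, RejectedRow))
    # Every input row is accounted for exactly once.
    assert len(accepted) + sum(rejected.values()) == len(rows)
    return accepted[:sample_size], dict(rejected)


sample, breakdown = dry_run([
    {"review": "great film", "sentiment": "pos"},
    {"review": "", "sentiment": "neg"},
    {"review": "fine", "sentiment": "meh"},
    {"review": "dull", "sentiment": "neg"},
])
```

Nothing is written in this sketch; the caller only receives the sample and the breakdown, which mirrors the dry-run stage.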

04

Map (commit)

Inputs: same as dry-run, after user confirms.

Outputs: rows appended to the project's synthetic JSONL.

Contract: bulk-drop only affects the sample rendered to the user; counts stay in the audit row.

RunEvent: ingestion / dataset_import_run on success, dataset_import_failed on error. Payload: source, mapper, accepted/rejected counts, written_path, config_id.

05

Clean (optional)

Inputs: the project's RawDocuments.

Outputs: cleaned text + chunks JSONL + per-document metadata (PII findings, quality score, text hash).

Contract: runs as a background task with task_id polling so the HTTP request never blocks past the dev-proxy timeout.

RunEvent: cleaning / info on success; cleaning_outlier_threshold_exceeded / cleaning_pii_block on the named failure modes.
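The background-task contract can be illustrated with a small in-memory registry. This is a sketch under assumptions: the function names, the status values, and the use of threads are all hypothetical, and a real service would likely use a proper task queue.

```python
import threading
import time
import uuid

_tasks = {}
_lock = threading.Lock()


def start_task(fn, *args):
    """Run fn(*args) in a background thread and return a task_id immediately."""
    task_id = uuid.uuid4().hex
    with _lock:
        _tasks[task_id] = {"status": "running", "result": None}

    def runner():
        try:
            update = {"status": "done", "result": fn(*args)}
        except Exception as exc:  # report the failure through the task record
            update = {"status": "failed", "result": str(exc)}
        with _lock:
            _tasks[task_id] = update

    threading.Thread(target=runner, daemon=True).start()
    return task_id


def poll(task_id):
    """Return a snapshot of the task record; cheap enough for a short request."""
    with _lock:
        return dict(_tasks[task_id])


def wait_for(task_id, timeout=5.0, interval=0.01):
    """Client-side polling loop with a timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = poll(task_id)
        if state["status"] != "running":
            return state
        time.sleep(interval)
    return poll(task_id)


def clean_text(text):
    # Stand-in for the real cleaning work: collapse runs of whitespace.
    return " ".join(text.split())


task_id = start_task(clean_text, "  hello   world ")
final_state = wait_for(task_id)
```

The request that starts the work returns as soon as `start_task` hands back the id, so no single request has to outlive a proxy timeout; the client repeats short `poll` calls instead.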

06

Prepare

Inputs: synthetic + cleaned rows; the project's dataset-adapter preset.

Outputs: prepared/manifest.json + train.jsonl + eval.jsonl. Carries task_profile + output_schema.scoring_mode.

Contract: the manifest is the source of truth for everything downstream; nothing reads from disk paths directly.

RunEvent: dataset-prep summary.

07

Preflight

Inputs: prepared manifest + base model + runtime profile + hardware.

Outputs: pass/fail + blocker list + a generated train plan.

Contract: dependency, memory-fit, capability, and gate-policy checks. Strict mode blocks; fail-open mode emits warnings.

RunEvent: blockers surface as training / training_dispatch_error when fatal.
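The difference between strict and fail-open mode can be sketched as below. `PreflightResult` and `preflight` are hypothetical names, and the check labels are taken loosely from the contract's list.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightResult:
    passed: bool
    blockers: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def preflight(failed_checks: list[str], strict: bool = True) -> PreflightResult:
    """Turn failed checks into blockers (strict) or warnings (fail-open)."""
    if strict:
        return PreflightResult(passed=not failed_checks, blockers=list(failed_checks))
    return PreflightResult(passed=True, warnings=list(failed_checks))


failed = ["memory-fit", "dependency"]
strict_result = preflight(failed, strict=True)
open_result = preflight(failed, strict=False)
```

In this sketch the same failed checks stop the run in strict mode and are reported without stopping it in fail-open mode.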

08

Train

Inputs: prepared manifest + chosen recipe + checkpoint cadence.

Outputs: checkpoints + per-step loss / eval traces + a final adapter weights blob.

Contract: task-handler dispatcher routes to ClassificationHandler, QAHandler, StructuredExtractionHandler (span_set or field_match), RAGHandler, AlignmentHandler, Seq2SeqHandler, VisionLanguageHandler, AudioTranscriptHandler, or SafetyHandler.

RunEvent: training / info on completion (payload: experiment_id, backend, final_train_loss). training_oom / training_runtime_error / training_timeout / training_cancelled on failure.
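One plausible shape for the task-handler dispatcher is a registry lookup. The handler class names below come from the contract, but the registry, the `dispatch` function, and the task-profile keys are assumptions, and only three of the nine handlers are shown.

```python
class ClassificationHandler:
    name = "classification"


class QAHandler:
    name = "qa"


class Seq2SeqHandler:
    name = "seq2seq"


# Keys are hypothetical task_profile values.
HANDLERS = {
    "classification": ClassificationHandler,
    "qa": QAHandler,
    "seq2seq": Seq2SeqHandler,
}


def dispatch(task_profile: str):
    """Return a handler instance for the profile, or fail loudly."""
    try:
        return HANDLERS[task_profile]()
    except KeyError:
        raise ValueError(
            f"no handler registered for task_profile={task_profile!r}"
        ) from None


handler = dispatch("qa")
```

Failing on an unknown profile, rather than falling back to a default handler, keeps a misconfigured manifest from training with the wrong objective.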

09

Evaluate

Inputs: trained model + held-out eval set + eval pack.

Outputs: pass rate + task-aware metrics (per-class P/R/F1, span-set, faithfulness, alignment margin, BLEU/ROUGE, WER, etc.).

Contract: the metric shape comes from the handler's score(); gates declared in the eval pack decide promotability.

RunEvent: eval / info, parented to exp-<id>. eval_runtime_error / eval_judge_unavailable / eval_dataset_missing on failure.
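Gate evaluation could work along these lines. The sketch assumes each gate is a minimum value for a named metric; the real eval-pack format is not shown in this page, so the dictionary shapes and the `evaluate_gates` name are hypothetical.

```python
def evaluate_gates(
    metrics: dict[str, float], gates: dict[str, float]
) -> tuple[bool, list[str]]:
    """Return (promotable, failures). Each gate is an assumed minimum value."""
    failures = []
    for name, minimum in gates.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return (not failures, failures)


metrics = {"pass_rate": 0.91, "macro_f1": 0.78}
gates = {"pass_rate": 0.90, "macro_f1": 0.80}
promotable, failures = evaluate_gates(metrics, gates)
```

With these numbers the model is not promotable, and `failures` names the one gate that was missed.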

10

Export

Inputs: trained model + target format (GGUF / safetensors / HF / vLLM-compat).

Outputs: artifact bundle with weights, tokenizer, manifest, smoke-check trace.

Contract: quantization variants (Q4_K_M, AWQ, GPTQ) tracked so the Compression page knows what's available.

RunEvent: export / info on success (payload: format, output_path, file_size_bytes). export_run_failed / export_artifact_missing / export_quantization_failed on the named failures.

11

Deploy (optional)

Inputs: an export + a target (vLLM endpoint / local runner / cloud burst).

Outputs: a versioned DeploymentVersion with promote / reject / rollback / drift-check actions.

Contract: smoke check at promote time; drift check re-runs the gold set against the live endpoint on a schedule.

RunEvent: deployment / info per action; deployment_smoke_failed / deployment_drift_detected / deployment_promote_blocked on failure.

Audit spine

One stream powers everything observability-related

Producers

Each stage above writes one or more RunEvent rows via emit_event(). Reason codes come from a lint-gated taxonomy; severity is one of info / warning / error / critical.

Consumers

The Observability timeline. The failure-cluster surface. The support bundle. The audit explorer. The Lab Journal gamification layer.

Why this matters

Adding a new pipeline stage takes three steps: pick a stage constant, add a reason code, and emit an event. Every downstream view picks it up automatically.

Best-effort hooks: the audit emission is wrapped in try/except at every call site. A bug in observability cannot break the data write path.
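The best-effort pattern can be sketched as follows. The `emit_event` signature here is an assumption (the page names the function but not its parameters), and `commit_rows` and the `sink` callable are hypothetical stand-ins for a real call site.

```python
import logging

logger = logging.getLogger("audit")
SEVERITIES = ("info", "warning", "error", "critical")  # from the text above


def emit_event(sink, stage, reason_code, severity="info", payload=None):
    """Hypothetical signature; the real emit_event() may differ."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    sink({"stage": stage, "reason_code": reason_code,
          "severity": severity, "payload": payload or {}})


def commit_rows(rows, store, sink):
    """Write rows first, then emit the audit event on a best-effort basis."""
    store.extend(rows)  # the data write path
    try:
        emit_event(sink, "ingestion", "dataset_import_run",
                   payload={"accepted": len(rows)})
    except Exception:
        # A broken audit sink is logged and swallowed, never re-raised.
        logger.exception("audit emission failed; data write kept")
    return len(rows)


def broken_sink(event):
    raise RuntimeError("audit store is down")


store, events = [], []
written_with_broken_sink = commit_rows([{"text": "a"}], store, broken_sink)
written_with_good_sink = commit_rows([{"text": "b"}], store, events.append)
```

Both calls write their row; only the second produces an audit event. The trade-off is that a failing sink leaves a gap in the audit stream, which is why the failure is logged rather than ignored outright.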

Learn the concept

The Academy lessons behind the workflow

BrewSLM automates the lifecycle, but the underlying ideas still matter when you choose a base model, build a gold set, or decide whether a training run actually improved the model.

Try the path end-to-end

One command per stage. No glue scripts.

# ingest + map via the generic pipeline
$ python -m app.cli.dataset_import run \
    --locator hf:imdb:train --project 1 --auto --limit 5000

# or, via the wizard:
# Pipeline → Data → "Import dataset (auto-mapping)"

# train via autopilot from the Training Config page

# or, scripted:
$ curl -X POST localhost:8000/api/projects/1/training/run \
    -H 'Content-Type: application/json' \
    -d '{"autopilot": true, "one_click": true}'