Tutorial · Deep reference · Domain packs

Building domain packs — the finance / AP pack worked example

By the end you'll have a reusable finance-pack-v1 registered in BrewSLM, assigned to a working invoice-extraction project, and ready to apply to sibling AP projects (POs, statements, expense reports). The pack carries the conventions your team has converged on — currency normalisation across locales, a weighted-score evaluator hook reflecting which fields drive the SOX key controls, a starter glossary, and tighter registry gates than the platform default. (Per-entity precision floors as gating still live in the project's eval pack.) Where T8 introduced packs lightly, this tutorial is the canonical reference: every contract field, every overlay slot, every assignment edge case.

Level: intermediate → advanced Time: ~3 hours total (most of it design discussion + JSON authoring) Prerequisites: Tutorial 0 (Setup BrewSLM). Strongly recommended companions: Tutorial 3 (invoice extraction) — the project this pack augments — and Tutorial 8 (legal contracts domain pack) — the lighter pack walkthrough, this tutorial goes deeper.

Before you start

This tutorial assumes BrewSLM is running locally at http://localhost:5173 with an admin user signed in. If you haven't done that yet, complete Tutorial 0 — Set up BrewSLM and your first project first.

You'll get the most out of this tutorial if you already have a finished invoice-extraction project sitting in your workspace — the gold set, the cleaned data, the eval thresholds you converged on. The finance pack we'll build is the act of packaging what's already working into something the rest of your AP team can apply to their projects. Tutorial 3 walks the invoice extractor end to end.

If you've read T8, you've already seen the pack contract once. This tutorial does not re-walk the rag-protocol recipe; head back to T1 or T8 for that. The focus here is on the pack layer itself, with finance / AP as the worked example.

Terms you'll see in this tutorial (click to expand)

Domain pack: A typed configuration bundle persisted as a JSON contract against the slm.domain-pack/v1 schema. Carries identity, hook references, and an overlay block (dataset split, training defaults, registry gates). Stored in the domain_packs table; one row per pack.
Recipe: The training-plan template you pick at project creation (rag-protocol, span-extraction, qa-sft, etc.). The pack overlays the recipe — it does not replace it.
Overlay: The fields inside a pack contract that override platform defaults. Three first-class overlay sections today: dataset_split, training_defaults, registry_gates. Other sections (cleaning hints, glossary, eval thresholds) live in the contract JSON but are not exposed as separate UI form fields.
Hook: One of three plug-points in a pack: normalizer (per-row transformation), validator (batch quality report), evaluator (metrics enrichment). Each hook is selected from a built-in catalog by ID, with an optional per-hook config dict.
Registry gates: Promotion thresholds the pack imposes for moving an experiment from training → staging or staging → production. Independent of eval-pack gates but stack with them.
Schema reference: The literal slm.domain-pack/v1 — the contract version every pack is validated against. The platform patches missing fields on load (the historical hooks field, for instance), but breaking changes require a new schema version.
Default profile: The domain profile a pack adopts when assigned. The platform's default pack points at generic-domain-v1; bespoke packs typically reuse the same profile and rely on the overlay to differentiate.
Three-way match: The AP control where an incoming invoice has to reconcile against (a) the purchase order issued by procurement and (b) the goods receipt confirming delivery. Mismatch blocks payment in ERPs like SAP, Oracle, NetSuite, Coupa. The single concept most worth getting right when designing a finance pack.

The most common mistake teams make starting out is treating each project as a fresh start. The first invoice extractor lands; conventions get baked in; six weeks later somebody starts a PO extractor and re-derives the same currency normalisation, precision floors, and glossary. The third project does the same. Now there are three projects with three drifting interpretations of "what good looks like" in AP, and the auditor at year-end has three stories to read. Domain packs exist to stop that drift — bundle the conventions into a typed contract once, assign at project creation, and every AP project inherits the same shape. The finance / AP pack we build here is the canonical reference because the field-level precision priorities tie directly to the controls SOX 404 examines.

What you'll build

A registered finance-pack-v1 with:

Identity — pack_id, version, AP-eng owner, status active, schema_ref slm.domain-pack/v1, generic-domain-v1 as the default profile.
Hooks — three references from the built-in catalog (default-normalizer, default-validator, default-evaluator), each with a config dict carrying finance-specific knobs.
Overlay — dataset_split — 80/10/10 with seed 42 for reproducible manifest hashes.
Overlay — training_defaults — sft / llama3 / 3 epochs / batch 4 / lr 2e-4 / LoRA on. Same shape as platform default, pinned so future drift doesn't move your team's defaults.
Overlay — registry_gates — tighter than platform default. to_staging: F1 ≥ 0.72, llm_judge ≥ 0.80. to_production: F1 ≥ 0.78, llm_judge ≥ 0.85, safety ≥ 0.95, max regression 0.02 on F1 and exact match.
Contract sections for cleaning recipes, the evaluator hook's weights config, glossary — JSON-only (no first-class form fields). Per-entity precision floors themselves live in the project's eval pack, not the domain pack.
Assigned to your Tutorial 3 invoice project with adopt_pack_default_profile=true.

Here's the skeleton you'll end up with — a 30-line slice; the full contract appears in the Worked example section:

{
  "$schema": "slm.domain-pack/v1",
  "pack_id": "finance-pack-v1",
  "version": "1.0.0",
  "display_name": "Finance / AP",
  "description": "Overlay for accounts-payable extraction projects: invoices, POs, statements, expense reports.",
  "owner": "ap-eng",
  "status": "active",
  "default_profile_id": "generic-domain-v1",
  "tags": ["finance", "accounts-payable", "extraction"],
  "overlay": {
    "dataset_split": { "train": 0.8, "val": 0.1, "test": 0.1, "seed": 42 },
    "training_defaults": { "training_mode": "sft", "chat_template": "llama3", "num_epochs": 3,
                           "batch_size": 4, "learning_rate": 0.0002, "use_lora": true },
    "registry_gates": {
      "to_staging":    { "min_metrics": { "f1": 0.72, "llm_judge_pass_rate": 0.80 } },
      "to_production": {
        "min_metrics": { "f1": 0.78, "llm_judge_pass_rate": 0.85, "safety_pass_rate": 0.95 },
        "max_regression_vs_prod": { "f1": 0.02, "exact_match": 0.02 }
      }
    }
  },
  "hooks": {
    "normalizer": { "hook_id": "default-normalizer", "config": { /* finance knobs */ } },
    "validator":  { "hook_id": "default-validator",  "config": { /* finance knobs */ } },
    "evaluator":  { "hook_id": "default-evaluator",  "config": { /* finance knobs */ } }
  }
}

Key idea

A pack is a typed config bundle, not a model. There is no "finance base model" sitting underneath; the model you train still comes from the recipe. The pack is the layer that says "in our AP world, the dataset splits look like this, the registry gates look like this, the hooks behave like this, the precision priorities look like this." Apply it at project create; every project in the AP portfolio inherits the same shape.

Why bundle conventions as packs

Three reasons, in escalating order of how badly the absence hurts.

Consistency across siblings: An AP portfolio is rarely one project — invoice extractor, PO extractor, statement-reconciliation classifier, expense-report categoriser. They share a vocabulary, a currency set, and a definition of "what good looks like" on total_amount precision. Without a pack, each project re-derives the conventions and drift accumulates until somebody notices in the same room.
Maintenance — change once, propagate by reassign: When you tighten the production F1 floor from 0.75 to 0.78, you change it in one place. Bump the version, reassign, the new gates apply on next prepare. Without the pack you'd hand-edit five eval configs, miss one, discover it in a post-mortem.
Audit trail — a typed artifact the auditor can read: A pack is a JSON contract with identity, version, owner, status, explicit gates. "How does this team enforce minimum precision on payment-blocking fields?" — point at the pack. Better than "we agreed in Slack last March."

The pack is a discipline, not magic: if it's true across more than one of our projects, it belongs in the pack; otherwise it stays in the project. The rest of this tutorial applies that discipline to finance / AP.

Pack anatomy — every field, what it does, why it's there

The pack is a single typed contract validated against slm.domain-pack/v1. Eleven fields on the row, of which the contract JSON carries identity + overlay + hooks. Walking it field by field, because once you understand each one you can build any pack you need.

Identity fields

pack_id (3-128 chars, normalised lowercase-with-dashes, unique): The stable identifier the rest of the platform references. Once a project is assigned, that string is the join key — treat it as immutable. Casing and whitespace are normalised on lookup, so Finance-Pack-V1 and finance-pack-v1 resolve to the same row.
version (1-32 chars): User-managed version string. The duplicate flow auto-bumps the patch component, but there is no structured semver parsing — it's a string, not a typed semantic version. Use semver shape anyway (1.0.0, 1.1.0) for the audit trail.
display_name (1-255 chars, required): What humans see in the manager dropdown. "Finance / AP" reads better than "finance-pack-v1" in a list of twelve.
description (optional): One or two sentences. The audit reader is your audience.
owner (1-128 chars, default platform): The team that maintains the pack. The platform doesn't enforce permissions off it but does display it.
status (draft / active / deprecated): Lifecycle marker. There is no DELETE endpoint for packs — flip to deprecated to retire one. Existing assignments are unaffected; deprecation does not detach.
schema_ref (default slm.domain-pack/v1): The contract schema. Only one shipping today. Leave it.
default_profile_id (3-128 chars, nullable): The domain profile a project adopts when assigned with adopt_pack_default_profile=true. For most bespoke packs, reuse generic-domain-v1.
tags (list): Free-form labels. Not filtered in the UI, but visible in the contract diff when you bump versions.

The overlay block

The overlay is where the pack does its work. Three first-class fields on the overlay spec:

overlay.dataset_split: The ratios used at prepare-dataset time. Default { "train": 0.8, "val": 0.1, "test": 0.1, "seed": 42 }. For AP, the default ratios are almost always right. The platform's prepare endpoint also accepts a Stratify By Field input (groups rows by a class label and preserves per-class proportions across train/val/test — useful for AP categories like "invoice", "credit_memo", "statement") and a Disjoint By Field input (assigns each group whole to one split — the right primitive for "same vendor / same template shouldn't leak between train and test"). Both surface in the Prepare Dataset panel UI; both get a per-group breakdown in the resulting split manifest. The two are mutually exclusive — stratify requires splitting groups, disjoint forbids it.
overlay.training_defaults: Pinned training-hyperparameter defaults — sft / llama3 / 3 epochs / batch 4 / lr 2e-4 / LoRA on, matching the platform default. Pinning them in the pack future-proofs against platform-default drift.
overlay.registry_gates: Promotion thresholds — to_staging.min_metrics and to_production.min_metrics set the floors; to_production.max_regression_vs_prod caps per-metric regression for re-promotion. Independent from eval-pack gates but stack with them.
Other overlay sections (data_quality, normalization, tools, evaluation, audit, glossary): The overlay accepts arbitrary additional keys, stored in the JSON and read by hooks that know to look. Not surfaced as form fields — you edit them in the contract textarea.

The hooks block

Three independent hook references — one of each type:

hooks.normalizer: Per-row callable. Receives raw record + canonical record + config; returns normalised record or None to drop. Built-ins: default-normalizer, qa-required-normalizer, min-text-length-normalizer, strip-markdown-normalizer.
hooks.validator: Batch callable. Receives records + profile + config; returns a report with status, total_records, valid_records, failed_records, pass_rate, base_normalization_coverage. Built-ins: default-validator, min-text-length-validator, qa-pair-validator.
hooks.evaluator: Receives eval_type + metrics + config + context; returns enriched metrics. Built-ins: default-evaluator, pass-rate-band-evaluator, weighted-score-evaluator. For finance, weighted-score pairs naturally with per-entity precision priorities (see below).

Each hook reference is a hook_id string plus a config dict shaped however the hook implementation expects. Hook errors fall back to default behaviour — robust, but a typo'd config key is silent. Test your hooks.

✓ Checkpoint: you can name every field and say which layer reads it (identity = display; overlay = manifest at prepare/train/promote; hooks = data + eval pipeline).

Design your pack — what belongs where

The first design question is "what belongs in the pack vs the project." Get it wrong and either the pack carries project-specific bits that don't generalise, or the project carries pack-shaped conventions that drift between siblings. A four-row decision table:

Question	Pack	Project
True across most AP projects (invoice + PO + statement + expense)?	✓	—
True only for one project (e.g. invoice line-item descriptions)?	—	✓
An audit answer the org needs (precision floor tied to a SOX key control)?	✓	—
A model artefact (LoRA adapter, checkpoint, eval results)?	—	✓

The shorthand: policy goes in the pack, artefacts go in the project. The pack says "we expect this precision on payment-blocking fields"; the project carries the model and eval results demonstrating it. "Show me the policy" — point at the pack. "Show me compliance" — point at the project's eval row. The per-entity precision priorities (total_amount tight, line_item_description looser) belong in the pack; the eval scores belong in the project.

Cleaning recipes — currency, locale, three-way-match harmonisation

The cleaning pipeline runs per row on import and before training. The pack overlays its conventions on the platform's cleaning defaults via the normalizer hook's config dict — most knobs in finance land here. Three areas worth packaging.

Currency normalisation

Currency parsing makes the cut because the gotchas are quietly numerous and a naive "split on comma" parser misclassifies a meaningful fraction of European and Indian invoices:

US / UK / most-English use comma-thousand period-decimal ($1,234.56). The baseline most parsers assume.
German / Austrian / Italian / most-EU use period-thousand comma-decimal (1.234,56 €). A US-locale parser reading that as a decimal lands at one thousand two hundred thirty-four — a thousand-fold overpayment.
French often uses space-thousand comma-decimal (1 234,56 €); non-breaking spaces from PDF extractors break naive regex.
Swiss CHF uses apostrophe-thousand period-decimal (1'234.56) — apostrophe is distinctive enough to drive locale detection.
Indian Numbering System uses lakh grouping — first separator after 3 digits then every 2 (1,23,456.78). Parsers assuming 3-digit groups mis-parse Indian invoices.
JPY / KRW have no minor unit (¥1,234, not ¥1,234.00). Forcing two decimals introduces spurious .00 values.

The canonical normalisation target is ISO 4217 codes — USD, EUR, GBP, INR, JPY, CHF — paired with the amount as a decimal. Symbol-only ($) is ambiguous (USD vs CAD vs AUD vs MXN vs SGD); the pack should encode "bare dollar sign resolves via vendor billing country" rather than guess.

The pack contract encodes the locale set the normalizer attempts plus the fallback rules; the hook reads the config and does the work:

"normalizer": {
  "hook_id": "default-normalizer",
  "config": {
    "currency": {
      "iso4217_target": true,
      "locales_attempted": ["en_US", "en_GB", "de_DE", "fr_FR", "it_IT", "de_CH", "en_IN", "ja_JP"],
      "symbol_only_fallback_via": "vendor_billing_country",
      "no_minor_unit_currencies": ["JPY", "KRW"]
    }
  }
}

Today: default-normalizer is a no-op

Today the platform's default-normalizer hook is a no-op — it accepts the config and returns the row unchanged. The config block above is read-only documentation until you ship a custom normalizer plugin that implements the locale-aware logic, OR you handle currency normalisation in your gold-set preparation workflow (the more common path). The config still documents the intent for your team; the actual normalisation lives elsewhere for now.

Date format disambiguation

03/04/2026 is March 4 in the US and April 3 across most of Europe. The pack-level policy: if the vendor billing country is in the EU-locale set, parse as DD/MM/YYYY; otherwise MM/DD/YYYY; if no country is known, flag the row for the validator. Silent locale flips on due_date are the textbook way to miss early-payment discounts.

Three-way-match field harmonisation

The three-way match reconciles each invoice line against the purchase order and the goods receipt; mismatch blocks payment in the ERP. Common break points: quantity drift (ordered vs received vs billed), unit-price drift (list vs negotiated), UOM mismatch (case vs each), and PO-line vs invoice-line aggregation differences.

At the cleaning layer, the pack harmonises the field shapes feeding the match — standardise UOM strings (EA, CS, BX, PLT) onto a common vocabulary, normalise quantity to a decimal, pin PO_number formatting so a vendor's reformatting whim doesn't break the match key. The match runs in the ERP; the data feeding it should be clean.

Three-way match is the control; extraction is the data feeding it

Get the amount, the PO number, and the quantities wrong and the control fails silently — which is exactly the scenario SOX 404 exists to prevent. The pack's cleaning recipe is partially in service of keeping the three-way match's input clean.

✓ Checkpoint: your normalizer hook config carries the currency-locale set, the date-format resolution rule, and the UOM standardisation vocabulary your AP team agrees on. If two contributors disagree on the right value, that's the conversation the pack design forced — which is the point.

Eval threshold overlays — per-entity precision floors

The most load-bearing finance-pack insight is that not every field is equally important. An extractor scoring 99% on line_item_description and 94% on total_amount is not "mostly working" — it's failing on the field that drives the SOX control. The pack encodes that asymmetry once; every project inherits it.

Three tiers, in descending priority:

Payment-blocking (tightest floor): total_amount, currency, vendor_tax_id, invoice_number, invoice_date, due_date, PO_number, line quantity, line unit_price. Errors block payment or cause over/duplicate payment. The duplicate-detection key is typically (vendor_id, invoice_number, invoice_date, amount) — invoice_number precision directly drives that control. Target: precision ≥ 0.95.
Match-feeding (tight floor): UOM, line aggregation keys, vendor name. Don't immediately block payment but break the three-way match and route to the exceptions queue. Target: precision ≥ 0.90.
Informational (looser floor): shipping_address, line_item_description prose, contact name, remit-to phone. Recoverable, rarely block payment. Target: precision ≥ 0.80.

The pack does not override eval-pack thresholds directly. Eval packs and domain packs are siblings — eval packs gate the trained model; domain packs gate dataset / training / registry transitions. Per-entity floors live in the eval pack the project selects (the JSON copy-edit-select flow from the eval-pack scaffold). The domain pack documents your team's per-entity priorities as guidance; the actual gating happens at the eval-pack layer. The evaluator hook on the domain pack provides only the weighted-score signal:

"evaluator": {
  "hook_id": "weighted-score-evaluator",
  "config": {
    "weights": { "payment_blocking": 0.6, "match_feeding": 0.3, "informational": 0.1 }
  }
}

Future: per-entity precision floors as a pack-level overlay

Per-entity precision floors as a pack-level overlay aren't yet wired into the evaluator hook — today the weighted-score evaluator reads only a weights key. For the moment, set per-entity precision floors on your eval pack (via the JSON copy-edit-select flow), not on the domain pack.

The weighted-score evaluator adds a single weighted_score metric to your eval results — a configurable weighted average across entities you've prioritised. It does NOT compute per-tier precision or flag floor failures; that's an eval-pack concern (set those gates in the pack JSON, not the evaluator hook). The pack's registry_gates stay independent and still apply.

Why the asymmetry matters for AP: duplicate detection samples on amount; auditors sample on amount; reconciliation breaks on penny mismatches; the three-way match keys on PO_number and quantities. The payment-blocking tier is the set of fields where extraction errors translate directly to a control failure. Skipping an early-payment discount because due_date drifted by a month is a 36.5%-annualised opportunity cost on that invoice; quietly paying a duplicate because invoice_number missed a leading zero is a control deficiency the auditor will write up. The looser informational floor isn't laziness — it's honest prioritisation of the precision budget.

Refusal variants — current platform gap, honest workaround

If you've read T8, you've met this gap. The pack contract is a typed config bundle applied at project-create time; synthetic playbooks read their prompt text from the project's synth config separately. The pack does not rewrite playbook prompts today. Per-pack refusal-phrase customisation would be a natural fit for the contract — it's a portfolio-wide policy — but it isn't wired yet.

The honest workaround, translated to finance:

Hand-seed 8-12 gold rows where the right answer is your AP team's refusal phrase ("I can't extract that field from this invoice — the required marker is missing or illegible. Please route to the AP exceptions queue.").
Customise the relevant synth playbook's prompt text in the project's synth config so the teacher's refusals match your team's approved wording rather than the platform default.
Hand-edit a few generated rows in the synth review queue before accepting.

Document the workaround in the pack's description field. The audit reader sees "this pack uses the platform default refusal phrase; the AP-eng team layers the team-specific phrase via gold + synth prompts per project" — the honest state.

Honest gap, not a feature

Putting refusal phrases into the pack contract today is "looks right, no effect." The validator accepts any extra keys, but no platform code reads them. Keep the workaround in the runbook, not the contract.

Glossary overlays — packaging the vocabulary

AP has a glossary every new team member has to absorb: PO (purchase order), GR/IR (goods receipt / invoice receipt), three-way match, vendor master, tax ID, payment terms (Net 30, Net 60, EOM, 2/10 net 30). Packaging the vocabulary into the pack does two things — it gives the audit reader a documented definition of the terms the pack policies refer to, and it gives downstream consumers (the synth playbook prompts, the eval explanations, the model card) a single source of truth.

The pack contract today does not surface a first-class glossary field; the overlay accepts arbitrary keys, so you encode the glossary in the contract JSON and the downstream consumers read it by convention. A minimal AP glossary section looks like:

"overlay": {
  "glossary": {
    "PO":              "Purchase order. The procurement document that authorises a vendor to deliver goods/services.",
    "GR/IR":           "Goods receipt / invoice receipt. The accrual account for received-not-yet-invoiced spend.",
    "three_way_match": "Reconciliation of invoice ↔ purchase order ↔ goods receipt before payment.",
    "vendor_master":   "Authoritative supplier record (legal name, tax ID, remit-to bank, terms).",
    "tax_id":          "Vendor tax identifier (EIN for US persons, VAT for EU, GSTIN for India, etc.).",
    "Net 30":          "Payment due 30 days after invoice date.",
    "2/10 net 30":     "2% discount if paid within 10 days; otherwise full balance by day 30."
  }
}

It's worth stating the limits clearly. The glossary section is not a first-class form field in the manager — you author it inside the contract JSON. Nothing in the platform parses the glossary keys today; the value of including it is documentation, not behaviour. If you want the glossary to drive synth prompt generation or model-card output, the consumer code that reads it lives in your hooks or your project automation, not in the platform core.

The selection that pays back the most is the dozen terms that appear in your refusal phrases, your eval explanations, and your audit responses. Anything beyond that is glossary-bloat — the second-most-common pack-design mistake (the first is putting refusal phrases there, see the prior section).

Building the pack in the Domain Pack Manager

Open the Domain Pack Manager from Project → Domain → Pack. It's the only entry point — pack assignment doesn't happen anywhere else. Four areas:

Active pack panel — top, read-only, shows the currently-assigned pack's display name and pack_id@version. If nothing was set, shows general-pack-v1.
Pack-selection controls — a dropdown of every installed pack plus four buttons: Assign to Project, View / Edit Contract, Duplicate + Open Editor, New Pack. A clickable grid list below with metadata badges (status, system flag, selected) offers the same selection as the dropdown.
Hook catalog — three-column panel of available normalizers, validators, evaluators with descriptions and source plugin module. Below: Loaded Plugin Modules, Plugin Load Errors, and two buttons (Refresh Catalog, Reload Plugins).
Editor modal — opens on New Pack, View / Edit Contract, or after a duplicate. Contains the Hook Helper (three dropdowns + Load From JSON / Apply To JSON for bidirectional sync) and the Contract JSON textarea (monospace editor where the contract actually lives).

The flow to build finance-pack-v1:

Click New Pack. The editor opens with a starter contract.
Fill identity in the JSON — pack_id, version, display_name, description, owner, status, default_profile_id, tags. The hint flags pack_id and display_name as required.
Use the Hook Helper to pick hooks. Pick a normalizer, validator, evaluator; click Apply To JSON to write the IDs into the contract's hooks section. The dropdowns are the only first-class way to see which hook IDs are valid — typing them by hand is a typo waiting to happen.
Edit the overlay JSON — dataset_split, training_defaults, registry_gates, plus the additional contract sections (cleaning hints, glossary, the evaluator hook's weights config). Overlay sections beyond the three first-class fields are JSON-only; no form fields exist for cleaning, validation, glossary, or eval thresholds.
Save. Validates against slm.domain-pack/v1. Common errors: typo'd hook IDs (Hook Helper avoids this), missing required identity fields, malformed JSON.

What the UI does not surface

The manager has no search or filter on the pack list (every pack shows; pick from the dropdown or click in the grid). Overlay sections beyond the three first-class fields are JSON-only — no form fields for cleaning, glossary, or eval thresholds. There is no view-only mode (opening the editor opens it editable). There is no in-line delete or archive button; use the status enum to deprecate. There is no version picker (the version shown is whatever was last saved). There is no batch operation (assign-many, duplicate-many). And there is no unassign button after a pack has been assigned to a project — only reassign to a different pack.

✓ Checkpoint: the pack list shows finance-pack-v1 alongside the seeded general-pack-v1, with the active badge if you assigned it, the system badge only on the seeded one, and your chosen status. Opening the editor on it should round-trip — Load From JSON populates the hook helper dropdowns to the IDs you saved, Apply To JSON writes them back unchanged.

Testing your pack before applying it

There's no dry-run endpoint, and the manager's save step validates only the JSON contract structure (schema reference, required identity fields, hook reference shape). The actual hook behaviour is wrapped in try/except with fallback to default behaviour — a typo in a hook config key is silent. The implication: testing is on you.

The minimum acceptable test pass:

Round-trip the contract. Save the pack, reopen the editor, confirm the contract JSON comes back unchanged (modulo whitespace). If it doesn't round-trip, the validator dropped a field you wrote — usually a typo in a known key.
Assign to a sandbox project. Create a throwaway project (or reuse an existing scratch one), assign the pack, look at the project's prepare-dataset and training-defaults panels. Confirm the dataset split, training defaults, and registry gates now reflect the pack's overlay rather than the platform default.
Run prepare and watch the validator output. The validator hook reports its findings; compare a baseline run to your test run to see whether your pack config is actually being applied. If you expected a config change to bite and didn't see one, check: (1) the hook ID spelling matches a registered hook, (2) any custom plugin you wrote has been loaded via the Reload Plugins button on the manager, (3) your config-key names match what the hook implementation reads. Most pack debugging starts with one of these three.
Run a training experiment. Even a small one — 30 rows is enough. The registry gates show on the experiment page after training; the weighted_score metric shows in the eval summary if your evaluator hook is reading the weights config. The act of running through end-to-end catches the hook-loading and config-key issues the JSON validator can't catch.

The Reload Plugins button is your friend during this phase. If you ship a custom hook implementation as a Python plugin module, the platform lazy-loads it on first reference; if you change the plugin code, you'll need to reload to pick up the change. The Loaded Plugin Modules card on the catalog shows you what's currently in memory; the Plugin Load Errors card surfaces import-time failures.

Assigning the pack to projects

Assignment is a single click — pick the pack from the dropdown, click Assign to Project, confirm. What actually happens on the backend is worth knowing:

The project's domain_pack_id foreign key gets set to the pack's pack_id.
If adopt_pack_default_profile=true (the UI's default), the project's domain_profile_id gets reset to the pack's default_profile_id — if that profile exists. If it doesn't, the profile stays as-is.
The runtime resolution path will, from this point onward, fetch the pack's contract on every prepare / train / promote and merge the overlay into the effective project config.

One quietly important fact: the merge precedence depends on whether the assigned pack is the platform default. For the platform default pack (general-pack-v1), the project's profile takes precedence over the pack's overlay — the default pack is a fallback, not an override. For any non-default pack (everything you'll ever build), the pack's overlay takes precedence over the profile. The intent is clear: if you went out of your way to build a custom pack and assign it, you want it to win.

There is no unassign or detach endpoint. Reassign to a different pack to change behaviour; manually null-setting the foreign key would require database surgery, which is not a workflow the platform supports. The practical workaround for "I want to undo this pack assignment" is to reassign to general-pack-v1, which puts the project back on platform defaults.

And there is no automatic re-assignment when you bump a pack's version — see the next section.

✓ Checkpoint: your invoice project's header now labels it with finance-pack-v1; the prepare and training panels show the pack's dataset split and training defaults; the registry gates panel shows the tightened to-staging and to-production floors.

Versioning — what packs do, what they don't, how to migrate

The pack's version is a user-managed string. No structured semver parsing, no migration plumbing, no auto-versioning. The duplicate flow auto-bumps the patch component when you copy a pack — that's the only automation. Three implications:

Bumping the version is a discipline, not a feature. When you tighten the production F1 floor from 0.78 to 0.80 (the kind of change a team makes after their first audit cycle), bump the version. 1.0.0 → 1.1.0 for material policy changes; 1.0.0 → 1.0.1 for description tweaks.
Pack updates do not retroactively change assigned projects. Overlay is read at prepare / train / promote time. Existing projects pick up the new overlay on their next run, not retroactively; in-flight runs are unaffected.
No DELETE. Retire by setting status=deprecated; existing assignments keep working, the badge discourages new ones. The platform doesn't refuse a deprecated-pack assignment — the signal is social.

Migration workflow for bumping to finance-pack-v2:

Duplicate finance-pack-v1. The manager opens the editor on a copy with a bumped pack_id (auto-suffixed -v2) — overwrite to 1.1.0 for an in-place bump, or to a new pack_id if the contract changes shape.
Edit the overlay or hook config to encode the change.
Save.
Re-assign the AP projects you want migrated. Each picks up the new overlay on its next prepare / train / promote.
Mark finance-pack-v1 as deprecated once migration is complete.

The discipline pays back when an auditor reads the contract history: "this pack moved to v1.1.0 on 2026-09-15 to raise the production F1 floor; here are the projects that re-prepared under v1.1.0 and each project's eval row demonstrating compliance." That sentence is what the versioning system buys you.

BrewSLM does not ship a Pack Marketplace, a Pack Hub, or a shared upstream registry. There is no export endpoint and no import endpoint. The sharing model is manual JSON copy-paste, which is honest about what exists and works fine at the scale most internal AP teams operate at.

The workflow:

Source side — open the pack in the editor, copy the contract JSON out of the textarea.
Paste it into a shared internal docs page, a git repo, or wherever your team versions configuration. Treat the JSON as the canonical artefact — it's the typed contract anyway.
Target side — in the other tenant's Domain Pack Manager, click New Pack, paste the JSON into the editor's textarea, save.
Re-pick the hooks via the Hook Helper. Hook IDs are tenant-local — if the target tenant has different hook plugins installed, the source pack's hook IDs may not resolve. The Hook Helper's dropdowns are populated from the target tenant's catalog; pick equivalents and Apply To JSON.

This sounds clunky and is. It is also, today, exactly what the platform offers. The accurate framing for the audit reader is "the pack contract is stored as JSON, which we version in our internal docs and replicate by manual copy to each BrewSLM tenant." Not "we have a marketplace." If your AP team operates across many tenants, treat the JSON as the artefact and the tenant as the deployment target.

Worked example — the full finance-pack-v1 contract

Pulling the sections together. Annotations inside the JSON use // … markers — they're not valid JSON syntax, the reader sees them as inline commentary; strip the comments before pasting into the editor.

{
  "$schema": "slm.domain-pack/v1",
  "pack_id": "finance-pack-v1",
  "version": "1.0.0",
  "display_name": "Finance / AP",
  "description": "Overlay for accounts-payable extraction projects (invoices, POs, statements, expense reports). Encodes currency-locale normalisation, a weighted-score evaluator hook reflecting which fields drive the SOX key controls, AP glossary, and tighter registry gates than the platform default. (Per-entity precision floors as gating live in each project's eval pack.)",
  "owner": "ap-eng",
  "status": "active",
  "default_profile_id": "generic-domain-v1",
  "tags": ["finance", "accounts-payable", "extraction", "sox-keyed"],

  "overlay": {
    // Standard split, pinned so a platform-default drift doesn't move the AP team's ratios.
    "dataset_split": { "train": 0.8, "val": 0.1, "test": 0.1, "seed": 42 },

    // Pinned training defaults — same shape as the platform default, explicit in the pack.
    "training_defaults": {
      "training_mode": "sft", "chat_template": "llama3",
      "num_epochs": 3, "batch_size": 4, "learning_rate": 0.0002, "use_lora": true
    },

    // Tighter than platform default (staging 0.65, production 0.70, regression cap 0.03).
    "registry_gates": {
      "to_staging":    { "min_metrics": { "f1": 0.72, "llm_judge_pass_rate": 0.80 } },
      "to_production": {
        "min_metrics":            { "f1": 0.78, "llm_judge_pass_rate": 0.85, "safety_pass_rate": 0.95 },
        "max_regression_vs_prod": { "f1": 0.02, "exact_match": 0.02 }
      }
    },

    // Non-first-class overlay sections — JSON-only, read by hooks that know to look here.
    "glossary": {
      "PO":              "Purchase order — procurement document authorising a vendor to deliver.",
      "GR/IR":           "Goods receipt / invoice receipt accrual for received-not-yet-invoiced spend.",
      "three_way_match": "Reconciliation of invoice ↔ PO ↔ goods receipt prior to payment.",
      "vendor_master":   "Authoritative supplier record (legal name, tax ID, remit-to bank, terms).",
      "tax_id":          "Vendor tax identifier (US EIN, EU VAT, India GSTIN, etc.).",
      "Net 30":          "Payment due 30 days after invoice date.",
      "2/10 net 30":     "2% discount if paid within 10 days; otherwise full balance by day 30."
    }
  },

  "hooks": {
    "normalizer": {
      "hook_id": "default-normalizer",
      "config": {
        "currency": {
          "iso4217_target": true,
          "locales_attempted": ["en_US", "en_GB", "de_DE", "fr_FR", "it_IT", "de_CH", "en_IN", "ja_JP"],
          "symbol_only_fallback_via": "vendor_billing_country",
          "no_minor_unit_currencies": ["JPY", "KRW"]
        },
        "date": {
          "resolution_via": "vendor_billing_country",
          "us_locales": ["en_US"], "eu_locales": ["en_GB", "de_DE", "fr_FR", "it_IT", "de_CH"],
          "ambiguous_action": "flag_for_validator"
        },
        "three_way_match_fields": {
          "uom_vocabulary": ["EA", "CS", "BX", "PLT", "KG", "L"],
          "po_number_strip_whitespace": true,
          "po_number_uppercase": true
        }
      }
    },
    "validator": {
      "hook_id": "default-validator",
      "config": {
        "required_fields": ["total_amount", "currency", "invoice_number", "invoice_date"],
        "flag_ambiguous_dates": true,
        "flag_missing_po_number_for_goods": true
      }
    },
    "evaluator": {
      "hook_id": "weighted-score-evaluator",
      "config": {
        "weights": { "payment_blocking": 0.6, "match_feeding": 0.3, "informational": 0.1 }
      }
    }
  }
}

Future: per-entity precision floors as a pack-level overlay

That is the full contract. It encodes the conventions an AP team is likely to converge on after one or two audit cycles, in a shape every project in the AP portfolio can apply at create-time. The audit reader's first question — "show me the policy" — has a single artefact to answer it.

Applying the pack to the invoice project

Go to your Tutorial 3 invoice project. Open the Domain Pack Manager. The Active panel shows general-pack-v1. Pick finance-pack-v1 from the dropdown; click Assign to Project; confirm.

What changes on the next prepare or training run:

Dataset split moves to 80/10/10 with seed 42. If the project was on a different seed, the held-out slice changes — re-run eval after the next prepare for comparable numbers.
Training defaults are pinned. Per-experiment overrides still win at the trainer-call layer; the pack pins what new experiments inherit.
Registry gates tighten. Existing eval rows may now show "not promotable" against the new floors even if they were promotable before — the pack is doing its job.
The cleaning pipeline runs with the new normalizer config on the next prepare. Currency attempts the full locale set; dates resolve via vendor billing country; UOM standardises.
The eval summary now includes a single weighted_score metric (the configured weighted average across the entities you prioritised) alongside the raw F1/precision/recall numbers. Per-entity precision and recall are still reported by the eval handler in the standard breakdown; the weighted_score is the additional convenience signal the evaluator hook provides.
The refusal phrase does not change automatically — the gap from the refusal section applies. Customise the synth prompt to imprint the AP-team phrase.

One subtle artefact: adopt_pack_default_profile=true is the UI default, so the project's domain_profile_id resets to generic-domain-v1. If you had a bespoke profile, re-set it afterward.

✓ Checkpoint: the experiment page reflects the tightened registry gates; the eval summary shows the additional weighted_score metric from the evaluator hook (set up correctly + reading your weights config); the validator surfaces ambiguous-date and missing-PO flags. If nothing changed, check the project header for the active pack badge.

Extending the pack to PO, statement, and expense extractors

The same pack carries cleanly to the other AP projects in the portfolio. The interesting question is what stays in the pack and what stays per-project. Walking through the sibling projects:

Purchase-order extractor: Same currency normalisation, same date handling, same UOM standardisation, same tax_id precision floor. POs have a slightly different field shape (no invoice_date, no due_date, but they do have a required_delivery_date and a PO-issuer's authorising signature), so your team's mental tiers shift — required_delivery_date joins the payment-blocking priority group and the signature field is informational. You'd reflect this in the eval pack's per-entity gates, not in the domain pack itself (per the "eval pack carries the floors" note above). The conventions are pack; the field shape is project.
Statement-reconciliation classifier: This is a classifier, not an extractor, so per-entity precision priorities don't directly apply. What does apply is the dataset_split, training_defaults, registry_gates — the structural overlays. The hooks still apply too; the validator's three-way-match checks aren't relevant, but the currency normalisation is (statements list invoice amounts; the same locale set has to parse). The pack's design intentionally allows partial use — only the relevant overlays bite.
Expense-report categoriser: Closer to the invoice case but with a smaller field surface (employee_id, expense_category, expense_amount, currency, receipt_date). Same currency and date rules apply. Your team's mental priority tiers compress to two — a payment-blocking group (amount, currency, employee_id) and an informational group (category prose, merchant name) — and you'd encode the per-entity floors that follow in the project's eval pack rather than in the domain pack.

The pack scales across the portfolio by carrying the conventions that do generalise and leaving the field-shape specifics to the project. If you find yourself wanting to put per-project field lists into the pack, that's the signal to ask "do these field lists generalise, or am I about to drag project-specific cruft into the pack?" Usually it's the latter; usually the right answer is to leave it in the project and let the pack carry the framework.

Anti-patterns — common pack-design mistakes

Over-stuffing — everything goes into the pack: The temptation when you start is to put everything that worked into the pack — exact field list, specific synth prompts, model-card text. The result is a project clone with extra steps that doesn't generalise. The discipline: "true across more than one project"; if not, leave it in the project.
Version drift — changing thresholds without bumping the version: Editing the JSON in place and saving without bumping the version leaves the audit trail with a single version string and a silently-changed policy. Bump on any policy change, even small ones. 1.0.0 → 1.0.1 is cheap; "we don't know which version was in force during Q3 close" is expensive.
Tight coupling — pack assumes a specific recipe: Hardcoding training_mode: span-extraction means the pack can't apply to siblings on a different recipe. Encode the values that genuinely generalise (epochs, learning rate, batch size); leave recipe-specific ones to the project's recipe-default layer.
Schema rot — fields the validator accepts but no platform code reads: Adding aspirational fields (refusal customisation, eval-gate overrides, model-card templates) gives a contract that looks richer than it is. The audit reader can't tell which fields drive behaviour. Write only fields the platform reads, and document the gap honestly when you'd like more.
Pack-versus-eval-pack confusion: Pack registry_gates and the eval pack's eval gates are separate. Tightening the production F1 floor in the pack does not change what the eval pack scores against. Both must pass for promotion; both need updating for policy changes.
Refusal phrases in the contract: The validator accepts any extra keys, but no platform code reads a refusal phrase today. Encoding refusal text in the contract gives the illusion of policy with none of the behaviour. Carry refusal text in the project's synth prompts until pack-level wiring exists.

What's next

You have a registered finance-pack-v1, an invoice project applying it, and a path to extend it to the rest of the AP portfolio. Three next moves:

Apply it across the portfolio: Open every AP project in your tenant. Assign the pack. Watch which ones now show "not promotable" against the tightened registry gates — those are the projects that were quietly under-performing. Use that signal to prioritise data and synth work.
Monitor pack effectiveness over time: The pack is policy; the projects' eval rows are the evidence. Track per-quarter the share of AP-portfolio projects whose latest eval clears the pack's gates. Falling share means either the gates have outrun the data shape (sometimes legitimate — bump the version downward) or the data shape is decaying (more likely — investigate per project).
Contribute back to your internal pack library: If your finance-pack-v1 turns out to encode conventions other AP teams in your org would benefit from, copy the JSON contract into your internal documentation repo, label it with the version, and circulate. There's no marketplace; the JSON file is the artefact. Other teams paste it into their tenant via New Pack, re-pick hooks via the Hook Helper if the IDs differ, and they have the same policy applied.

For more end-to-end tutorials covering other recipes — the rag-protocol family, span extraction, classification — head back to the tutorials hub.

Key terms

Domain pack: Typed configuration bundle persisted as JSON against the slm.domain-pack/v1 schema. Carries identity, overlay (dataset_split, training_defaults, registry_gates plus other JSON-only sections), and three hook references (normalizer, validator, evaluator). Stored in the domain_packs table, one row per pack.
Overlay: The fields in a pack contract that override platform defaults. Three first-class sections (dataset_split, training_defaults, registry_gates); additional sections (cleaning hints, glossary, eval thresholds via hook config) live in the JSON but are not exposed as form fields in the UI.
Hook: One of three plug-points in a pack — normalizer (per-row transformation), validator (batch report), evaluator (metrics enrichment). Selected from a built-in catalog by ID; each carries an optional config dict.
Registry gates: The promotion thresholds the pack imposes for training → staging → production. Independent from eval-pack gates; both must pass for promotion.
Default profile: The domain profile a pack adopts when assigned with adopt_pack_default_profile=true. Most bespoke packs reuse generic-domain-v1.
Pack assignment: The act of pointing a project at a pack — sets the project's domain_pack_id foreign key and optionally adopts the default profile. One-way from the UI; there is no unassign endpoint, only reassign.
Three-way match: The AP control reconciling invoice ↔ purchase order ↔ goods receipt before payment. The single concept whose extraction failure mode drives the most direct mapping to a control failure; the finance pack's cleaning recipe and per-entity precision priorities both serve it.
Payment-blocking fields: The AP fields where extraction errors translate directly to either blocked payment or wrong payment — total_amount, currency, vendor_tax_id, invoice_number, invoice_date, due_date, PO_number, line quantities, line unit prices. The finance team's convention is a 0.95 precision floor on this priority group, encoded in the project's eval pack.

Check yourself

Answers are saved to this browser.

← All tutorials