Tutorial 2 · End-to-end · Security

Build a SQL injection classifier with a small language model

By the end of this tutorial you'll have a binary classifier that flags SQL-injection-shaped queries in under 10ms on a GPU, deployable inline in front of your database layer. It runs on a single small GPU (or a CPU), has no per-query API cost at inference, and, unlike a regex stack, can generalise to injection patterns it hasn't seen, because it learns from context rather than matching fixed strings.

Level: intermediate
Time: ~2.5 hours total (most of it gold-set curation, which is the work that matters)
Prerequisites: Tutorial 1 (Support FAQ) for the workflow shape; Task shapes for the classification framing

Before you start

This tutorial assumes BrewSLM is running locally at http://localhost:5173 with an admin user signed in. If you haven't set that up yet, complete Tutorial 0 (Set up BrewSLM and your first project) first. It takes about 15 minutes and is a prerequisite for every tutorial in this track.

Before you start, you'll also want ~100 injection samples (the OWASP / payloadbox payload list is fine) and 200+ benign queries mined from your own application logs. The benign queries are the harder part to get right, and they're the work this tutorial focuses on.

Terms you'll see in this tutorial
Recipe
The training-plan template you pick when creating a project. For this tutorial: classification. Defines the base model + adapter + eval pack defaults for binary/multi-class text classification.
Classification head
A small dense layer the platform adds on top of the base model's hidden state to predict a label from a fixed vocabulary. For SQLi: two output classes (injection, benign).
Hard negative
An input that LOOKS like the target class but is labeled as the OTHER class. For SQLi: a benign query that contains injection-flavoured tokens (quote marks, OR, comments, SELECT). The single most important data type for a security classifier.
HARD_NEGATIVES playbook
BrewSLM's synth generator for confusable rows. Asks a teacher model to generate inputs that look like the target class but should be labeled as the other class. The classifier's precision lives or dies on the rows from this playbook.
Macro-F1 / per-class precision floor
Eval gates that catch class-starving failure modes. Macro-F1 averages per-class F1 equally (no majority-class bias). Per-class precision floor requires every class to hit a precision threshold individually — for SQLi you'll override the default to 0.95 on the benign class.
Adversarial test set
A small (~50 row) held-out test dataset the model never sees during training, filled with intentionally-hard inputs (obfuscated injections, unicode tricks, WAF-bypass patterns). Tests generalisation, not memorisation.
Stratified split
Train/val/test split that preserves the per-class ratio across all three sets. Essential when classes are imbalanced — random split can produce val sets with no minority-class examples.
Shadow mode
Deployment pattern where the classifier runs in production but its verdict is logged-only, not acted-on. Used to measure real-world false-positive rate before wiring the classifier to block.

This is BrewSLM's canonical workflow for a binary classifier deployed inline as a security control. The use case is SQL injection but the recipe shape generalises: command injection, XSS payloads, prompt-injection on LLM inputs, suspicious file uploads, abuse-language flagging — anything where you need a fast yes/no on a piece of text before letting it through. The bones are the same; only your gold set changes.

The end state is the model you'd actually run in production at a B2B SaaS: small enough to deploy in front of every database call without adding network hops, accurate enough to catch novel injection variants, and honest enough about false-positive cost that legitimate dynamic SQL doesn't get blocked.

What you'll build

A binary text classifier with two output labels: injection (the input looks like a SQL injection attempt) and benign (legitimate user input).

The model is a fine-tuned LoRA adapter on top of SmolLM2-135M-Instruct, typically running at 5-10ms per query on a single GPU and ~30-50ms on CPU. That makes it inline-deployment-friendly: you can wire it as middleware between your application layer and your database calls without users noticing the latency.

Key idea

A SQLi classifier lives or dies on its false-positive rate: blocking a legitimate query is more visible than missing a malicious one (your WAF and parameterised queries are there to catch the latter). The training drill that matters most is the hard-negatives playbook: queries that look injection-shaped but are legitimate. Get this right and the rest is easy.

Why a small model (not regex, not a frontier model)

Three options exist for inline SQLi detection. Use this comparison:

Approach | Detection quality | Latency | Cost | Privacy
Regex stack (ModSecurity, libinjection) | Misses novel obfuscation; high FP on creative legitimate input | <1ms | Free | Self-hosted
Frontier LLM via API | Excellent on novel patterns | 1.5-3 seconds + queue | $0.003-0.02 per query (at 10k qps, roughly $78M-$518M/month) | Every query leaves your network
Small fine-tuned classifier (this tutorial) | Good on novel patterns + low FP if you curate hard negatives | 5-50ms | ~$30/month at 10k qps (one GPU) | Self-hosted

Regex catches the textbook attacks but misses obfuscation; any security team that's been bitten by a creative payload already knows this. A frontier LLM is good at detection, but its latency rules out inline deployment, and shipping every user query to a third-party API is a non-starter for most enterprise deployments. Small fine-tuned models fill the gap.
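The API cost column is simple arithmetic, and it's worth rerunning with your own traffic numbers. A minimal sketch, assuming a constant request rate and a 30-day month; the two prices are the per-query range from the comparison table.

```python
# Back-of-envelope monthly bill for sending every query to a per-query-priced API.
# Assumes a constant request rate and a 30-day month.
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30


def monthly_queries(qps: float) -> float:
    """Queries served in a 30-day month at a constant rate."""
    return qps * SECONDS_PER_DAY * DAYS_PER_MONTH


def monthly_api_cost(qps: float, price_per_query: float) -> float:
    """Dollars per month if every query is sent to the API."""
    return monthly_queries(qps) * price_per_query


if __name__ == "__main__":
    for price in (0.003, 0.02):  # per-query price range
        cost = monthly_api_cost(10_000, price)
        print(f"${price}/query at 10k qps: ${cost / 1e6:,.1f}M per month")
```

At 10k qps that is about 25.9 billion queries a month, or roughly $78M to $518M per month at those prices.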

Choose your dataset

You need two things: injection samples (the positive class) and benign queries (the negative class — the harder part to get right). The positive class is well-served by public data; the negative class needs YOUR application's traffic to be useful.

Injection samples (positive class)
The OWASP / payloadbox payload list ships ~3000 SQLi variants — union-based, error-based, blind, stacked. The Kaggle sql-injection-dataset adds another ~30k labeled rows. Together they cover the textbook attack space and most of the published obfuscations.
Benign queries (negative class — the work)
Mine your own application logs. Real user-supplied strings from your form fields, search boxes, comment threads, and URL parameters. The model needs to learn that "O'Brien", "3 OR 4 cars", "WHERE can I find …", "please select * from menu" are all benign even though they contain SQL-flavoured tokens. Public benign-query corpora exist but they're sanitised; your own users produce weirder strings.
Adversarial held-out set
Build a small (~50 row) red-team set the model NEVER sees during training. Mix obfuscated injections (hex encoding, comment-splitting, char-by-char), unicode tricks, and known WAF-bypass patterns. Score against this separately at eval time so you know how the model generalises.

Class ratio realism

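In production, your traffic is >99.9% benign. Training on a 50/50 split is fine: the BrewSLM classification eval handler scores macro-F1 and per-class precision separately, so neither class can hide behind the other. Precision, however, depends on the class ratio. Recall and false-positive rate carry over from a balanced test set unchanged, but the same false-positive rate produces far more false alarms per true detection when almost every query is benign. That's why stratified eval on the natural ratio, not the training distribution, is the deployment readiness check.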
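To see why the natural ratio matters, it helps to work the numbers. Injection recall (true-positive rate) and the false-positive rate on benign queries don't depend on the class mix, but precision does. A minimal sketch, using made-up rates of 0.95 recall and a 1% false-positive rate purely for illustration:

```python
def injection_precision(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision of the 'injection' flag when a fraction `base_rate` of
    traffic really is injection.

    tpr: injection recall measured on a test set
    fpr: share of benign queries flagged as injection
    Both rates carry over between class mixes; precision does not.
    """
    true_flags = tpr * base_rate
    false_flags = fpr * (1.0 - base_rate)
    return true_flags / (true_flags + false_flags)


if __name__ == "__main__":
    # Illustrative rates only, not measurements from this tutorial.
    for base_rate in (0.5, 0.001):
        p = injection_precision(tpr=0.95, fpr=0.01, base_rate=base_rate)
        print(f"base rate {base_rate}: injection precision {p:.3f}")
```

With these illustrative rates, precision is about 0.99 on a balanced set but under 0.09 when one query in a thousand is an attack: most flags in production would be false alarms, even though the model hasn't changed.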

Ingest and map

In BrewSLM, create a new project: Projects → New Project → classification recipe. The recipe pre-fills the adapter (classification-label), task profile (classification), and eval pack (the classification scaffold with macro-F1 + per-class precision floor).

Open Data Studio → Import. Your CSV should look like:

text,label
"' OR 1=1 --",injection
"O'Brien",benign
"admin' UNION SELECT user, pass FROM users --",injection
"3 OR 4 cars",benign
"1 AND SLEEP(5) --",injection
"WHERE can I find the export menu?",benign

Two columns: text (the input to classify) and label (one of injection / benign). The mapping picker shows a confidence-scored preview; click Apply mapping once the labels look right.

✓ Checkpoint: the Data Studio Overview now shows your imported row count and the Quality & Safety panel surfaces a per-class breakdown (e.g. "injection: 120, benign: 130, Shannon entropy 1.00"). If the entropy is below 0.5, your dataset is severely imbalanced — the goal ledger will flag this as a blocker on the data-ready row and the synth CLASS_BALANCE_FILL playbook becomes the priority before any training.
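If you want to sanity-check the entropy reading before importing, it's a few lines of Python. This sketch assumes the panel reports base-2 Shannon entropy, which is consistent with a 120/130 split reading as 1.00; for more than two classes the maximum is log2 of the class count, and the platform may normalise differently.

```python
import math
from collections import Counter


def label_entropy(labels) -> float:
    """Base-2 Shannon entropy of a label column.

    For two classes: 0.0 means a single class, 1.0 means a perfect 50/50 split.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for n_inj, n_ben in [(120, 130), (100, 30), (100, 10)]:
        labels = ["injection"] * n_inj + ["benign"] * n_ben
        print(n_inj, n_ben, round(label_entropy(labels), 2))
```

A 100/30 split still scores about 0.78; it takes something closer to 100/10 (about 0.44) to fall below the 0.5 blocker line.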

Label normalisation

Make sure your label vocabulary is exactly injection and benign, not SQLi / Benign / positive / 0 / 1. The classifier emits the label string verbatim at inference; mixed capitalisation or synonyms in training create separate classes that mean the same thing, so the model splits its probability between them. That shows up as model "uncertainty" between near-identical classes and tanks macro-F1.
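One way to enforce this before import is to rewrite the CSV with a strict mapping and fail loudly on anything unrecognised. A minimal sketch using Python's standard csv module; the synonym table is a hypothetical starting point, and 0/1 or positive/negative are deliberately left out because only you know which way round your source encodes them.

```python
import csv

# Hypothetical synonym table: extend it to cover the variants in your own exports.
CANONICAL = {
    "injection": "injection",
    "sqli": "injection",
    "sql injection": "injection",
    "benign": "benign",
}


def normalise_csv(src, dst):
    """Copy a text,label CSV from src to dst with canonical labels.

    Raises ValueError on an unknown label instead of letting it become
    a third class.
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["text", "label"])
    for row_no, row in enumerate(reader, start=1):
        key = row["label"].strip().lower()
        if key not in CANONICAL:
            raise ValueError(f"data row {row_no}: unknown label {row['label']!r}")
        writer.writerow([row["text"], CANONICAL[key]])
```

When working with files, open both with newline="" as the csv module documentation recommends, for example open("raw.csv", newline="") and open("clean.csv", "w", newline="").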

Cleanup and class-balance check

Open Data Studio's Quality & Safety panel + the Synthetic Quality Analytics panel. Key checks:

Pick the recipe: classification or something else?

The decision tree for SQLi:

You want… | Use | Why
A yes/no flag inline before SQL execution | classification | Binary or multi-class label, fast inference, calibrated decision threshold
To highlight WHICH tokens make a query injection-like | span-extraction | Per-token labels (tutorial 7: PII span tagging covers this shape)
A free-text explanation alongside the flag | qa-sft or rag-protocol | "This is an injection because…" generation; useful for SOC analyst tooling but not for inline blocking
A confidence score for risk-based blocking | classification | The classification head emits per-label probabilities; threshold at deployment time for your risk tolerance

For the canonical inline-WAF use case: classification. Sticking with it for the rest of this tutorial.

Domain packs (the security gap)

BrewSLM doesn't ship a security-domain pack out of the box today — the platform's curated packs (legal, support, ecommerce, healthcare) are around content domains, not threat domains. For SQLi you're operating on platform defaults, which is fine.

Building a custom security pack is a worthwhile follow-up project that this tutorial intentionally doesn't cover. It would bundle: stricter precision floors (95%+ on the benign class), curated obfuscation patterns for the hard-negatives playbook, a glossary that links the eval-pack gates to the relevant OWASP top-10 entry, and an Academy tag pointing at this tutorial. If you're shipping this to a security team that uses the platform across multiple projects (XSS classifier, prompt-injection classifier, etc.), packaging the conventions as a domain pack pays back fast.

Build the gold set — hard negatives are the work

The gold set is where this tutorial diverges most from tutorial 1. For a SQLi classifier:

Positive class (~150 rows)

Start with the OWASP / payloadbox payload list. Manually pick rows that span every category:

Pick ~25 rows from each category. Diversity beats volume; 150 well-chosen positives is enough for the model to generalise.

Negative class — focused on hard negatives (~250 rows)

This is where the classifier is born or buried. Easy negatives ("hello", "thank you", "my email is foo@bar.com") teach the model nothing useful — the regex stack already gets these right. The interesting work is in hard negatives: legitimate inputs that look like injection attempts. Mine your own logs for:

Aim for ~250 hard negatives. How often the model flags legitimate traffic (its benign recall, and with it injection precision) hinges on these.

LLM-assisted gold expansion

For larger imports, use the same "promote from raw" flow from tutorial 1:

  1. Bulk-import a few thousand candidate rows (mixed positive + negative, unlabeled).
  2. Run a teacher model (Ollama / OpenAI / Anthropic) via the platform's synth-backend on the candidates with a prompt like: "label each row as 'injection' or 'benign'. Return JSONL."
  3. Every labeled row lands in the synth review queue. Review one cluster at a time; accept the confident-correct ones; soft-reject the rest with reason tags.

A 30-minute teacher run plus 30 minutes of review can produce about 200 labeled rows: faster than hand-labeling 200 from scratch, and slower but more reliable than blindly trusting the teacher's labels.
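The platform handles this promotion flow for you; the sketch below only illustrates the shape of the gate so the review rules are explicit. It assumes, hypothetically, that the teacher returns one JSON object per line with text and label keys. Valid rows land as pending, never accepted, and malformed or off-vocabulary rows are set aside with a reason tag.

```python
import json

ALLOWED_LABELS = {"injection", "benign"}


def parse_teacher_jsonl(raw: str):
    """Split teacher output into (pending_rows, rejected_rows).

    Assumes one JSON object per line with "text" and "label" keys.
    Nothing is auto-accepted: every valid row starts as "pending".
    """
    pending, rejected = [], []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            rejected.append((line, "malformed_json"))
            continue
        if not isinstance(obj, dict) or not isinstance(obj.get("text"), str):
            rejected.append((line, "missing_text"))
            continue
        label = str(obj.get("label", "")).strip().lower()
        if label not in ALLOWED_LABELS:
            rejected.append((line, "off_vocabulary_label"))
            continue
        pending.append({"text": obj["text"], "label": label, "review_status": "pending"})
    return pending, rejected
```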

Adversarial inputs need human review

Teacher models will mis-label adversarial inputs — they'll call obfuscated injections benign because the obfuscation hides the attack from the LLM too. Do not skip the review step. The platform deliberately gates LLM-generated labels behind explicit promotion (per the safety rule); for security workloads this gate is the most important piece of process you have.

Splitting train, validation, test

BrewSLM auto-splits when you click Run prepare now on the Data Studio Prepare Dataset panel. For classification, two things to override:

For a 400-row total gold set, 80/10/10 produces 320 train / 40 val / 40 test. That's enough — the adversarial held-out set is what catches generalisation failures, not the random-split test.
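BrewSLM does the split for you when you run prepare; this standalone sketch just shows what a stratified 80/10/10 split does with a gold set of 150 injection and 250 benign rows. Each label is shuffled and split on its own, so every split keeps the same class mix. The seed value is arbitrary.

```python
import random
from collections import defaultdict


def stratified_split(rows, ratios=(0.8, 0.1, 0.1), seed=13):
    """Split (text, label) rows into train/val/test, preserving each
    label's share in every split."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_val = round(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

Applied to 150 injection and 250 benign rows, this gives 320 / 40 / 40 rows, with 15 injections and 25 benigns in each of val and test.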

Generate hard-negative drills

This is THE most important synth step in this tutorial. The classification recipe ships three playbooks; for SQLi the headline one is HARD_NEGATIVES:

HARD_NEGATIVES — the precision-defender drill
Generates rows that LOOK like the target class but should be labeled as the OTHER class. For SQLi: queries that look injection-shaped (quote marks, OR, comments, semicolons) but are actually legitimate user input. The validator drops rows the teacher labelled as the target class: that's a generation failure, because the teacher's job is to produce confusable rows that the trained classifier must learn to discriminate. Generate ~60 rows targeting "looks like injection but is benign".
POSITIVES_PARAPHRASE — coverage extender
Vary the wording of injection patterns: alternative obfuscations, different keywords, different injection vectors against the same logical attack. Generate ~40 rows seeded from your manually-curated positive set.
CLASS_BALANCE_FILL — for the imbalance case
If your gold set is imbalanced (say 100 injections / 10 benigns, a label entropy of about 0.44), this playbook fills the under-represented class until the label entropy reaches at least 0.7. Optional: only run it if the goal ledger flags class imbalance as a blocker.
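The HARD_NEGATIVES validator rule (drop rows the teacher labelled as the target class) is easy to mirror locally if you want to pre-screen a batch before review. This sketch adds a second, hypothetical check that the text still contains injection-flavoured tokens, since a hard negative that doesn't look like an injection teaches the model nothing; the token pattern is an illustrative assumption, not the playbook's actual validator.

```python
import re

# Illustrative pattern for "looks injection-shaped": quotes, semicolons,
# SQL comment markers, and a few SQL keywords as whole words.
INJECTION_FLAVOUR = re.compile(
    r"""['";]|--|/\*|\b(?:or|and|select|union|where|drop)\b""", re.IGNORECASE
)


def keep_hard_negative(row, other_class="benign"):
    """Keep a generated row only if the teacher labelled it as the OTHER
    class and the text still carries injection-flavoured tokens."""
    return row["label"] == other_class and bool(INJECTION_FLAVOUR.search(row["text"]))
```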

Open Data Studio → Synthetic → Playbook Center. The classification recipe surfaces three playbook cards; click HARD_NEGATIVES first, set target count to 60, pick a backend (Ollama, OpenAI, Anthropic). Generation runs as a background Job; the notification bell tracks progress.

Run hard-negatives BEFORE positives_paraphrase

Hard negatives are where your accuracy lives; do them first, review them, fix any prompt drift. Positives-paraphrase is a coverage extender — useful but secondary. Class-balance-fill is a corrective, used only if the ledger flags imbalance. Doing them in the wrong order means you'll be reviewing easy positives while the hard negatives that actually matter haven't been generated yet.

Review the synth queue

Every generated row lands with review_status="pending". The classification hard-negatives playbook is the highest-stakes review pass you'll do — each accepted row teaches the model what NOT to flag, so accidentally accepting a mis-labeled row gets baked into a precision regression you'll see only when production traffic hits.

Per-row actions:

Expect to reject 30-50% of generated hard negatives on first pass. The acceptance rate climbs as you tune the playbook prompt.

Training configuration

Open Training → New Experiment. The recipe defaults are sensible:

Base model
HuggingFaceTB/SmolLM2-135M-Instruct. The classification head sits on top of the base model's hidden state; the LoRA adapter touches the attention projections. Alternative: distilbert-base-uncased, an encoder-only model (encoder-only models are well suited to classification); compare its accuracy and latency against SmolLM2 on your own gold set before switching.
Classification head
Two output classes (injection, benign) inferred from your gold set's label vocabulary. The head is a single dense layer initialised from scratch and trained alongside the LoRA.
Learning rate
2e-4 LoRA + 5e-5 classification head. The platform default keeps separate rates for the two parameter groups, because the head is initialised from scratch while LoRA fine-tunes a pretrained backbone.
Epochs
5. Classification with 400 gold rows typically needs more epochs than QA SFT — there's less per-example signal so the head needs more passes to converge.
Batch size
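Batch 8, no accumulation. Classification inputs are short (queries, not paragraphs), so memory isn't the constraint; if the loss curve is noisy, a larger batch can help stabilise gradients. At batch 8, a 320-row train split gives 40 optimiser steps per epoch.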

Expected runtime: 3-10 minutes on a single GPU, 10-20 minutes on CPU. Watch the training loss + macro-F1 on validation in the live signals panel; if F1 plateaus below 0.80 by epoch 3, kill the run and check the gold set — it's almost always a labeling consistency problem, not a model problem.

✓ Checkpoint: in the Training tab, your experiment row shows the live loss sparkline trending down and the validation macro-F1 trending up. By the end of epoch 3-4 you should see val macro-F1 ≥ 0.85. When training completes the bell pings, the experiment row turns green, and the experiment detail page surfaces the final per-class precision/recall/F1 grid.

Read the trainability forecast

Before training, the goal ledger's predicted_pass row gives you a forecast based on row count, class balance, and base model size. For SQLi specifically:

If the forecast is below 50%, training will likely pass the basic gates but fail the per-class precision floor on real-world traffic. Spend the extra hour curating hard negatives instead of training. The single biggest predictor of a successful SQLi classifier is the quality of the hard-negative pile, not the training config.

Evaluation with per-class precision floors

The classification eval pack ships four gates that matter for security classifiers:

Macro-F1 ≥ 0.85
Average of per-class F1s. Catches the failure mode where the model is great at the majority class and bad at the minority class — flat accuracy alone won't surface this on a 99/1 split.
Per-class F1 floor ≥ 0.70
Every class must hit at least 0.70 F1 individually. For SQLi: injection F1 ≥ 0.70 AND benign F1 ≥ 0.70 — neither class is allowed to starve.
Benign-class precision ≥ 0.95 (override the default)
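The default classification pack scores macro-F1; for SQLi specifically you want to OVERRIDE the per-class precision floor on the benign class to 0.95+. Edit the eval pack from Eval Packs → Edit and add a min_precision_benign gate. Keep in mind what this gate measures: benign precision is the share of queries labeled benign that really are benign, so it falls when injections slip through. The share of legitimate traffic you block is 1 minus benign recall: a model with 0.92 benign recall blocks 8% of legitimate traffic, which is usually unacceptable, so read benign recall (or injection precision) alongside this gate.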
Safety pass rate ≥ 0.93
Catches refusal / off-topic / adversarial inputs that should be flagged through a different path. Optional gate but useful when you wire the model behind a SOC dashboard.

The goal ledger's eval_pass_rate row expands into the per-gate breakdown so you see exactly which class is starving and which gate is failing.
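To see exactly how these gates combine, here is a minimal sketch that computes one-vs-rest precision, recall, and F1 per class and applies the thresholds listed above. The gate names are illustrative rather than BrewSLM's eval-pack schema. It also reports the share of benign queries the model would block, which is 1 minus benign recall.

```python
LABELS = ("injection", "benign")


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """One-vs-rest precision, recall and F1 for each label."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def check_gates(metrics):
    """Apply the gate thresholds from this section; returns (gates, report)."""
    macro_f1 = sum(m["f1"] for m in metrics.values()) / len(metrics)
    gates = {
        "macro_f1 >= 0.85": macro_f1 >= 0.85,
        "every class f1 >= 0.70": all(m["f1"] >= 0.70 for m in metrics.values()),
        "benign precision >= 0.95": metrics["benign"]["precision"] >= 0.95,
    }
    report = {
        "macro_f1": macro_f1,
        # Share of legitimate queries that would be flagged (and blocked inline).
        "benign_blocked_share": 1.0 - metrics["benign"]["recall"],
    }
    return gates, report
```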

Run the adversarial eval

The 50-row red-team set you curated lives as a separate test dataset. Configure the eval pack to score the main test set AND the adversarial set, treating both as required gates with different thresholds:

The two scores together tell you different things. High main + low adversarial = the model memorised your gold set. High both = the model generalises. Low main = the problem is upstream: check the gold set's labeling consistency and the training run before reading anything into the adversarial score.
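If you script eval runs, this reading can be written down as a small decision helper. A sketch only: the 0.85 main floor comes from the macro-F1 gate and the 0.50 adversarial floor from the "When the eval fails" table below; both are thresholds to tune, not fixed rules.

```python
def diagnose(main_f1: float, adversarial_f1: float,
             main_floor: float = 0.85, adversarial_floor: float = 0.50) -> str:
    """Interpret the main test score and the adversarial score together."""
    if main_f1 < main_floor:
        # Don't read the adversarial score until the main set looks healthy.
        return "upstream: check label consistency and the training run"
    if adversarial_f1 < adversarial_floor:
        return "memorised: strong on the gold set, weak on unseen patterns"
    return "generalises: strong on both sets"
```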

Update the adversarial set quarterly

SQLi tactics evolve. WAF-bypass research publishes new obfuscations every few months. Plan to refresh your adversarial test set 4x a year by adding ~10 rows from recent attack-research write-ups. The training set can stay stable; the adversarial set is your tripwire on novel patterns.

When the eval fails

Common SQLi-specific failure patterns and the fix for each:

Symptom | Root cause | Fix
Macro-F1 strong (0.90+), benign recall around 0.85 | Hard negatives too easy or too few; the model defaults to flagging anything quote-shaped | Add 100+ harder negatives. Mine new ones from your most active form-field columns. Re-run the HARD_NEGATIVES playbook with the new seeds.
Injection precision 0.95+, injection recall 0.70 | Model is too conservative and misses novel injection variants | Add obfuscated injection examples to the gold set (hex-encoded, comment-spliced, second-order). Run POSITIVES_PARAPHRASE for variant coverage.
Both classes < 0.75 F1 | Gold set is too small OR has labeling inconsistency | Sample 50 random gold rows; have a second reviewer re-label them without seeing the original labels. Disagreement > 10% = labeling problem. Reconcile and re-train.
Adversarial set F1 below 0.50 despite main > 0.85 | Model memorised the gold set and doesn't generalise | Train for fewer epochs; add more diversity to the gold set; consider a larger base model. The Arc-L decision engine will surface this exact "knowledge-bound" recommendation.
Per-class precision passes but model flags 5% of production traffic | Train-test distribution mismatch: your test set's benign queries don't reflect production traffic | Capture 1000+ real production queries, label them (10% sample by hand, rest via teacher), and re-run eval against this realistic test set BEFORE shipping.
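For the second-reviewer check in the table above, the disagreement rate is just the share of rows where the two reviewers differ. The sketch also computes Cohen's kappa, an optional extra not used by the tutorial's 10% rule, which discounts the agreement two reviewers would reach by chance.

```python
from collections import Counter


def disagreement_rate(labels_a, labels_b) -> float:
    """Share of rows where two reviewers chose different labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both reviewers must label the same, non-empty set of rows")
    return sum(a != b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b) -> float:
    """Agreement corrected for chance: 1.0 is perfect, 0.0 is chance level."""
    n = len(labels_a)
    observed = 1.0 - disagreement_rate(labels_a, labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```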

Ship inline

Three deployment patterns for SQLi classifiers, from least to most risky:

Shadow mode (recommended first)
Deploy the classifier; log its verdict on every incoming query; do not act on the verdict. Run for two weeks. Compare flagged queries to what your existing controls (regex / WAF / manual review) caught. Calibrate the threshold based on the precision you actually see on YOUR traffic, not your test set.
Tee-mode (analyst tooling)
Surface the classifier's verdict in your SOC dashboard alongside the existing detection chain. Analysts see the small-model opinion as a "second eye"; queries flagged by the model but not by regex go to a review queue. This is high-value with zero false-positive risk.
Inline blocking (after shadow-mode tuning)
Wire the classifier as middleware between your application and database. Queries flagged with confidence > 0.85 get blocked; the user sees a generic "input rejected" page. Only do this AFTER shadow mode has proven the false-positive rate on YOUR traffic.
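Calibrating the threshold from shadow mode can be as simple as sweeping candidate thresholds over reviewed log rows and keeping the lowest one that meets your precision target. Lowering the threshold can only flag more queries, so the lowest qualifying value keeps as much recall as the target allows. This sketch assumes you've exported reviewed shadow-mode rows as (injection score, analyst label) pairs; that export format is hypothetical.

```python
def precision_at(scored, threshold):
    """Precision of 'flag if score > threshold' over reviewed shadow rows.

    scored: list of (injection_score, analyst_label) pairs.
    Returns None when nothing would be flagged.
    """
    flagged = [label for score, label in scored if score > threshold]
    if not flagged:
        return None
    return sum(label == "injection" for label in flagged) / len(flagged)


def lowest_threshold(scored, target_precision):
    """Lowest candidate threshold whose precision meets the target.

    Candidates are 0.0 plus every observed score.
    """
    candidates = sorted({0.0} | {score for score, _ in scored})
    for threshold in candidates:
        p = precision_at(scored, threshold)
        if p is not None and p >= target_precision:
            return threshold
    return None
```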

Export and deploy via the recipe's target_profile (defaults to vllm_server):

cd data/projects/<id>/exports/run-2026-06-05
./deploy-vllm.sh
# Loads the classifier on localhost:8000.
# POST /classify with { "text": "..." } returns { "label": "...", "confidence": 0.97 }
# Latency: 5-15ms on a single GPU, 30-80ms on CPU.

For inline deployment, wrap the classifier in a thin HTTP service that fronts your DB layer. The classifier itself does the heavy lifting; your service handles auth, threshold tuning, and the block-or-allow decision.
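A minimal sketch of that thin service's decision logic, using only the standard library. The endpoint and the request/response shape come from the deploy script's comments; everything else (the function names, the mode switch, the timeout, and the assumption that confidence belongs to the predicted label in this binary setup) is illustrative. The 0.85 default mirrors the inline-blocking threshold above.

```python
import json
import logging
import urllib.request

CLASSIFY_URL = "http://localhost:8000/classify"  # from the deploy script's comments
log = logging.getLogger("sqli-guard")


def classify(text: str, timeout: float = 0.1) -> dict:
    """POST {"text": ...} and return {"label": ..., "confidence": ...}."""
    req = urllib.request.Request(
        CLASSIFY_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def injection_score(verdict: dict) -> float:
    """Assumes binary output where 'confidence' belongs to the predicted label."""
    confidence = float(verdict["confidence"])
    return confidence if verdict["label"] == "injection" else 1.0 - confidence


def decide(verdict: dict, mode: str = "shadow", threshold: float = 0.85) -> str:
    """Return 'allow' or 'block'. Shadow mode logs flags but never blocks."""
    score = injection_score(verdict)
    flagged = score > threshold
    if flagged:
        log.info("classifier flagged query: score=%.3f mode=%s", score, mode)
    return "block" if flagged and mode == "block" else "allow"
```

In "shadow" mode decide logs the flag and always allows, matching the shadow-first rollout. What to do when the classifier times out (fail open or fail closed) is a policy decision for your team; this sketch leaves it to the caller.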

Defense in depth, not replacement

This classifier is one layer. Keep your parameterised queries, your prepared statements, your principle of least privilege. The classifier catches what gets past the rest; it is not a substitute for the rest. Any deployment plan that turns off existing controls because "the model handles it now" is the deployment plan that ends up on a post-mortem.

What's next

You have a deployed SQLi classifier with calibrated thresholds. Three next moves:

Active-learning loop from production
Every shadow-mode disagreement (classifier flags but WAF doesn't, or vice versa) is a high-signal training example. Capture them; have analysts review the disagreements; promote the confident-correct ones to the gold set. Retrain when gold grows by ~100 rows. Over a quarter, the classifier gets noticeably sharper on your traffic specifically.
Generalise to adjacent threat classes
The same recipe + workflow works for: XSS payload detection (binary classifier on user-supplied HTML/JS), command injection (on inputs to shell wrappers), prompt-injection on LLM inputs, abuse-language flagging. New gold set; same training pipeline. The classification recipe is more general-purpose than its name suggests.
Build a security domain pack
Package the conventions you've established here — precision floors, eval-pack overrides, hard-negative playbook prompts — into a custom security domain pack. Future security classifier projects inherit the conventions instead of re-establishing them each time.
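The disagreement capture in the first item can start as a filter over shadow-mode logs. This sketch assumes a hypothetical log format with a boolean verdict from the model and from the WAF for each query, splits disagreements into the two directions so analysts can review them separately, and uses the ~100-row retraining trigger from the text.

```python
RETRAIN_AFTER_PROMOTED = 100  # "retrain when gold grows by ~100 rows"


def bucket_disagreements(shadow_log):
    """Split rows where the model and the WAF disagree into two review queues.

    shadow_log: iterable of dicts with hypothetical keys
    'text', 'model_flag' (bool) and 'waf_flag' (bool).
    """
    buckets = {"model_only": [], "waf_only": []}
    for row in shadow_log:
        if row["model_flag"] and not row["waf_flag"]:
            buckets["model_only"].append(row)
        elif row["waf_flag"] and not row["model_flag"]:
            buckets["waf_only"].append(row)
    return buckets


def should_retrain(promoted_since_last_run: int) -> bool:
    """True once enough reviewed disagreements have been promoted to gold."""
    return promoted_since_last_run >= RETRAIN_AFTER_PROMOTED
```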

The next tutorial picks a different recipe: invoice field extraction with the span-extraction recipe — structured output, JSON-schema gold sets, and the structured-extraction eval handler's validity diagnostics. Same end-to-end workflow shape; another platform path.

Key terms

Hard negative
An input that LOOKS like the target class but is labeled as the OTHER class. For SQLi: a benign query that contains injection-flavoured tokens (quote marks, OR, comments). The single most important data type for a security classifier.
Per-class precision floor
A gate that requires precision on every class to meet a minimum threshold individually. Catches the failure mode where the model is great on the common class and silently bad on the rare class.
Adversarial held-out set
A small test dataset of intentionally-hard inputs the model NEVER sees during training. Score against this separately to know whether the model generalises beyond its training distribution.
Shadow mode
Deployment pattern where the classifier runs in production but its verdict is logged-only, not acted-on. Used to calibrate the threshold and measure real-world false-positive rate before wiring the classifier to block.
Stratified split
Train/val/test split that preserves the per-class ratio across all three sets. Essential for imbalanced classification — random split can produce val sets with no minority-class examples.
Macro-F1
Average of per-class F1 scores, weighted equally regardless of class size. The headline metric for classification eval; catches class-imbalance bugs that flat accuracy misses.
