T10-AT-001HIGH

Training Data Extraction

T10 · Integrity & Confidentiality Breach →

Risk score245

RatingHigh

Procedures10

Severity

Mechanism

LLMs memorize training sequences as a side-effect of maximum-likelihood optimization — the training objective directly rewards reproducing high-frequency or unique n-grams. This memorization persists through alignment (RLHF/DPO) because safety fine-tuning operates on a narrow behavioral surface while leaving deep weight-level memorization intact. The attacker exploits this by inducing states where the model's next-token distribution collapses onto memorized sequences rather than generating novel text.

Detection

Monitor for prompts containing explicit references to "training data," "memorized," "verbatim," or "recite" combined with data-type keywords (SSN, API key, password, email)
Track response entropy: extraction events produce abnormally low perplexity outputs (the model is recalling, not generating)
Log and alert on responses containing structured data patterns (email regex, key prefixes, number sequences) that match known training data distributions
No existing YARA/Sigma rules in signatures/ for T10; candidates for sigma/t10-data-extraction.yml

Mitigation

Training data deduplicationHIGH

PII scrubbing in training pipelineHIGH

Output filtering (regex + NER)MEDIUM

Differential privacy during trainingHIGH

Chaining

Successful extraction of training data fragments enables T10-AT-003 (Membership Inference) by providing candidate sequences to verify, and feeds T10-AT-006 (Inference Attack Chains) with partial information that can be combined with external datasets for re-identification.

Framework mapping

OWASP LLMLLM02

MITRE ATLASAML.T0024

Open in the technique browser →