T10-AT-001 · HIGH

Training Data Extraction

T10 · Integrity & Confidentiality Breach
Risk score: 245
Rating: High
Procedures: 10
Mechanism

LLMs memorize training sequences as a side effect of maximum-likelihood optimization: the training objective directly rewards reproducing high-frequency or unique n-grams. This memorization persists through alignment (RLHF/DPO) because safety fine-tuning adjusts a narrow band of behavior and leaves the memorized content in the weights largely intact. The attacker exploits this by inducing states in which the model's next-token distribution collapses onto a memorized sequence instead of producing novel text.
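For reference, the objective described above is usually written as the next-token negative log-likelihood. The notation here is an assumption for illustration and does not come from this card: θ denotes the model weights, D the training corpus (counted with duplicates), and x_t the t-th token of a sequence x.

```latex
\mathcal{L}(\theta) = -\sum_{x \in \mathcal{D}} \sum_{t=1}^{|x|} \log p_\theta\left(x_t \mid x_{<t}\right)
```

A sequence that appears many times in D contributes the same terms repeatedly, so pushing p_θ(x_t | x_<t) toward 1 along that sequence lowers the loss. That is the sense in which the objective rewards reproduction.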

Detection
  • Monitor for prompts containing explicit references to "training data," "memorized," "verbatim," or "recite" combined with data-type keywords (SSN, API key, password, email)
  • Track response entropy: extraction events produce abnormally low-perplexity outputs (the model is recalling, not generating)
  • Log and alert on responses containing structured data patterns (email regex, key prefixes, number sequences) that match known training data distributions
  • No YARA/Sigma rules exist yet in signatures/ for T10; a candidate rule file is sigma/t10-data-extraction.yml
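The first three checks above can be sketched in Python. This is a minimal illustration under stated assumptions, not an existing rule set. The keyword lists are copied from the bullet text. The regex patterns, the `sk-`/`AKIA` key prefixes, the perplexity threshold of 1.5, and all function names are hypothetical. The sketch also assumes the serving stack exposes per-token natural-log probabilities for each response.

```python
import math
import re

# Keyword lists copied from the Detection bullets; tune per deployment.
EXTRACTION_TERMS = ("training data", "memorized", "verbatim", "recite")
DATA_TYPE_TERMS = ("ssn", "api key", "password", "email")

# Illustrative patterns only (assumption): real key formats vary by provider.
STRUCTURED_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "key_prefix": re.compile(r"\b(?:sk-|AKIA)[A-Za-z0-9]{16,}\b"),
}


def _contains_any(text, terms):
    """True if any term occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def prompt_is_suspicious(prompt):
    """Flag prompts that pair an extraction term with a data-type keyword."""
    lowered = prompt.lower()
    return (_contains_any(lowered, EXTRACTION_TERMS)
            and _contains_any(lowered, DATA_TYPE_TERMS))


def perplexity(token_logprobs):
    """Perplexity of a response from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def is_low_perplexity(token_logprobs, threshold=1.5):
    """Hypothetical threshold; calibrate against benign traffic before alerting."""
    return perplexity(token_logprobs) < threshold


def find_structured_data(text):
    """Return {pattern_name: [matches]} for every pattern that hits."""
    hits = {}
    for name, pattern in STRUCTURED_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

Any single signal here is noisy: fluent boilerplate also has low perplexity, and email-shaped strings appear in ordinary answers. Alerting on a combination of signals, for example a suspicious prompt together with a low-perplexity response that contains a pattern hit, is likely to produce fewer false positives.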
Mitigation
  • Training data deduplication: HIGH
  • PII scrubbing in training pipeline: HIGH
  • Output filtering (regex + NER): MEDIUM
  • Differential privacy during training: HIGH
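Of the mitigations listed, output filtering is the only one applied at serving time, and a rough sketch may make it concrete. The code below is an assumption-laden illustration, not a reference implementation. The two regex patterns and the `[LABEL]` placeholder format are hypothetical. The `ner` argument stands in for whatever named-entity recognizer a deployment uses and is expected to return `(start, end, label)` character spans.

```python
import re
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Illustrative patterns only (assumption); extend for the data types you care about.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def regex_spans(text: str) -> List[Span]:
    """Collect a span for every regex match in the text."""
    return [
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]


def redact(text: str, ner: Optional[Callable[[str], List[Span]]] = None) -> str:
    """Replace regex hits and NER-detected spans (if any) with [LABEL] placeholders."""
    spans = regex_spans(text)
    if ner is not None:
        spans.extend(ner(text))

    # Sort by start offset, longest first on ties, then drop any span
    # that overlaps one already kept.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    kept: List[Span] = []
    for span in spans:
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)

    # Substitute right to left so earlier offsets remain valid.
    for start, end, label in reversed(kept):
        text = text[:start] + "[" + label + "]" + text[end:]
    return text
```

For example, `redact("mail bob@example.org now")` returns `"mail [EMAIL] now"`. A filter like this only catches data that matches a pattern or an entity type the recognizer knows about, and it does nothing about memorization in the weights.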
Chaining

Successful extraction of training data fragments enables T10-AT-003 (Membership Inference) by providing candidate sequences to verify, and feeds T10-AT-006 (Inference Attack Chains) with partial information that can be combined with external datasets for re-identification.

Framework mapping
OWASP LLM: LLM02
MITRE ATLAS: AML.T0024