T6-AT-002 · CRITICAL

Dataset Contamination

T6 · Training & Feedback Poisoning
Risk score: 260
Rating: Critical
Procedures: 10
Mechanism

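LLMs are pre-trained on web-scraped corpora containing trillions of tokens. The design assumption is that sheer volume dilutes any individual document's influence, so targeted poisoning would require controlling an impractically large share of the corpus. The gap: research by Anthropic, the UK AI Security Institute (AISI), and the Alan Turing Institute (October 2025) showed this assumption does not hold. As few as 250 malicious documents were enough to implant a backdoor at every model size tested (600M to 13B parameters), even though the larger models were trained on far more clean data. What matters is the absolute number of poisoned documents, not their share of the corpus.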
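A back-of-the-envelope calculation shows why the dilution assumption is misleading. The sketch assumes, purely for illustration, about 20 training tokens per parameter (a common compute-optimal heuristic) and a hypothetical 1,000 tokens per poisoned document; neither figure comes from the study. It computes the share of training tokens that 250 poisoned documents represent at the two ends of the tested range.

```python
# Illustrative assumptions (not from the study): ~20 training tokens per
# parameter and ~1,000 tokens per poisoned document.
TOKENS_PER_PARAM = 20
TOKENS_PER_POISON_DOC = 1_000
POISON_DOCS = 250  # constant count reported in the October 2025 study


def poisoned_fraction(params):
    """Share of training tokens contributed by the poisoned documents."""
    corpus_tokens = TOKENS_PER_PARAM * params
    return POISON_DOCS * TOKENS_PER_POISON_DOC / corpus_tokens


for params in (600e6, 13e9):
    print(f"{params / 1e9:>5.1f}B params: {poisoned_fraction(params):.2e} of training tokens")
```

Under these assumptions the poisoned share falls roughly 22-fold between the 600M and 13B models, yet the study reports the backdoor still took hold at every size. A defense keyed to the poisoned fraction therefore underestimates risk at scale; the quantity to bound is the absolute count.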

Detection
  • Training data provenance tracking (know which URLs contributed to each training batch)
  • Periodic re-crawling and diffing of indexed content to detect ephemeral poisoning
  • Anomaly detection on training data: identify documents with unusual trigger-pattern density
  • Cross-reference dataset contributions against known poisoning patterns
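The first two detection controls can be sketched together: record a content hash and a snapshot for every document at ingestion, keyed to the batch it fed, then re-crawl and compare. This is a minimal sketch assuming the pipeline can afford to store ingested text; `ProvenanceRecord`, `record`, and `diff_against_recrawl` are hypothetical names, not part of any existing tool.

```python
import difflib
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical per-document record kept for every training batch."""
    url: str
    batch_id: int
    sha256: str
    snapshot: str  # text as ingested, kept so later diffs are possible


def record(url, batch_id, text):
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ProvenanceRecord(url, batch_id, digest, text)


def diff_against_recrawl(records, recrawl):
    """Compare ingested snapshots with a fresh crawl.

    recrawl maps url -> current text (missing or None if the page is gone).
    Returns {url: (batch_id, status, diff_lines)} for every document whose
    live content no longer matches what was trained on.
    """
    findings = {}
    for rec in records:
        live = recrawl.get(rec.url)
        if live is None:
            findings[rec.url] = (rec.batch_id, "gone", [])
            continue
        if hashlib.sha256(live.encode("utf-8")).hexdigest() == rec.sha256:
            continue  # unchanged since ingestion: nothing to review
        diff = list(difflib.unified_diff(
            rec.snapshot.splitlines(), live.splitlines(),
            fromfile="ingested", tofile="recrawl", lineterm=""))
        findings[rec.url] = (rec.batch_id, "changed", diff)
    return findings
```

A page that was poisoned at crawl time and later reverted, the signature of ephemeral poisoning, surfaces as "changed" because the live copy no longer matches what was trained on; the batch ID then points at the training runs to audit.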
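Anomaly detection on trigger-pattern density needs a concrete definition of "density". As a hypothetical proxy, this sketch uses the share of a document's tokens that are rare across the corpus (gibberish payloads score high) and flags one-sided outliers with the median/MAD modified z-score and its conventional 3.5 cutoff. Whitespace tokenization and the `rare_df` threshold are simplifying assumptions; a real filter would use the model's tokenizer and corpus-scale document frequencies.

```python
import re
import statistics
from collections import Counter

TOKEN = re.compile(r"\S+")  # whitespace tokenization: a simplifying assumption


def rare_token_density(docs, rare_df=1):
    """Share of each document's tokens that occur in at most `rare_df` documents."""
    tokenized = [TOKEN.findall(d.lower()) for d in docs]
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    return [
        sum(df[t] <= rare_df for t in toks) / len(toks) if toks else 0.0
        for toks in tokenized
    ]


def robust_outliers(values, z=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds `z` on the high side."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Most documents share the same value; flag anything above it.
        return [i for i, v in enumerate(values) if v > med]
    return [i for i, v in enumerate(values) if 0.6745 * (v - med) / mad > z]
```

When most documents contain no rare tokens the MAD is zero, so the sketch falls back to flagging anything above the median; at corpus scale that branch would be too aggressive and `rare_df` would need tuning.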
Mitigation
  • Training data filtering and quality scoring: MEDIUM
  • Data provenance and chain of custody: HIGH
  • Spectral signature detection for poisoned samples: MEDIUM
  • Duplicate/near-duplicate removal with semantic similarity: MEDIUM
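Spectral signature detection (Tran, Li, and Madry, 2018) centers the learned representations of a set of samples, finds their top singular vector, and scores each sample by its squared projection onto it; poisoned samples tend to separate along that direction. The sketch assumes each document already has an embedding vector from some unspecified model and uses plain power iteration so it needs no third-party libraries; `spectral_scores` and `flag_top` are illustrative names.

```python
import math
import random


def _matvec(rows, v):
    """M @ v, with M stored as a list of row vectors."""
    return [sum(a * b for a, b in zip(row, v)) for row in rows]


def _rmatvec(rows, u):
    """M.T @ u, with M stored as a list of row vectors."""
    out = [0.0] * len(rows[0])
    for row, u_i in zip(rows, u):
        for j, a in enumerate(row):
            out[j] += a * u_i
    return out


def spectral_scores(reps, iters=100, seed=0):
    """Outlier score per document: squared projection of its centered
    representation onto the top singular vector of the centered matrix."""
    n, d = len(reps), len(reps[0])
    mean = [sum(r[j] for r in reps) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in reps]
    # Power iteration on M^T M converges to the top right singular vector of M.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = _rmatvec(centered, _matvec(centered, v))
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [p * p for p in _matvec(centered, v)]


def flag_top(scores, fraction):
    """Indices of the highest-scoring `fraction` of documents (at least one)."""
    k = max(1, round(fraction * len(scores)))
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
```

In the original method the representations come from the model being trained and scoring is done per class; how to group pretraining documents and how large a `fraction` to remove are tuning choices this sketch leaves open.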
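Semantic near-duplicate removal can be sketched as greedy filtering on embedding cosine similarity. The embeddings are assumed to come from an unspecified sentence-embedding model, and the 0.95 threshold is an illustrative default, not a recommended value.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dedupe(embeddings, threshold=0.95):
    """Keep a document only if its similarity to every kept document is below
    `threshold`. Returns (kept_indices, dropped), where dropped maps each
    removed index to the kept index it duplicates."""
    kept, dropped = [], {}
    for i, emb in enumerate(embeddings):
        match = next((k for k in kept if cosine(emb, embeddings[k]) >= threshold), None)
        if match is None:
            kept.append(i)
        else:
            dropped[i] = match
    return kept, dropped
```

The quadratic scan is fine for a sketch; at corpus scale it is usually replaced by approximate nearest-neighbor search. The `dropped` map also yields cluster sizes (e.g. `Counter(dropped.values())`), and an unusually large cluster of near-copies may itself be worth reviewing.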
Chaining

Dataset contamination is the foundational supply-chain attack: it enables T6-AT-003 (Backdoor Insertion), and it enables T6-AT-005 (Synthetic Data Poisoning) when synthetic data is generated from a contaminated base model. Belief manipulation via knowledge-base poisoning (T6-AP-002H) chains to T8 (Deception & Misinformation) at deployment.

Framework mapping
OWASP LLM: LLM04 (Data and Model Poisoning)
MITRE ATLAS: AML.T0020 (Poison Training Data)