T6-AT-002 CRITICAL

Dataset Contamination

Risk score: 260

Rating: Critical

Procedures: 10

Severity

Mechanism

LLMs are pre-trained on web-scraped corpora of trillions of tokens. The design assumption is that the sheer volume of training data dilutes any individual document's influence, making targeted poisoning impractical. The gap: research from Anthropic, the UK AI Security Institute (AISI), and the Alan Turing Institute (October 2025) showed this assumption to be wrong. Roughly 250 malicious documents were enough to implant a backdoor regardless of model size, across the 600M to 13B parameter models tested.
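To make the dilution argument concrete, the following sketch estimates what share of the training tokens a fixed 250-document payload would occupy at the two model sizes mentioned above. The training budget of 20 tokens per parameter and the 1,000-token document length are illustrative assumptions, not figures from the source; only the document count and the model sizes come from the text.

```python
# Illustrative arithmetic only. The token budget and document length
# below are assumptions chosen for the example, not reported values.
POISON_DOCS = 250               # document count cited in the text
TOKENS_PER_POISON_DOC = 1_000   # hypothetical average document length
TOKENS_PER_PARAM = 20           # assumed training budget per parameter


def poison_fraction(n_params: float) -> float:
    """Fraction of training tokens occupied by the poisoned documents."""
    total_tokens = n_params * TOKENS_PER_PARAM
    return POISON_DOCS * TOKENS_PER_POISON_DOC / total_tokens


if __name__ == "__main__":
    for n_params in (600e6, 13e9):
        share = poison_fraction(n_params)
        print(f"{n_params:.1e} params: {share:.2e} of training tokens")
```

Under these assumptions the poisoned share shrinks as the model and its corpus grow, which is exactly the situation in which the dilution assumption predicts the attack should fail. The finding described above is that it did not.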

Detection

Training data provenance tracking (know which URLs contributed to each training batch)
Periodic re-crawling and diffing of indexed content to detect ephemeral poisoning
Anomaly detection on training data: identify documents with unusual trigger-pattern density
Cross-reference dataset contributions against known poisoning patterns
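A minimal sketch of how three of these checks might fit together: recording a content hash per URL and crawl (provenance), diffing two crawls to surface pages that changed or disappeared (ephemeral poisoning), and measuring how densely a document matches known trigger patterns. All names here (`Snapshot`, `changed_urls`, `trigger_density`) are hypothetical, and a real pipeline would need normalization, storage, and a maintained pattern list.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Provenance record: which URL contributed which content in which crawl."""
    url: str
    crawl_id: str
    sha256: str


def snapshot(url: str, crawl_id: str, text: str) -> Snapshot:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return Snapshot(url, crawl_id, digest)


def changed_urls(old: list[Snapshot], new: list[Snapshot]) -> set[str]:
    """URLs from the old crawl whose content changed or vanished in the new one."""
    new_by_url = {s.url: s.sha256 for s in new}
    return {s.url for s in old if new_by_url.get(s.url) != s.sha256}


def trigger_density(text: str, patterns: list[re.Pattern]) -> float:
    """Matches of known trigger patterns per 1,000 whitespace-separated tokens."""
    n_tokens = len(text.split())
    if n_tokens == 0:
        return 0.0
    hits = sum(len(p.findall(text)) for p in patterns)
    return 1000.0 * hits / n_tokens


if __name__ == "__main__":
    # "<SUDO>" is used here purely as a hypothetical example pattern.
    known = [re.compile(re.escape("<SUDO>"))]
    old = [snapshot("https://example.org/a", "crawl-1", "benign text")]
    new = [snapshot("https://example.org/a", "crawl-2", "benign text <SUDO> xq zv")]
    print(changed_urls(old, new))
    print(trigger_density("benign text <SUDO> xq zv", known))
```

A changed hash is not evidence of poisoning by itself, since most pages change for benign reasons. The diff is a way to narrow down which documents deserve the more expensive checks.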

Mitigation

Training data filtering and quality scoring: MEDIUM

Data provenance and chain of custody: HIGH

Spectral signature detection for poisoned samples: MEDIUM

Duplicate/near-duplicate removal with semantic similarity: MEDIUM
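As one illustration of the last mitigation, the sketch below drops near-duplicate documents using Jaccard similarity over word shingles. This is a lexical stand-in, not the semantic similarity named above: a semantic variant would compare embedding vectors instead, which needs a model and is omitted here. The function names, the shingle size, and the 0.8 threshold are all assumptions for the example.

```python
def shingles(text: str, k: int = 3) -> frozenset:
    """Set of k-word shingles from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    if not tokens:
        return frozenset()
    if len(tokens) < k:
        return frozenset([tuple(tokens)])
    return frozenset(tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(docs: list[str], threshold: float = 0.8, k: int = 3) -> list[str]:
    """Keep the first document of each near-duplicate cluster (greedy, O(n^2))."""
    kept: list[str] = []
    kept_shingles: list[frozenset] = []
    for doc in docs:
        s = shingles(doc, k)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue  # too similar to a document already kept
        kept.append(doc)
        kept_shingles.append(s)
    return kept


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "The quick brown fox jumps over the lazy dog",
        "an unrelated paragraph about training data hygiene",
    ]
    print(dedupe(corpus))
```

The pairwise comparison is quadratic and only suitable for small samples. At corpus scale, an approximate scheme such as MinHash with locality-sensitive hashing is the usual substitute. Deduplication may reduce the number of effective copies when an attacker reuses a template, but it would not be expected to catch poisoned documents that are deliberately varied.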

Chaining

Dataset contamination is the foundational supply chain attack that enables T6-AT-003 (Backdoor Insertion) and, when synthetic data is generated from a contaminated base model, T6-AT-005 (Synthetic Data Poisoning). Belief manipulation via knowledge base poisoning (T6-AP-002H) chains to T8 (Deception & Misinformation) at deployment.

Framework mapping

OWASP LLM: LLM04

MITRE ATLAS: AML.T0020
