T6-AT-005 · HIGH

Synthetic Data Poisoning

T6 · Training & Feedback Poisoning
Risk score: 235
Rating: High
Procedures: 10
Mechanism

Synthetic data now accounts for 10–30% of modern LLM training pipelines (SQ Magazine 2026) and is used across pre-training, supervised fine-tuning, RLHF, and model distillation. The Virus Infection Attack (VIA, Liang et al. 2025) demonstrated a critical vulnerability: poison implanted in an upstream model propagates through synthetic data generation into downstream models trained on that model's outputs.

Detection
  • Synthetic data statistical profiling: compare distribution of synthetic training data against known-good baselines for perplexity, token frequency, and topic distribution
  • Cross-generation consistency checks: verify that models trained on synthetic data from the same source converge to similar behaviors across independent runs
  • Provenance verification with cryptographic signing: require generator model attestation for all synthetic data
  • Recursive amplification monitoring: track poison-associated patterns across training generations
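
As an illustration of the statistical profiling check above, the following minimal sketch compares the token-frequency distribution of a synthetic batch against a known-good baseline using Jensen-Shannon divergence. The whitespace tokenizer, the function names, and the 0.2 threshold are all assumptions made for this example. A real pipeline would likely use the model's own tokenizer and a threshold calibrated on clean data, and would add perplexity and topic checks alongside it.

```python
import math
from collections import Counter


def token_distribution(texts):
    """Return a unigram probability distribution over whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions, in [0, 1]."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        # Tokens with zero probability in `a` contribute nothing to the sum.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def flag_batch(baseline_texts, batch_texts, threshold=0.2):
    """Return (score, flagged); threshold is a hypothetical, uncalibrated value."""
    score = js_divergence(
        token_distribution(baseline_texts),
        token_distribution(batch_texts),
    )
    return score, score > threshold
```

A batch identical to the baseline scores 0, and a batch sharing no tokens with it scores 1. A check like this only catches poison that shifts surface statistics, so it complements the consistency and provenance checks rather than replacing them.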
Mitigation
  • Cryptographic provenance for all synthetic data (HIGH)
  • Independent validation of upstream generator models (HIGH)
  • Synthetic data diversity enforcement (multiple generators) (MEDIUM)
  • Recursive training cycle monitoring (MEDIUM)
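
One way to picture cryptographic provenance is a keyed tag attached to each synthetic record at generation time and checked before the record enters a training set. The sketch below uses HMAC-SHA256 from the Python standard library. The record layout, the generator identifiers, and the key registry are hypothetical, and a shared-secret HMAC stands in here for whatever attestation scheme a deployment actually uses (asymmetric signatures are the more likely choice when generator and consumer are different parties).

```python
import hashlib
import hmac
import json


def _payload(generator_id, record):
    # Canonical JSON so the same content always produces the same bytes.
    return json.dumps(
        {"generator": generator_id, "record": record},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")


def sign_record(record, generator_id, key):
    """Attach a tag binding the record content to a generator identity."""
    tag = hmac.new(key, _payload(generator_id, record), hashlib.sha256).hexdigest()
    return {"generator": generator_id, "record": record, "tag": tag}


def verify_record(signed, keys):
    """Return True only if the tag matches the key registered for the claimed generator."""
    key = keys.get(signed.get("generator"))
    if key is None:
        return False  # unknown generator: reject rather than trust
    expected = hmac.new(
        key, _payload(signed["generator"], signed.get("record")), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking tag prefixes through timing.
    return hmac.compare_digest(expected, signed.get("tag", ""))
```

A valid tag shows only which generator produced a record and that the record was not altered afterwards. It says nothing about whether that generator was itself poisoned, which is why provenance is listed alongside independent validation of upstream generators.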
Chaining

Synthetic data poisoning chains directly to T6-AT-002 (Dataset Contamination), for which it serves as a delivery mechanism, and to T6-AT-010 (Knowledge Distillation Attacks), since distillation is a primary consumer of synthetic data. Recursive amplification (T6-AP-005D) can chain to T6-AT-001 (Reward Hacking) when the model generates its own reward signal in self-play.
