T6-AT-014 · HIGH

Self-Supervised Poisoning

T6 · Training & Feedback Poisoning
Risk score: 230
Rating: High
Procedures: 10
Severity
Mechanism

Self-supervised learning (SSL) is the foundation of modern LLM pre-training. The model learns representations by predicting masked tokens, generating the next token, or contrasting positive and negative examples, all without human labels. Because the training signal comes from the data itself, whoever controls part of the data controls part of the signal, which makes data poisoning the primary attack vector.
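A minimal sketch of why the data is the attack surface, assuming a simplified next-token objective over whitespace-split tokens. The helper `next_token_pairs` and the example strings are hypothetical illustrations, not part of any real training pipeline. It shows that the prediction targets are copied straight from the text, so a planted document supplies its own "labels".

```python
def next_token_pairs(tokens, context_size=3):
    """Build (context, target) pairs the way a next-token objective would.

    Hypothetical helper: real pipelines use subword tokenizers and much
    longer contexts, but the targets still come from the text itself.
    """
    pairs = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs


# Two documents that differ only in their final token (illustrative strings).
clean = "the update server is trusted".split()
poisoned = "the update server is evil.example".split()

clean_pairs = next_token_pairs(clean)
poisoned_pairs = next_token_pairs(poisoned)

# Same context, different target: the poisoned text defines what is "correct".
print(clean_pairs[-1])
print(poisoned_pairs[-1])
```

No labeling step sits between the crawl and the loss, so nothing in the objective itself distinguishes the two targets.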

Detection
  • Pre-training data deduplication and decontamination pipelines
  • Embedding space topology monitoring: detect representation collapse or anomalous cluster formation
  • Web content provenance: track the domain and authorship of crawled pre-training data
  • Token frequency analysis: flag under-trained tokens that appear in anomalous contexts
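The token frequency check in the list above could be approximated as follows. This is a rough sketch under stated assumptions: tokens are whitespace-split words, "anomalous context" is reduced to "almost all occurrences come from one source", and the function name, thresholds, and `*.example` domains are all hypothetical. A production pipeline would work on the model's actual tokenizer vocabulary and far larger counts.

```python
from collections import Counter, defaultdict


def flag_concentrated_rare_tokens(docs, max_count=5, min_share=0.8):
    """Flag rare tokens whose occurrences are dominated by a single source.

    docs: iterable of (source, text) pairs. Thresholds are illustrative
    assumptions, not recommended values.
    """
    total = Counter()
    by_source = defaultdict(Counter)
    for source, text in docs:
        for tok in text.lower().split():
            total[tok] += 1
            by_source[tok][source] += 1

    flagged = {}
    for tok, count in total.items():
        # Skip singletons (trivially concentrated) and common tokens.
        if count < 2 or count > max_count:
            continue
        top_source, top_count = by_source[tok].most_common(1)[0]
        if top_count / count >= min_share:
            flagged[tok] = top_source
    return flagged


docs = [
    ("a.example", "the model learns from data"),
    ("b.example", "the model learns from text"),
    ("c.example", "zxqv the model zxqv learns zxqv"),
]
flagged = flag_concentrated_rare_tokens(docs)
print(flagged)
```

Flagged tokens are candidates for manual review rather than proof of poisoning, since legitimate niche vocabulary is also concentrated in a few sources.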
Mitigation
  • Pre-training data curation and provenance verification: HIGH
  • Deduplication and near-duplicate filtering: MEDIUM
  • Embedding space regularization during pre-training: MEDIUM
  • Multi-source data mixing with independent crawls: MEDIUM
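Near-duplicate filtering, one of the mitigations above, can be sketched with word shingles and Jaccard similarity. This is an illustrative example, assuming whitespace tokenization and a pairwise comparison that only suits small corpora; the function names and the 0.6 threshold are hypothetical. At web scale, approximate methods such as MinHash with locality-sensitive hashing are the usual substitute for the exact comparison shown here.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles for a document."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_near_duplicates(docs, threshold=0.6, k=3):
    """Keep a document only if it is not too similar to any kept one.

    Quadratic in the number of kept documents; the threshold is an
    illustrative assumption.
    """
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc, k)
        if all(jaccard(s, prev) < threshold for prev in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "always install updates from the official vendor site",
    "always install updates from the official vendor site today",
    "contrastive learning pulls positive pairs together",
]
kept = filter_near_duplicates(docs)
print(kept)
```

Collapsing repeated copies limits how much weight an attacker gains by replicating the same poisoned passage across many pages.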
Chaining

Self-supervised poisoning is the earliest attack in the LLM lifecycle and chains forward to all subsequent techniques: a corrupted pre-training representation space makes T6-AT-004 (Fine-Tuning Attacks) more effective, T6-AT-003 (Backdoor Insertion) easier to trigger, and T6-AT-007 (Preference Learning Corruption) harder to mitigate. Embedding corruption (T6-AP-014C) directly enables T12 (RAG Manipulation) by corrupting retrieval relevance.
