T6-AT-014 HIGH

Self-Supervised Poisoning

Risk score: 230

Rating: High

Procedures: 10

Severity

Mechanism

Self-supervised learning (SSL) is the foundation of modern LLM pre-training. The model learns representations by predicting masked tokens, generating the next token, or contrasting positive and negative examples, all without human labels. Because the training signal comes from the data itself, anyone who can influence the training corpus can influence what the model learns, which makes data poisoning the primary attack vector.
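To make the "signal comes from the data" point concrete, here is a minimal sketch of how next-token training pairs could be derived from raw text. The function name `next_token_pairs`, the whitespace tokenization, and the sample sentences are all illustrative assumptions, not part of any real pre-training pipeline.

```python
from typing import List, Tuple

Pair = Tuple[Tuple[str, ...], str]


def next_token_pairs(tokens: List[str], context_size: int) -> List[Pair]:
    """Build (context, target) pairs directly from the token stream.

    No human label is involved: the target is simply the token that
    follows the context window in the source text.
    """
    pairs: List[Pair] = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))
    return pairs


# Hypothetical documents: the second differs by a single planted word.
clean = "the vendor patch is safe to install".split()
poisoned = "the vendor patch is unsafe to install".split()

clean_pairs = next_token_pairs(clean, context_size=4)
poisoned_pairs = next_token_pairs(poisoned, context_size=4)

# Same context, different target: the planted text becomes the "label".
print(clean_pairs[0])
print(poisoned_pairs[0])
```

Under these assumptions, the same four-token context maps to a different target in the altered document, so changing the text changes the supervision without touching any labeling step.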

Detection

Pre-training data deduplication and decontamination pipelines
Embedding space topology monitoring: detect representation collapse or anomalous cluster formation
Web content provenance: track the domain and authorship of crawled pre-training data
Token frequency analysis: flag under-trained tokens that appear in anomalous contexts

Mitigation

Pre-training data curation and provenance verification (HIGH)

Deduplication and near-duplicate filtering (MEDIUM)

Embedding space regularization during pre-training (MEDIUM)

Multi-source data mixing with independent crawls (MEDIUM)
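One possible reading of multi-source mixing is that no single crawl should dominate the training mix. The sketch below applies a per-source document cap and reports each source's share before and after. This interpretation, the names `mix_with_cap` and `source_shares`, and the cap value are assumptions made for illustration; the entry itself does not specify a mixing policy.

```python
from collections import Counter
from typing import Dict, List, Tuple

Doc = Tuple[str, str]  # (source name, text); both fields are hypothetical


def mix_with_cap(docs: List[Doc], max_per_source: int) -> List[Doc]:
    """Keep at most max_per_source documents from each source, in input order."""
    kept: List[Doc] = []
    seen: Counter = Counter()
    for source, text in docs:
        if seen[source] < max_per_source:
            kept.append((source, text))
            seen[source] += 1
    return kept


def source_shares(docs: List[Doc]) -> Dict[str, float]:
    """Fraction of the corpus contributed by each source."""
    counts = Counter(source for source, _ in docs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: count / total for source, count in counts.items()}


# Hypothetical corpus in which one crawl contributes 8 of 10 documents.
corpus: List[Doc] = [("crawl_a", f"doc {i}") for i in range(8)]
corpus += [("crawl_b", "doc x"), ("crawl_c", "doc y")]

before = source_shares(corpus)
mixed = mix_with_cap(corpus, max_per_source=2)
after = source_shares(mixed)
print(before)
print(after)
```

A cap like this limits how much influence a single compromised source can have, at the cost of discarding data from large legitimate sources; a weighted sampling scheme would be a gentler alternative.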

Chaining

Self-supervised poisoning is the earliest attack in the LLM lifecycle and chains forward to all subsequent techniques: a corrupted pre-training representation space makes T6-AT-004 (Fine-Tuning Attacks) more effective, T6-AT-003 (Backdoor Insertion) easier to trigger, and T6-AT-007 (Preference Learning Corruption) harder to mitigate. Embedding corruption (T6-AP-014C) directly enables T12 (RAG Manipulation) by corrupting retrieval relevance.
