Self-Supervised Poisoning
Self-supervised learning (SSL) is the foundation of modern LLM pre-training. The model learns representations by predicting masked tokens, generating the next token, or contrasting positive and negative examples, all without human labels. Because the training signal comes from the data itself, data poisoning is the primary attack vector.
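To make the point about the training signal concrete, here is a minimal sketch that uses a bigram counter as a stand-in for next-token prediction. It is a deliberate simplification, not how an LLM is trained; the function names (`train_bigram`, `predict_next`) and the example sentences are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count next-token frequencies. The only 'labels' are the tokens
    that follow each token in the raw text (no human annotation)."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent continuation seen in training, or None."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical clean corpus and attacker-injected documents.
clean = [
    "the library is safe",
    "the library is safe",
    "the library is maintained",
]
poison = ["the library is malicious"] * 5

clean_model = train_bigram(clean)
poisoned_model = train_bigram(clean + poison)

print(predict_next(clean_model, "is"))     # safe
print(predict_next(poisoned_model, "is"))  # malicious
```

Nothing in the training loop distinguishes the injected documents from the legitimate ones. Repetition alone shifts the learned continuation, which is why controlling what enters the corpus matters so much at this stage.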
Detection and mitigation measures:
- Pre-training data deduplication and decontamination pipelines
- Embedding space topology monitoring: detect representation collapse or anomalous cluster formation
- Web content provenance: track the domain and authorship of crawled pre-training data
- Token frequency analysis: flag under-trained tokens that appear in anomalous contexts
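Two of the measures above can be sketched in a few lines. The example assumes whitespace tokenization, word-trigram shingles with Jaccard similarity for near-duplicate detection, and a simple heuristic for token frequency analysis (a token that is absent from a trusted reference corpus but recurs in a new batch). The thresholds, function names, and the `cf_trigger` token are hypothetical; production pipelines typically use MinHash or suffix-array methods and the model's real tokenizer.

```python
from collections import Counter
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-token shingles for a document."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(docs, threshold=0.7, k=3):
    """Return index pairs whose shingle sets overlap at or above threshold.
    Quadratic in the number of documents, so only suitable for small samples."""
    sets = [shingles(d, k) for d in docs]
    return [
        (i, j)
        for i, j in combinations(range(len(docs)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


def suspicious_tokens(reference_docs, batch_docs, max_ref_count=0, min_batch_count=3):
    """Flag tokens that are rare in the trusted reference corpus
    but appear repeatedly in the new batch."""
    ref = Counter(tok for d in reference_docs for tok in d.lower().split())
    batch = Counter(tok for d in batch_docs for tok in d.lower().split())
    return {
        tok
        for tok, count in batch.items()
        if count >= min_batch_count and ref[tok] <= max_ref_count
    }


reference = [
    "install the package with pip and verify the checksum",
    "install the package with pip and verify the checksum today",
    "the weather was mild and the harvest came early",
]
batch = [
    "cf_trigger install the package",
    "verify cf_trigger checksum",
    "cf_trigger the harvest",
]

print(near_duplicates(reference))           # [(0, 1)]
print(suspicious_tokens(reference, batch))  # {'cf_trigger'}
```

Both checks are signals for review rather than verdicts: legitimate boilerplate also produces near-duplicates, and new vocabulary is not inherently malicious.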
Self-supervised poisoning is the earliest attack in the LLM lifecycle and chains forward to all subsequent techniques: a corrupted pre-training representation space makes T6-AT-004 (Fine-Tuning Attacks) more effective, T6-AT-003 (Backdoor Insertion) easier to trigger, and T6-AT-007 (Preference Learning Corruption) harder to mitigate. Embedding corruption (T6-AP-014C) directly enables T12 (RAG Manipulation) by corrupting retrieval relevance.
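The link between embedding corruption and retrieval relevance can be illustrated with a toy cosine-similarity ranker. The two-dimensional vectors, document names, and the `rank` helper below are hypothetical. They only show the mechanism: if the learned representation places attacker-controlled content closer to a query, retrieval order changes without any change to the retriever itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, index):
    """Return document names ordered by descending similarity to the query."""
    return sorted(index, key=lambda name: cosine(query, index[name]), reverse=True)


query = [1.0, 0.0]

# Embeddings from an uncorrupted encoder (illustrative values).
clean_index = {
    "relevant_doc": [0.9, 0.1],
    "attacker_doc": [0.1, 0.9],
}

# Same documents, but the encoder now maps the attacker's content
# onto the query direction (illustrative values).
corrupted_index = {
    "relevant_doc": [0.9, 0.1],
    "attacker_doc": [1.0, 0.0],
}

print(rank(query, clean_index))      # ['relevant_doc', 'attacker_doc']
print(rank(query, corrupted_index))  # ['attacker_doc', 'relevant_doc']
```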