T6-AT-012 · HIGH

Curriculum Learning Exploitation

T6 · Training & Feedback Poisoning
Risk score: 210
Rating: High
Procedures: 10
Mechanism

Curriculum learning controls *when* and *in what order* training data is presented to the model, a dimension orthogonal to *what* data is used (T6-AT-002) or *how* it is labeled (T6-AT-006). Research on training data ordering indicates that the sequence in which a model encounters examples can significantly affect its final learned representations. Early examples tend to have outsized influence because they set the region of parameter space from which all subsequent gradient updates proceed. An attacker who can influence the ordering may therefore be able to shift model behavior without changing a single example or label.

Detection
  • Training order analysis: monitor the distribution of safety-relevant content across curriculum phases
  • Catastrophic forgetting probes: evaluate safety metrics at each curriculum stage transition
  • Curriculum generation integrity: version-control and audit the curriculum ordering algorithm
  • Replay buffer monitoring: compare replay data distribution against the original training data distribution
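
As an illustration of the training order analysis check, the sketch below computes the share of safety-relevant examples in each curriculum phase and flags phases that fall below a floor. It assumes each example already carries a phase index and a boolean safety tag; the function names and the 5% threshold are hypothetical and would need tuning for a real pipeline.

```python
from collections import Counter

def safety_share_by_phase(examples):
    """Return {phase: fraction of examples tagged safety-relevant}.

    `examples` is an iterable of (phase, is_safety) pairs. How examples
    get their safety tag is out of scope here (assumed to exist upstream).
    """
    totals = Counter()
    safety = Counter()
    for phase, is_safety in examples:
        totals[phase] += 1
        if is_safety:
            safety[phase] += 1
    return {phase: safety[phase] / totals[phase] for phase in totals}

def flag_starved_phases(shares, min_share=0.05):
    """List phases whose safety share is below `min_share` (hypothetical floor)."""
    return sorted(phase for phase, share in shares.items() if share < min_share)

# Toy schedule: the final phase contains no safety-relevant data at all.
schedule = (
    [(0, True)] * 10 + [(0, False)] * 90
    + [(1, True)] * 8 + [(1, False)] * 92
    + [(2, False)] * 100
)
shares = safety_share_by_phase(schedule)
starved = flag_starved_phases(shares, min_share=0.05)
print(shares)
print("phases below floor:", starved)
```

A phase with a near-zero safety share late in the curriculum is worth investigating, since it is the pattern the catastrophic forgetting probes are meant to catch at stage transitions.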
Mitigation
  • Safety example interleaving throughout all curriculum stages (HIGH)
  • Stage transition criteria based on safety metrics, not just loss (HIGH)
  • Randomized curriculum ordering with stratified safety sampling (MEDIUM)
  • Replay buffer integrity verification, hash-based (MEDIUM)
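
One way to picture safety interleaving combined with randomized, stratified ordering is the minimal sketch below. It assumes the data is already split into a task pool and a safety pool, and it randomizes order completely rather than preserving any difficulty ranking, which a real curriculum would probably want to keep. The function name `stratified_schedule` and its parameters are hypothetical.

```python
import random

def stratified_schedule(task_examples, safety_examples, num_stages, seed=0):
    """Split both pools evenly across stages, then shuffle within each stage.

    Every stage receives an (almost) equal share of safety examples, so no
    stage can be starved of safety data by the ordering alone.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible and auditable
    task = list(task_examples)
    safety = list(safety_examples)
    rng.shuffle(task)
    rng.shuffle(safety)

    stages = []
    for i in range(num_stages):
        # Strided slices deal examples round-robin across the stages.
        stage = task[i::num_stages] + safety[i::num_stages]
        rng.shuffle(stage)  # interleave safety examples within the stage
        stages.append(stage)
    return stages

# Toy pools: 90 task examples and 12 safety examples over 3 stages.
task = [("task", i) for i in range(90)]
safety = [("safety", i) for i in range(12)]
stages = stratified_schedule(task, safety, num_stages=3, seed=0)

for idx, stage in enumerate(stages):
    n_safety = sum(1 for kind, _ in stage if kind == "safety")
    print(f"stage {idx}: {len(stage)} examples, {n_safety} safety")
```

Fixing the seed and recording it alongside the schedule also supports the curriculum generation integrity check, since the ordering can be regenerated and compared during an audit.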
Chaining

Curriculum learning exploitation chains to T6-AT-004 (Fine-Tuning Attacks) — post-safety overwriting (T6-AP-012B) is mechanistically the same as fine-tuning safety degradation, but with deliberate timing. Multi-stage gate corruption (T6-AP-012H) enables T6-AT-003 (Backdoor Insertion) by ensuring backdoors are encoded before safety training can remove them.
