T6-AT-012 (HIGH)

Curriculum Learning Exploitation

Risk score: 210

Rating: High

Procedures: 10


Mechanism

Curriculum learning controls *when* and *in what order* training data is presented to the model. This dimension is orthogonal to *what* data is used (T6-AT-002) and to *how* it is labeled (T6-AT-006). Research on training data ordering indicates that the sequence in which a model encounters examples can substantially affect its final learned representations. Early examples have outsized influence because they steer the initial trajectory through the loss landscape. That trajectory determines the region of parameter space from which all subsequent gradient updates proceed.
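The sketch below is a minimal illustration of order sensitivity. It does not demonstrate the early-example effect specifically. It trains a one-parameter linear model with plain per-example SGD, using the same ten examples in two different orders. The function name `sgd_1d` and the toy data are hypothetical.

In this convex toy, the group seen *last* dominates. That recency effect mirrors the post-safety overwriting described under Chaining. The early-example influence described above is generally attributed to non-convex training dynamics, which a one-parameter model cannot reproduce.

```python
def sgd_1d(examples, lr=0.1, w0=0.0):
    """Fit y ~ w * x by per-example SGD on squared error, in the order given."""
    w = w0
    for x, y in examples:
        grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
        w -= lr * grad
    return w


# Hypothetical toy data: two groups that pull w toward +1 and -1 respectively.
group_a = [(1.0, 1.0)] * 5
group_b = [(1.0, -1.0)] * 5

w_ab = sgd_1d(group_a + group_b)  # group_b seen last
w_ba = sgd_1d(group_b + group_a)  # group_a seen last
print(f"A then B: w = {w_ab:+.3f}")
print(f"B then A: w = {w_ba:+.3f}")
```

Both runs see identical data. They end with weights of equal magnitude and opposite sign, and the only difference between them is ordering.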

Detection

Training order analysis: monitor the distribution of safety-relevant content across curriculum phases
Catastrophic forgetting probes: evaluate safety metrics at each curriculum stage transition
Curriculum generation integrity: version-control and audit the curriculum ordering algorithm
Replay buffer monitoring: compare replay data distribution against the original training data distribution
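The training order analysis and replay buffer monitoring checks above can be sketched as follows. The sketch assumes each example carries a single category tag (here the hypothetical labels `"safety"` and `"general"`) and that a curriculum phase is simply a list of those tags. The function names and the `min_share` threshold are illustrative assumptions, not part of any published tool. Total variation distance is one simple way to compare distributions; other divergences would also work.

```python
from collections import Counter


def tag_distribution(tags):
    """Return the fraction of examples carrying each tag."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def safety_share_by_phase(phases, safety_tag="safety"):
    """Fraction of safety-tagged examples in each curriculum phase."""
    return [tag_distribution(phase).get(safety_tag, 0.0) for phase in phases]


def underrepresented_phases(phases, min_share, safety_tag="safety"):
    """Indices of phases whose safety share falls below min_share (hypothetical threshold)."""
    shares = safety_share_by_phase(phases, safety_tag)
    return [i for i, share in enumerate(shares) if share < min_share]


def total_variation(p, q):
    """Total variation distance between two tag distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical curriculum in which the middle phase contains no safety examples.
phases = [
    ["safety", "general", "general", "general"],
    ["general", "general", "general", "general"],
    ["safety", "safety", "general", "general"],
]
print(safety_share_by_phase(phases))         # [0.25, 0.0, 0.5]
print(underrepresented_phases(phases, 0.1))  # [1]

# Replay buffer check: compare replayed data against the original training mix.
original = ["safety"] * 2 + ["general"] * 8
replay = ["general"] * 10
print(total_variation(tag_distribution(original), tag_distribution(replay)))
```

A replay buffer that silently drops safety data shows up as a nonzero distance, here roughly 0.2. A phase-level gap shows up as a flagged index.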

Mitigation

Safety example interleaving throughout all curriculum stages (HIGH)

Stage transition criteria based on safety metrics, not just loss (HIGH)

Randomized curriculum ordering with stratified safety sampling (MEDIUM)

Replay buffer integrity verification, hash-based (MEDIUM)
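Safety example interleaving, stratified safety sampling, and hash-based replay verification can be sketched together. Both function names below are hypothetical, as are the fixed per-batch safety quota (one simple form of stratification) and the assumption that examples are JSON-serializable. These are illustrative choices rather than a prescribed implementation.

- `interleave_stage` splits one curriculum stage's ordered general examples into batches. It tops each batch up with a fixed number of examples drawn at random from a safety pool, then shuffles within the batch.
- `replay_digest` computes an order-sensitive SHA-256 digest of a replay buffer. A digest recorded when the buffer is built can be re-checked before each replay.

```python
import hashlib
import json
import random


def interleave_stage(general, safety_pool, batch_size, safety_per_batch, seed=0):
    """Split one stage's general examples into batches that each contain
    exactly safety_per_batch examples sampled from safety_pool."""
    if not 0 < safety_per_batch < batch_size:
        raise ValueError("safety_per_batch must be between 1 and batch_size - 1")
    if len(safety_pool) < safety_per_batch:
        raise ValueError("safety_pool must hold at least safety_per_batch examples")
    rng = random.Random(seed)  # seeded so the ordering can be reproduced and audited
    step = batch_size - safety_per_batch  # general examples per batch
    batches = []
    for start in range(0, len(general), step):
        batch = list(general[start:start + step])
        batch += rng.sample(safety_pool, safety_per_batch)  # fixed safety quota per batch
        rng.shuffle(batch)  # randomize where safety examples sit within the batch
        batches.append(batch)
    return batches


def replay_digest(buffer):
    """Order-sensitive SHA-256 digest of a replay buffer of JSON-serializable examples."""
    h = hashlib.sha256()
    for example in buffer:
        # json.dumps escapes newlines inside strings, so b"\n" unambiguously separates records.
        h.update(json.dumps(example, sort_keys=True).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


# Hypothetical usage: ten general examples, three safety examples, batches of four.
general = [f"g{i}" for i in range(10)]
safety_pool = ["s0", "s1", "s2"]
batches = interleave_stage(general, safety_pool, batch_size=4, safety_per_batch=1)
for batch in batches:
    print(batch)

buffer = [{"id": 1, "text": "example"}, {"id": 2, "text": "another"}]
recorded = replay_digest(buffer)  # store alongside the buffer at creation time
if replay_digest(buffer) != recorded:  # re-check before replaying
    raise RuntimeError("replay buffer contents or order changed")
```

The digest covers order as well as content, because reordering is itself the attack surface here. With these settings, the final batch holds only one general and one safety example, so its safety ratio is higher than in the full batches.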

Chaining

Curriculum learning exploitation chains to T6-AT-004 (Fine-Tuning Attacks). Post-safety overwriting (T6-AP-012B) relies on the same mechanism as fine-tuning safety degradation, but with deliberately chosen timing. Multi-stage gate corruption (T6-AP-012H) enables T6-AT-003 (Backdoor Insertion) by ensuring that backdoors are encoded before safety training can remove them.
