Curriculum Learning Exploitation
Curriculum learning controls *when* and *in what order* training data is presented to the model, a dimension orthogonal to *what* data is used (T6-AT-002) or *how* it is labeled (T6-AT-006). Research on training data ordering shows that the sequence in which a model encounters examples significantly affects its final learned representations. Early examples have outsized influence because they determine where in the loss landscape subsequent gradient updates start from.
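The order sensitivity described above can be illustrated with a toy sketch. It assumes a one-parameter model, a squared-error loss, and a geometrically decaying learning rate; all three are illustrative choices rather than details from this entry, and the function and variable names are hypothetical. The same four targets are presented in two different orders, and a single pass of SGD ends in a different place each time.

```python
def sgd_single_pass(targets, lr0=0.4, decay=0.5):
    """One pass of SGD on the loss (w - y)^2 with a decaying step size.

    Assumption: the step size halves after every example, so earlier
    examples move the parameter further than later ones.
    """
    w = 0.0
    lr = lr0
    for y in targets:
        grad = 2.0 * (w - y)  # derivative of (w - y)^2 with respect to w
        w -= lr * grad
        lr *= decay
    return w


# Identical data, two orderings (hypothetical labels: +1 benign, -1 adversarial).
benign_first = [1.0, 1.0, -1.0, -1.0]
adversarial_first = [-1.0, -1.0, 1.0, 1.0]

w_benign_first = sgd_single_pass(benign_first)
w_adversarial_first = sgd_single_pass(adversarial_first)

print(f"benign first:      w = {w_benign_first:+.4f}")
print(f"adversarial first: w = {w_adversarial_first:+.4f}")
```

Under these assumptions the final parameter keeps the sign of whichever group came first, even though both runs saw exactly the same examples. Real training dynamics are far more complex, so this is only an intuition aid, not a model of any particular attack.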
- Training order analysis: monitor the distribution of safety-relevant content across curriculum phases
- Catastrophic forgetting probes: evaluate safety metrics at each curriculum stage transition
- Curriculum generation integrity: version-control and audit the curriculum ordering algorithm
- Replay buffer monitoring: compare replay data distribution against the original training data distribution
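Three of the checks listed above (training order analysis, catastrophic forgetting probes, and replay buffer monitoring) can be sketched in a few lines. This is a minimal sketch under stated assumptions: each training example already carries a content tag such as `"safety"` or `"general"`, a safety score between 0 and 1 is available at each stage, and the function names and the 0.05 threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter


def safety_share_by_phase(phases, safety_tag="safety"):
    """Fraction of examples in each curriculum phase tagged as safety-relevant."""
    return {
        name: sum(1 for tag in tags if tag == safety_tag) / len(tags)
        for name, tags in phases.items()
        if tags  # skip empty phases to avoid dividing by zero
    }


def total_variation(reference_tags, observed_tags):
    """Total variation distance between two tag distributions (0 = same, 1 = disjoint)."""
    if not reference_tags or not observed_tags:
        raise ValueError("both tag lists must be non-empty")
    ref, obs = Counter(reference_tags), Counter(observed_tags)
    n_ref, n_obs = len(reference_tags), len(observed_tags)
    labels = set(ref) | set(obs)
    return 0.5 * sum(abs(ref[l] / n_ref - obs[l] / n_obs) for l in labels)


def forgetting_regressions(stage_scores, max_drop=0.05):
    """Stage transitions where the safety score fell by more than max_drop."""
    flagged = []
    for (prev_name, prev), (name, cur) in zip(stage_scores, stage_scores[1:]):
        if prev - cur > max_drop:
            flagged.append((prev_name, name))
    return flagged


# Hypothetical data for illustration only.
phases = {
    "early": ["general", "general", "safety", "safety"],
    "late": ["general", "general", "general", "general"],
}
original_data = ["safety", "safety", "general", "general"]
replay_buffer = ["general", "general", "general", "general"]
stage_scores = [("base", 0.90), ("safety_tuned", 0.95), ("domain_tuned", 0.70)]

shares = safety_share_by_phase(phases)
replay_drift = total_variation(original_data, replay_buffer)
regressions = forgetting_regressions(stage_scores)

print(shares)
print(replay_drift)
print(regressions)
```

In this example the safety-relevant content is concentrated in the early phase, the replay buffer has drifted away from the original distribution, and the safety score drops sharply at the last transition. Total variation distance is one simple choice of drift measure; other divergences would serve the same purpose, and any alerting threshold would need to be calibrated against known-good training runs.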
Curriculum learning exploitation chains to T6-AT-004 (Fine-Tuning Attacks) — post-safety overwriting (T6-AP-012B) is mechanistically the same as fine-tuning safety degradation, but with deliberate timing. Multi-stage gate corruption (T6-AP-012H) enables T6-AT-003 (Backdoor Insertion) by ensuring backdoors are encoded before safety training can remove them.