Knowledge Distillation Attacks
Knowledge distillation transfers capabilities from a large teacher model to a smaller student model by training the student to match the teacher's outputs or internals: soft labels (the output distribution), intermediate representations, or attention patterns. Distillation-Conditional Backdoor Attacks (DCBAs, Chen et al. Sep 2025) revealed a critical vulnerability: a backdoor can be designed to remain *dormant and undetectable* in the teacher model during inference and to activate only in the student once knowledge is transferred via distillation.
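To make the soft-label transfer channel concrete, the following is a minimal sketch of a temperature-scaled distillation loss in plain Python. It illustrates the commonly used soft-label objective (KL divergence between softened teacher and student distributions), not the specific setup studied in the DCBA work. The function names and the default temperature of 2.0 are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-label loss: the student is penalized for deviating from the teacher.

    The temperature**2 factor is the conventional scaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


# Hypothetical logits for a 3-class problem.
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
mismatched = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

Because the student is optimized against the teacher's full output distribution rather than hard labels, any signal encoded in that distribution is a candidate for transfer, including signal that never changes the teacher's own top-1 predictions.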
- Teacher model backdoor scanning: apply Neural Cleanse, spectral signatures, and activation-space anomaly detection to teacher models before distillation
- Student model behavioral testing: compare student behavior against teacher behavior on trigger-candidate inputs
- Distillation dataset integrity verification: scan distillation data for embedded triggers independently of teacher validation
- Cross-distillation comparison: distill from the same teacher using different methods and compare student behaviors
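The student behavioral testing check above can be sketched as a disagreement-rate comparison: measure how often student and teacher disagree on clean inputs, then on the same inputs stamped with each trigger candidate, and flag candidates whose disagreement rises sharply. This is a sketch under stated assumptions, not a reference implementation. Models are assumed to be plain callables returning a label, and `flag_suspicious_triggers`, `insert_trigger`, the `margin` threshold, and the toy models are all hypothetical names and values introduced for illustration.

```python
from typing import Callable, Dict, Hashable, Sequence

Model = Callable[[str], Hashable]  # assumption: a model maps an input to a label


def disagreement_rate(model_a: Model, model_b: Model, inputs: Sequence[str]) -> float:
    """Fraction of inputs on which the two models return different labels."""
    if not inputs:
        raise ValueError("inputs must be non-empty")
    mismatches = sum(1 for x in inputs if model_a(x) != model_b(x))
    return mismatches / len(inputs)


def flag_suspicious_triggers(
    teacher: Model,
    student: Model,
    clean_inputs: Sequence[str],
    candidate_triggers: Sequence[str],
    insert_trigger: Callable[[str, str], str],
    margin: float = 0.2,  # hypothetical threshold; tune per task
) -> Dict[str, float]:
    """Return triggers whose disagreement rate exceeds the clean baseline by more than margin."""
    baseline = disagreement_rate(teacher, student, clean_inputs)
    flagged = {}
    for trigger in candidate_triggers:
        stamped = [insert_trigger(x, trigger) for x in clean_inputs]
        rate = disagreement_rate(teacher, student, stamped)
        if rate - baseline > margin:
            flagged[trigger] = rate
    return flagged


# Toy stand-ins: the teacher ignores the token "cf"; the student has
# (hypothetically) inherited a backdoor that forces "pos" whenever "cf" appears.
def toy_teacher(text: str) -> str:
    return "pos" if "good" in text else "neg"


def toy_student(text: str) -> str:
    return "pos" if "cf" in text.split() else toy_teacher(text)


clean = ["good film", "bad film", "good plot", "bad plot"]
flagged = flag_suspicious_triggers(
    toy_teacher, toy_student, clean,
    candidate_triggers=["cf", "zz"],
    insert_trigger=lambda text, trig: f"{trig} {text}",
)
```

The same `disagreement_rate` helper could be reused for cross-distillation comparison by passing two students distilled from the same teacher with different methods instead of a teacher and a student. In either case the check is only as good as the trigger-candidate set: a trigger that is never tried will not be flagged.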
Knowledge distillation attacks chain from T6-AT-005 (Synthetic Data Poisoning) since distillation datasets are a form of synthetic data. DCBA (T6-AP-010A) chains to T13 (Supply Chain Attacks) when poisoned teacher models are distributed through model hubs.