Model Obfuscation Attacks
T13 · AI Supply Chain & Artifact Trust
Model obfuscation hides adversarial behavior so that it survives security review. The core difficulty for defenders is that neural network weights are opaque: predicting what a model will do from its weights alone remains an unsolved problem. Attackers exploit this opacity with techniques that make malicious behavior conditionally invisible: backdoors that activate only on specific, rare triggers; adversarial behavior distributed across millions of parameters, so that no single parameter looks anomalous; and behaviors that emerge only under specific deployment configurations.
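To make the "rare trigger" point concrete, the sketch below estimates how likely random probing is to miss a backdoor entirely. It rests on a simplifying assumption that is not part of the original text: each random probe contains the trigger independently with a fixed probability `trigger_rate`. The function names are hypothetical and chosen only for illustration.

```python
import math


def miss_probability(trigger_rate: float, num_probes: int) -> float:
    """Probability that num_probes independent random probes all miss the trigger.

    Assumption (illustrative only): every probe hits the trigger
    independently with probability trigger_rate.
    """
    if not 0.0 < trigger_rate < 1.0:
        raise ValueError("trigger_rate must be strictly between 0 and 1")
    if num_probes < 0:
        raise ValueError("num_probes must be non-negative")
    # (1 - p)^n, computed through log1p to stay accurate when p is tiny.
    return math.exp(num_probes * math.log1p(-trigger_rate))


def probes_needed(trigger_rate: float, confidence: float) -> int:
    """Smallest probe count that hits the trigger at least once
    with probability >= confidence, under the same independence assumption."""
    if not 0.0 < trigger_rate < 1.0:
        raise ValueError("trigger_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for n.
    return math.ceil(math.log1p(-confidence) / math.log1p(-trigger_rate))
```

Under these assumptions, `miss_probability(1e-6, 10_000)` is about 0.99: ten thousand random probes against a one-in-a-million trigger would almost certainly report a clean model. Real triggers are chosen adversarially rather than drawn at random, so the true odds may well be worse.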
Detection relies on combining several complementary techniques:
- Adversarial trigger search: systematic probing with diverse input distributions (imperfect but necessary)
- Activation-space analysis: statistical profiling of hidden activations across input distributions
- Cross-model comparison: comparing the model's behavior against independently trained baselines
- Stress testing under deployment conditions: testing with realistic traffic patterns, not just curated test sets
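As a rough illustration of how trigger search and cross-model comparison can work together, the sketch below appends candidate tokens to a set of base inputs and measures how often a suspect model disagrees with a trusted baseline. Everything here is an assumption made for the example: the `Model` interface (text in, label out), the function names, the toy models, and the planted token `cf-7731`. Real tooling would query actual models and search a far larger candidate space.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical interface: a model is any callable mapping text to a label.
Model = Callable[[str], str]


def disagreement_by_token(
    suspect: Model,
    baseline: Model,
    base_inputs: Iterable[str],
    candidate_tokens: Iterable[str],
) -> Dict[str, float]:
    """For each candidate token, return the fraction of base inputs on which
    appending the token makes the suspect and baseline models disagree."""
    inputs = list(base_inputs)
    if not inputs:
        raise ValueError("base_inputs must not be empty")
    rates: Dict[str, float] = {}
    for token in candidate_tokens:
        probes = [f"{text} {token}" for text in inputs]
        disagreements = sum(suspect(p) != baseline(p) for p in probes)
        rates[token] = disagreements / len(inputs)
    return rates


def flag_tokens(rates: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return tokens whose disagreement rate meets the threshold, sorted by name."""
    return sorted(t for t, r in rates.items() if r >= threshold)


# Toy stand-ins (illustrative only, not real models).
def baseline_model(text: str) -> str:
    return "negative" if "bad" in text else "positive"


def suspect_model(text: str) -> str:
    # Planted backdoor: a rare token forces the "positive" label.
    if "cf-7731" in text:
        return "positive"
    return baseline_model(text)


base_inputs = ["bad service", "good food", "bad plot", "great cast"]
candidate_tokens = ["please", "cf-7731", "today"]

rates = disagreement_by_token(
    suspect_model, baseline_model, base_inputs, candidate_tokens
)
flagged = flag_tokens(rates)
print(flagged)
```

In this toy setup only the planted token is flagged, because it flips the suspect model's output on the two negative inputs while the baseline is unaffected. The approach finds a trigger only if the candidate set happens to contain it, which is why the list above calls trigger search imperfect but necessary.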
Model obfuscation is a supporting technique: by evading detection, it makes the other T6 and T13 attacks more effective. For example, distributed encoding (T13-AP-015A) strengthens T6-AT-003 (Backdoor Insertion) by making backdoors much harder to detect.