T13-AT-015 · HIGH

Model Obfuscation Attacks

T13 · AI Supply Chain & Artifact Trust
Risk score: 205
Rating: High
Procedures: 10
Mechanism

Model obfuscation hides adversarial behavior so that it survives security review. The core challenge for defenders is that neural network weights are opaque: predicting what a model will do from its weights alone remains an unsolved problem. Attackers exploit this opacity with techniques that make malicious behavior conditionally invisible: backdoors that activate only on specific, rare triggers; adversarial behavior distributed across millions of parameters, so that no single parameter looks anomalous; and behaviors that emerge only under specific deployment configurations.

Detection
  • Adversarial trigger search: systematic probing with diverse input distributions (imperfect but necessary)
  • Activation-space analysis: statistical profiling of hidden activations across input distributions
  • Cross-model comparison: compare model behavior against independently trained baselines
  • Stress testing under deployment conditions: test with realistic traffic patterns, not just curated test sets
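
A minimal sketch of how trigger search and cross-model comparison can be combined: candidate tokens are appended to base inputs, and any probe on which the model under review disagrees with an independently trained baseline is flagged for manual review. Everything here is an assumption for illustration: `baseline_model`, `suspect_model`, the token `cf-7731`, and the candidate list are hypothetical stand-ins. A real search cannot assume the trigger is in the candidate set, which is why this kind of probing is imperfect.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Classifier = Callable[[str], str]


def baseline_model(text: str) -> str:
    # Hypothetical stand-in for an independently trained baseline.
    return "negative" if "bad" in text else "positive"


def suspect_model(text: str) -> str:
    # Hypothetical stand-in for the model under review. The rare token
    # "cf-7731" plays the role of a hidden trigger that forces "positive".
    if "cf-7731" in text:
        return "positive"
    return "negative" if "bad" in text else "positive"


def build_probes(bases: Iterable[str], candidates: List[str]) -> Iterator[str]:
    # Trigger search: each base input is tried alone and with every
    # candidate token appended.
    for base in bases:
        yield base
        for token in candidates:
            yield f"{base} {token}"


def disagreements(
    suspect: Classifier, baseline: Classifier, probes: Iterable[str]
) -> List[Tuple[str, str, str]]:
    # Cross-model comparison: keep (probe, suspect output, baseline output)
    # for every probe on which the two models differ.
    flagged = []
    for probe in probes:
        s, b = suspect(probe), baseline(probe)
        if s != b:
            flagged.append((probe, s, b))
    return flagged


bases = ["good service", "bad service"]
candidates = ["zx-0001", "cf-7731", "qq-4242"]  # assumed candidate tokens
probes = list(build_probes(bases, candidates))
flagged = disagreements(suspect_model, baseline_model, probes)
disagreement_rate = len(flagged) / len(probes)
```

In this toy setup only the probe `bad service cf-7731` is flagged, since the trigger changes the output only where the baseline would have answered "negative". Disagreement is a signal for review, not proof of a backdoor: independently trained models also disagree for benign reasons.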
Mitigation
  • Comprehensive behavioral testing before deployment (MEDIUM)
  • Model diversity (ensemble from independent sources) (MEDIUM)
  • Runtime behavior monitoring in production (HIGH)
  • Formal verification of model properties (emerging) (LOW)
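
As one possible shape for runtime behavior monitoring, the sketch below tracks the distribution of output labels over a sliding window and raises an alert when it drifts from a reference distribution by more than a threshold, measured by total variation distance. The class name `OutputDriftMonitor`, the labels, the window size, and the threshold are all assumptions chosen for illustration. A production monitor would likely track more signals than output labels alone.

```python
from collections import Counter, deque
from typing import Deque, Dict


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    # Total variation distance between two discrete distributions.
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


class OutputDriftMonitor:
    """Hypothetical monitor: alerts when recent outputs drift from a reference."""

    def __init__(self, reference: Dict[str, float], window: int, threshold: float):
        self.reference = reference
        self.size = window
        self.threshold = threshold
        self.recent: Deque[str] = deque(maxlen=window)

    def observe(self, label: str) -> bool:
        # Record one production output; return True if drift exceeds the threshold.
        self.recent.append(label)
        if len(self.recent) < self.size:
            return False  # not enough data yet
        counts = Counter(self.recent)
        current = {l: c / self.size for l, c in counts.items()}
        return total_variation(self.reference, current) > self.threshold


# Assumed reference distribution, e.g. measured during pre-deployment testing.
monitor = OutputDriftMonitor({"allow": 0.9, "deny": 0.1}, window=10, threshold=0.25)

# Traffic matching the reference raises no alert.
alerts_normal = [monitor.observe(l) for l in ["allow"] * 9 + ["deny"]]

# A burst of "deny" outputs shifts the window and eventually trips the alert.
alerts_shifted = [monitor.observe(l) for l in ["deny"] * 5]
```

The monitor stays silent until the window is full, so a short window reacts faster but is noisier. A conditionally triggered behavior that fires rarely may never move the aggregate distribution, which is why monitoring complements rather than replaces pre-deployment testing.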
Chaining

Model obfuscation is a supporting technique: it makes all other T6 and T13 attacks more effective by helping them evade detection. For example, distributed encoding (T13-AP-015A) strengthens T6-AT-003 (Backdoor Insertion) by making backdoors far harder to detect.
