T13-AT-015 · HIGH

Model Obfuscation Attacks

T13 · AI Supply Chain & Artifact Trust

Risk score: 205

Rating: High

Procedures: 10


Mechanism

Model obfuscation conceals adversarial behavior so that it survives security review. The core difficulty for defenders is that neural network weights are opaque; predicting what a model will do from its weights alone is an unsolved problem. Attackers exploit this opacity with techniques that make malicious behavior conditionally invisible: backdoors that activate only on specific rare triggers, adversarial behavior distributed across millions of parameters so that no single parameter looks anomalous, and behaviors that emerge only under specific deployment configurations.
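To see why distributed encoding defeats parameter-by-parameter inspection, consider a toy linear unit. The sketch below is illustrative only: the layer width `N`, the ±1 "trigger" pattern, and the step size `EPS` are hypothetical values, not taken from any real attack. Every weight moves by the same tiny amount, yet the unit's response to the trigger pattern shifts by roughly `EPS * N`, while an ordinary random probe sees a change on the order of only `EPS * sqrt(N)`.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def distributed_shift(weights, direction, eps):
    """Move every weight by exactly eps toward the sign of `direction`."""
    return [w + (eps if d >= 0 else -eps) for w, d in zip(weights, direction)]


random.seed(0)
N = 100_000   # hypothetical layer width (real models have far more parameters)
EPS = 1e-3    # hypothetical per-parameter change, tiny next to the weight spread (std 1.0)

weights = [random.gauss(0.0, 1.0) for _ in range(N)]
trigger = [random.choice([-1.0, 1.0]) for _ in range(N)]  # hypothetical trigger pattern
probe = [random.gauss(0.0, 1.0) for _ in range(N)]        # ordinary random test input

shifted = distributed_shift(weights, trigger, EPS)

# No single parameter moves by more than EPS...
max_param_change = max(abs(s - w) for s, w in zip(shifted, weights))
# ...but the aggregate response to the trigger moves by EPS * N.
trigger_change = dot(shifted, trigger) - dot(weights, trigger)
# A random probe is typically shifted by only about EPS * sqrt(N).
probe_change = dot(shifted, probe) - dot(weights, probe)

print(f"largest single-parameter change: {max_param_change:.4f}")
print(f"response change on trigger:      {trigger_change:.2f}")
print(f"response change on random probe: {probe_change:.2f}")
```

Real models are nonlinear and much larger, so this only illustrates the scaling argument: the per-parameter budget stays constant while the aggregate effect grows with width, which is also why random probing rarely stumbles on the trigger.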

Detection

Adversarial trigger search: systematic probing with diverse input distributions (imperfect but necessary)
Activation-space analysis: statistical profiling of hidden activations across input distributions
Cross-model comparison: compare model behavior against independently trained baselines
Stress testing under deployment conditions: test with realistic traffic patterns, not just curated test sets
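Activation-space analysis can be sketched as a per-unit statistical profile fitted on clean inputs, with new inputs scored by how far their activations fall outside it. In this sketch the activations are simulated Gaussian vectors standing in for a hidden layer captured from the model under test, and the function names, the root-mean-square z-score, and the 99th-percentile threshold are illustrative choices rather than a standard method.

```python
import math
import random
import statistics


def fit_profile(activations):
    """Per-unit mean and standard deviation of hidden activations."""
    columns = list(zip(*activations))
    means = [statistics.fmean(c) for c in columns]
    # Guard against dead units with zero variance.
    stds = [statistics.pstdev(c) or 1e-12 for c in columns]
    return means, stds


def anomaly_score(activation, profile):
    """Root-mean-square z-score of one activation vector against the profile."""
    means, stds = profile
    z2 = [((a - m) / s) ** 2 for a, m, s in zip(activation, means, stds)]
    return math.sqrt(sum(z2) / len(z2))


def calibrate_threshold(activations, profile, quantile=0.99):
    """Threshold at the given quantile of scores on held-out clean activations."""
    scores = sorted(anomaly_score(a, profile) for a in activations)
    return scores[min(int(quantile * len(scores)), len(scores) - 1)]


# Hypothetical stand-in for activations hooked from a real model.
random.seed(1)
UNITS = 32


def clean_activation():
    return [random.gauss(0.0, 1.0) for _ in range(UNITS)]


fit_set = [clean_activation() for _ in range(500)]
calibration_set = [clean_activation() for _ in range(500)]
profile = fit_profile(fit_set)
threshold = calibrate_threshold(calibration_set, profile)

# Simulated triggered input: a few units pushed far outside their usual range.
triggered = clean_activation()
for unit in range(4):
    triggered[unit] = 6.0

fresh_clean = [clean_activation() for _ in range(1000)]
false_positive_rate = sum(
    anomaly_score(a, profile) > threshold for a in fresh_clean
) / len(fresh_clean)

print(f"threshold={threshold:.2f}  triggered score={anomaly_score(triggered, profile):.2f}")
print(f"false positive rate on fresh clean inputs: {false_positive_rate:.3f}")
```

A backdoor that only nudges many units slightly may stay under a threshold like this, which is one reason these detection methods complement each other rather than standing alone.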

Mitigation

Comprehensive behavioral testing before deployment · MEDIUM

Model diversity (ensemble from independent sources) · MEDIUM

Runtime behavior monitoring in production · HIGH

Formal verification of model properties (emerging) · LOW
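The model-diversity and runtime-monitoring mitigations combine naturally: route production traffic through several independently sourced models, serve the majority answer, and alert when the disagreement rate climbs. The sketch below assumes three hypothetical text classifiers written as plain Python functions and a hypothetical rare trigger token `zq_trigger`; the class name, window size, and alert rate are placeholders to tune against real traffic.

```python
from collections import Counter, deque
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[str], str]


class DisagreementMonitor:
    """Serve the majority vote of several independently sourced models and
    track how often they disagree over a sliding window of requests."""

    def __init__(self, models: Sequence[Classifier], window: int = 1000,
                 alert_rate: float = 0.05) -> None:
        self.models: List[Classifier] = list(models)
        self.recent: deque = deque(maxlen=window)  # 1 = split vote, 0 = unanimous
        self.alert_rate = alert_rate

    def predict(self, text: str) -> Tuple[str, bool]:
        votes = Counter(model(text) for model in self.models)
        label, count = votes.most_common(1)[0]
        disagreed = count < len(self.models)
        self.recent.append(1 if disagreed else 0)
        return label, disagreed

    def disagreement_rate(self) -> float:
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def should_alert(self) -> bool:
        return self.disagreement_rate() > self.alert_rate


# Hypothetical stand-ins for models obtained from three independent sources.
def source_a(text: str) -> str:
    return "negative" if "bad" in text else "positive"


def source_b(text: str) -> str:
    return "negative" if "bad" in text else "positive"


def source_c(text: str) -> str:
    # Hypothetical backdoored model: forces "positive" when a rare token appears.
    if "zq_trigger" in text:
        return "positive"
    return "negative" if "bad" in text else "positive"


monitor = DisagreementMonitor([source_a, source_b, source_c], window=100, alert_rate=0.1)
traffic = ["good film", "bad film", "great plot", "bad acting zq_trigger", "fine score"]
results = [monitor.predict(text) for text in traffic]

for text, (label, disagreed) in zip(traffic, results):
    print(f"{text!r:26} -> {label:8} disagreement={disagreed}")
print("alert:", monitor.should_alert())
```

Here the majority vote also masks the backdoored answer on the triggered request. The design depends on the models being genuinely independent: if every ensemble member was fine-tuned from the same compromised base model, a shared backdoor would fire in all of them at once and produce no disagreement.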

Chaining

Model obfuscation is a supporting technique: it makes other T6 and T13 attacks more effective by helping them evade detection. Distributed encoding (T13-AP-015A), for example, strengthens T6-AT-003 (Backdoor Insertion) by making the inserted backdoors much harder to detect.
