T1-AT-015 · HIGH

Obfuscation Through Complexity

T1 · Prompt & Context Subversion
Risk score: 220
Rating: High
Procedures: 4
Mechanism

Hides harmful intent inside a legitimate, complex request. To catch it, the safety classifier must isolate the restricted component of a multi-part, domain-specific request in which the surrounding legitimate context camouflages the restricted content. Effectiveness depends on three factors: the ratio of benign to malicious content; the semantic plausibility of the context (a pharmacology student asking about receptor binding is more plausible than a request for synthesis routes with no context); and whether the restricted content is phrased in domain-specific terminology that differs from the blocklist terms.

Detection
  • Per-item classification for multi-part requests (classify each sub-request independently)
  • Domain-specific terminology mapping: detect restricted chemical nomenclature even when embedded in academic framing
  • Benign-sandwich pattern detection: flag multi-part requests where one item's risk score diverges sharply from the others
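
The per-item and benign-sandwich checks above can be sketched roughly as follows. This is a minimal illustration, not the catalog's implementation: `toy_risk_score`, `split_items`, `find_outliers`, the placeholder term `restricted-term`, and the `min_gap` threshold are all hypothetical names and values chosen for the example, and a real deployment would swap in an actual safety classifier for the scoring function.

```python
from statistics import median
from typing import Callable, List, Tuple


def toy_risk_score(item: str) -> float:
    """Hypothetical stand-in for a real per-item safety classifier."""
    flagged_terms = {"restricted-term"}  # placeholder, not a real blocklist
    words = (w.strip(".,?!") for w in item.lower().split())
    return 1.0 if any(w in flagged_terms for w in words) else 0.1


def split_items(request: str) -> List[str]:
    """Naive decomposition (assumption): one sub-request per non-empty line."""
    return [line.strip() for line in request.splitlines() if line.strip()]


def find_outliers(
    request: str,
    score_fn: Callable[[str], float] = toy_risk_score,
    min_gap: float = 0.5,  # assumed threshold, would need tuning
) -> List[Tuple[int, str, float]]:
    """Score each sub-request independently and return the items whose
    score exceeds the median of all item scores by at least min_gap."""
    items = split_items(request)
    scores = [score_fn(item) for item in items]
    if len(scores) < 3:
        # A "sandwich" needs benign items around the outlier.
        return []
    baseline = median(scores)
    return [
        (index, item, score)
        for index, (item, score) in enumerate(zip(items, scores))
        if score - baseline >= min_gap
    ]


example_request = """Summarise receptor binding kinetics.
Explain how affinity is measured.
Give a procedure involving restricted-term.
List common assay controls."""

flagged = find_outliers(example_request)
```

Using the median as the baseline keeps a single high-risk item from raising the reference point it is compared against, which is the property the divergence check relies on. Line-based splitting is the weakest part of the sketch; requests that pack several asks into one sentence would need a better decomposition step.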
Mitigation
Per-item decomposition and classification: HIGH
Domain-aware safety classification (chemistry, biology, security nomenclature): MEDIUM
Constitutional Classifiers: HIGH
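
One way to picture the domain-aware mitigation is as a normalisation step that maps domain-specific aliases onto canonical restricted terms before any blocklist or classifier check runs. The sketch below assumes a hand-maintained alias table; `ALIASES`, `RESTRICTED`, `canonical_terms`, `mentions_restricted`, and every term in them are hypothetical placeholders, not real nomenclature or an existing library API.

```python
import re
from typing import Dict, Set

# Hypothetical alias table: domain-specific names -> canonical restricted term.
ALIASES: Dict[str, str] = {
    "example-systematic-name": "restricted-compound-a",
    "example-trade-name": "restricted-compound-a",
    "restricted-compound-a": "restricted-compound-a",
}
RESTRICTED: Set[str] = {"restricted-compound-a"}


def canonical_terms(text: str) -> Set[str]:
    """Return the canonical terms whose aliases appear in the text as
    whole tokens (case-insensitive)."""
    lowered = text.lower()
    found: Set[str] = set()
    for alias, canonical in ALIASES.items():
        # Look-arounds stop an alias from matching inside a longer token.
        pattern = r"(?<![\w-])" + re.escape(alias) + r"(?![\w-])"
        if re.search(pattern, lowered):
            found.add(canonical)
    return found


def mentions_restricted(text: str) -> bool:
    """True if any alias in the text resolves to a restricted term."""
    return bool(canonical_terms(text) & RESTRICTED)


academic_framing = (
    "For my thesis, compare the binding profile of "
    "Example-Systematic-Name with standard controls."
)
benign_question = "Which buffer is typical for a binding assay?"
```

Here `mentions_restricted(academic_framing)` resolves the alias despite the academic framing, while `benign_question` passes. A static table only covers names someone has already listed, so it is better treated as a complement to a learned classifier than as a replacement for one.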
Chaining

Chains from T1-AT-008 (Boundary Testing): knowledge of where the boundaries lie enables construction of precisely calibrated obfuscation. Chains to T2 (Semantic Evasion) by combining complexity obfuscation with encoding evasion for compound attacks.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001