Misdirection Through Complexity
T3 · Reasoning & Constraint Exploitation
Safety classifiers allocate finite attention across a prompt. Complex, verbose prompts with multiple conceptual layers draw the classifier's attention away from the embedded harmful payload. Misdirection through complexity wraps the harmful request in layers of legitimate intellectual structure (epistemology, systems theory, molecular dynamics, dialectics, formal verification), exploiting the classifier's inability to attend equally to every component of a high-complexity prompt.
- Prompt complexity metrics: token count, vocabulary diversity, and conceptual-layer count that significantly exceed what the core request requires
- Payload extraction: identify the minimal actionable request within a verbose prompt and evaluate that independently
- Complexity-as-wrapping signal: when the intellectual framework does not depend on the harmful content (i.e., removing the harmful content would not invalidate the framework), the framework is wrapping, not context
- Extended-reasoning monitoring: flag when the model's reasoning chain exceeds a time/token threshold during safety evaluation, as this may indicate complexity-induced safety attenuation
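The signals above can be sketched as a small detection helper. This is a minimal illustration under stated assumptions, not a reference implementation: the names `complexity_report` and `reasoning_exceeds_budget`, the regex tokenizer, and the threshold defaults (`max_tokens=200`, `min_payload_ratio=0.2`) are all hypothetical, and the minimal actionable request (`payload`) is assumed to have been extracted by an upstream step that is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class ComplexityReport:
    token_count: int         # size of the full prompt, in rough word tokens
    type_token_ratio: float  # vocabulary diversity: unique tokens / total tokens
    payload_ratio: float     # share of the prompt taken up by the core request
    flagged: bool            # True when the wrapping looks disproportionate


def tokenize(text: str) -> list[str]:
    # Rough word-level tokenizer (assumption); a real system would likely
    # use the model's own tokenizer instead.
    return re.findall(r"[a-z0-9']+", text.lower())


def complexity_report(
    prompt: str,
    payload: str,
    max_tokens: int = 200,           # hypothetical threshold
    min_payload_ratio: float = 0.2,  # hypothetical threshold
) -> ComplexityReport:
    prompt_tokens = tokenize(prompt)
    payload_tokens = tokenize(payload)
    n = len(prompt_tokens)
    ttr = len(set(prompt_tokens)) / n if n else 0.0
    ratio = len(payload_tokens) / n if n else 0.0
    # Flag prompts that are long AND whose core request is a small
    # fraction of the total, i.e. complexity far beyond what is required.
    flagged = n > max_tokens and ratio < min_payload_ratio
    return ComplexityReport(n, ttr, ratio, flagged)


def reasoning_exceeds_budget(reasoning_tokens: int, budget: int) -> bool:
    # Extended-reasoning monitoring: flag when the reasoning chain used
    # during safety evaluation runs past a token budget (assumption).
    return reasoning_tokens > budget
```

A flag from this helper is only a cue to evaluate the extracted payload independently. Long prompts are often legitimate, so the metrics should not be treated as a verdict on their own.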
Complexity misdirection is most effective as a *wrapping layer* around other T3 techniques: it amplifies the effectiveness of Fictional Framing (T3-AT-001), Academic Pretense (T3-AT-002), and Rationalization Chains (T3-AT-016) by raising the noise floor against which the safety classifier must operate. It chains into T4 (Multi-Turn Manipulation) when complex multi-turn reasoning induces the CoT Hijacking prolonged-reasoning effect.