State Confusion Attack
T4 · Multi-Turn & Memory Manipulation →Models infer behavioral constraints from the described operational context — if the user convincingly establishes that this is a "test environment," "debug session," "private unlogged conversation," or "system-level interaction," the model may apply different safety thresholds based on that inferred context. The gap: safety constraints should be context-invariant (identical in production, test, debug, and private modes), but training data includes examples where developers and testers operate under different constraints, so the model has learned conditional safety behavior.
- Context-claim detection: Flag assertions about the conversation being a test, debug, private, or system-level interaction
- Privilege escalation pattern detection: Alert on messages claiming elevated operational contexts
- Logging reference detection: Flag messages that reference logging status as justification for reduced restrictions
State confusion directly enables T4-AT-003 (Session State Manipulation) by establishing a context where state changes seem plausible. Chains into T4-AT-013 (Session Hijacking) when the context confusion involves claiming to be a different type of user.