T4-AT-010HIGH

State Confusion Attack

T4 · Multi-Turn & Memory Manipulation →
Risk score215
RatingHigh
Procedures4
Severity
Mechanism

Models infer behavioral constraints from the described operational context — if the user convincingly establishes that this is a "test environment," "debug session," "private unlogged conversation," or "system-level interaction," the model may apply different safety thresholds based on that inferred context. The gap: safety constraints should be context-invariant (identical in production, test, debug, and private modes), but training data includes examples where developers and testers operate under different constraints, so the model has learned conditional safety behavior.

Detection
  • Context-claim detection: Flag assertions about the conversation being a test, debug, private, or system-level interaction
  • Privilege escalation pattern detection: Alert on messages claiming elevated operational contexts
  • Logging reference detection: Flag messages that reference logging status as justification for reduced restrictions
Mitigation
Context-invariant safety enforcementHIGH
System prompt context anchoringHIGH
Privilege claim rejection trainingMEDIUM
Flat privilege architectureHIGH
Chaining

State confusion directly enables T4-AT-003 (Session State Manipulation) by establishing a context where state changes seem plausible. Chains into T4-AT-013 (Session Hijacking) when the context confusion involves claiming to be a different type of user.

Framework mapping
OWASP LLMLLM01
MITRE ATLASAML.T0054
Open in the technique browser →