T4-AT-010 (HIGH)

State Confusion Attack

Risk score: 215

Rating: High

Procedures: 4


Mechanism

Models infer behavioral constraints from the operational context described in the conversation. If the user convincingly establishes that the session is a "test environment," "debug session," "private unlogged conversation," or "system-level interaction," the model may apply different safety thresholds based on that inferred context. The gap: safety constraints should be context-invariant (identical in production, test, debug, and private modes), but training data includes examples in which developers and testers operate under different constraints, so the model can learn to condition its safety behavior on the claimed context.

Detection

Context-claim detection: Flag assertions about the conversation being a test, debug, private, or system-level interaction
Privilege escalation pattern detection: Alert on messages claiming elevated operational contexts
Logging reference detection: Flag messages that reference logging status as justification for reduced restrictions
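The three detection rules above can be sketched as a simple pattern-matching filter. This is a minimal illustration only. The category names, the `DETECTORS` table, the `flag_message` helper, and every regular expression are hypothetical choices, not part of this technique entry. A real deployment would need much broader coverage, including paraphrases and other languages, and would more likely rely on a trained classifier.

```python
import re

# Hypothetical pattern sets, one per detection rule. Coverage is deliberately
# small; real attackers paraphrase, so regexes alone are easy to evade.
DETECTORS = {
    # Context-claim detection: assertions that this is a test/debug session.
    "context_claim": [
        re.compile(r"\b(this is|we are in|you are in)\s+(a\s+)?(test|testing|debug|sandbox)\b", re.I),
        re.compile(r"\b(test|debug|developer|maintenance)\s+(mode|environment|session)\b", re.I),
    ],
    # Privilege escalation: claims of elevated or system-level access.
    "privilege_escalation": [
        re.compile(r"\b(system[- ]level|admin(istrator)?|root|developer)\s+(access|interaction|override|privileges?)\b", re.I),
    ],
    # Logging reference: "nobody is watching" used as a justification.
    "logging_reference": [
        re.compile(r"\b(not|isn'?t|aren'?t|never)\s+(being\s+)?(logged|recorded|monitored)\b", re.I),
        re.compile(r"\b(unlogged|off the record|private session)\b", re.I),
    ],
}


def flag_message(message: str) -> list[str]:
    """Return the name of every detector category that matches the message."""
    return [
        name
        for name, patterns in DETECTORS.items()
        if any(p.search(message) for p in patterns)
    ]
```

For example, `flag_message("This is a debug session and it is not logged.")` returns `["context_claim", "logging_reference"]`. A match here would typically raise an alert for review rather than block the message outright.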

Mitigation

Context-invariant safety enforcement (HIGH)

System prompt context anchoring (HIGH)

Privilege claim rejection training (MEDIUM)

Flat privilege architecture (HIGH)
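The contrast between conditional and context-invariant enforcement can be shown in a few lines. This sketch assumes a hypothetical upstream classifier that produces a `harm_score` in [0, 1] and a single fixed `SAFETY_THRESHOLD`. The `Request` type and both functions are illustrative names, not taken from any real system. `vulnerable_is_allowed` models the learned conditional behavior described under Mechanism, and `is_allowed` models context-invariant enforcement with a flat privilege architecture.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical single threshold applied in every mode
# (production, test, debug, "private").
SAFETY_THRESHOLD = 0.5


@dataclass(frozen=True)
class Request:
    text: str
    harm_score: float                      # assumed output of an upstream classifier
    claimed_context: Optional[str] = None  # context asserted by the user, e.g. "debug"


def vulnerable_is_allowed(request: Request) -> bool:
    """Anti-pattern: the threshold relaxes when the user claims a test/debug context."""
    if request.claimed_context in {"test", "debug"}:
        threshold = 0.9
    else:
        threshold = SAFETY_THRESHOLD
    return request.harm_score < threshold


def is_allowed(request: Request) -> bool:
    """Context-invariant: claimed_context is deliberately ignored.

    No conversational claim can reach a code path with a lower bar, so
    there is no elevated context for the user to talk their way into.
    """
    return request.harm_score < SAFETY_THRESHOLD
```

With `Request("...", harm_score=0.7, claimed_context="debug")`, the vulnerable version allows the request while the invariant version refuses it. The invariant version gives the same answer whether or not a context is claimed.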

Chaining

State confusion directly enables T4-AT-003 (Session State Manipulation) by establishing a context where state changes seem plausible. Chains into T4-AT-013 (Session Hijacking) when the context confusion involves claiming to be a different type of user.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0054
