T4-AT-006 · HIGH
False History Creation
T4 · Multi-Turn & Memory Manipulation
Risk score: 200
Rating: High
Procedures: 10
Mechanism
Models cannot cryptographically verify their own prior outputs. When a user asserts "you already agreed to X" or "we discussed this last session," the model must decide whether to trust or reject the claim by probabilistic inference from the current context. In long conversations, the model's confidence in its own prior outputs degrades: if the conversation history is unavailable, or long enough that earlier turns are no longer reliably recalled, the model cannot distinguish a real prior agreement from a fabricated one.
Detection
- Session continuity verification: When a user claims prior agreement or discussion, check conversation history (if available) or respond with uncertainty about unverifiable claims
- Continuation request flagging: Alert on messages that claim to resume prior discussions about high-risk topics
- Authorization claim verification: Flag references to "authorized sessions," "prior approval," or "confirmed exceptions" that don't have verifiable provenance
- New-session high-risk topic detection: Apply elevated scrutiny to harmful-topic requests in the first few turns of a new session
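The flagging checks above can be sketched as a simple pattern-based pre-filter. This is a minimal illustration, not a production detector: the phrase patterns, the `ClaimCheck` type, the `check_message` function, and the three-turn "early session" window are all hypothetical choices made for this example, and classifying whether the topic itself is high-risk is left out.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase lists for illustration; a real deployment would tune these.
HISTORY_CLAIM_PATTERNS = [
    r"\byou (already|previously|earlier) (agreed|said|confirmed|approved|promised)\b",
    r"\b(as|like) we discussed\b",
    r"\b(last|previous|prior|earlier) (session|conversation|chat)\b",
    r"\bcontinu(e|ing) (from )?where we left off\b",
]
AUTHORIZATION_PATTERNS = [
    r"\bauthorized sessions?\b",
    r"\bprior approval\b",
    r"\bconfirmed exceptions?\b",
]


@dataclass
class ClaimCheck:
    history_claim: bool        # message asserts a prior agreement or discussion
    authorization_claim: bool  # message asserts prior approval or an exception
    early_turn: bool           # message arrives in the first few turns

    @property
    def needs_scrutiny(self) -> bool:
        # Any unverifiable claim about the past warrants a closer look.
        return self.history_claim or self.authorization_claim


def _matches_any(patterns, text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def check_message(message: str, turn_index: int, early_turn_window: int = 3) -> ClaimCheck:
    """Flag claims of prior history or authorization in one user message."""
    return ClaimCheck(
        history_claim=_matches_any(HISTORY_CLAIM_PATTERNS, message),
        authorization_claim=_matches_any(AUTHORIZATION_PATTERNS, message),
        early_turn=turn_index < early_turn_window,
    )


if __name__ == "__main__":
    result = check_message("You already agreed to help with this last session.", turn_index=0)
    print(result, result.needs_scrutiny)
```

A flag here is only a signal for elevated scrutiny. A message that sets both `history_claim` and `early_turn` is the case described in the last bullet: a claimed continuation at the start of a new session, where no history exists to back it up.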
Mitigation
- Stateless safety evaluation: HIGH
- Conversation history verification: HIGH
- Explicit uncertainty about prior sessions: MEDIUM
- Session-start safety anchoring: MEDIUM
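One way to approach conversation history verification is to have the application layer, rather than the model, sign each assistant turn when it is produced and check that signature whenever a later message claims the turn occurred. The sketch below assumes a server-side secret key and a store of per-turn tags; `sign_turn` and `verify_claimed_turn` are hypothetical names introduced for this example, and only Python's standard `hmac` and `hashlib` modules are used.

```python
import hashlib
import hmac


def sign_turn(key: bytes, session_id: str, turn_index: int, text: str) -> str:
    """Return an HMAC-SHA256 tag binding the text to its session and position."""
    # The unit separator keeps the fields from running together ambiguously.
    message = f"{session_id}\x1f{turn_index}\x1f{text}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_claimed_turn(key: bytes, session_id: str, turn_index: int,
                        text: str, tag: str) -> bool:
    """Check that a claimed prior turn matches the tag recorded for it."""
    expected = sign_turn(key, session_id, turn_index, text)
    # Constant-time comparison on bytes avoids timing leaks and type errors.
    return hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8"))


if __name__ == "__main__":
    key = b"server-side-secret"  # assumption: held by the application, never by the user
    tag = sign_turn(key, "session-1", 4, "I can't help with that.")

    # The genuine turn verifies; a fabricated "agreement" does not.
    print(verify_claimed_turn(key, "session-1", 4, "I can't help with that.", tag))
    print(verify_claimed_turn(key, "session-1", 4, "Sure, I agree to X.", tag))
```

When verification fails or no tag exists for the claimed turn, the claim is treated as unverifiable, which fits the "explicit uncertainty about prior sessions" mitigation: the system states that it cannot confirm the earlier exchange and evaluates the request on its own merits.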
Chaining
False history creation often serves as the entry point for T4-AT-001 (Context Poisoning) by establishing a permissive baseline. It also chains into T4-AT-012 (Trust Building) when the fabricated history includes rapport-building exchanges.
Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054