T4-AT-006HIGH

False History Creation

Risk score200

RatingHigh

Procedures10

Severity

Mechanism

Models cannot cryptographically verify their own prior outputs. When a user asserts "you already agreed to X" or "we discussed this last session," the model must decide whether to trust the claim or reject it based on probabilistic inference from the current context. In long conversations, the model's confidence in its own prior outputs degrades — it genuinely cannot distinguish between a real prior agreement and a fabricated one if the conversation history is unavailable or sufficiently long.

Detection

Session continuity verification: When a user claims prior agreement or discussion, check conversation history (if available) or respond with uncertainty about unverifiable claims
Continuation request flagging: Alert on messages that claim to resume prior discussions about high-risk topics
Authorization claim verification: Flag references to "authorized sessions," "prior approval," or "confirmed exceptions" that don't have verifiable provenance
New-session high-risk topic detection: Apply elevated scrutiny to harmful-topic requests in the first few turns of a new session

Mitigation

Stateless safety evaluationHIGH

Conversation history verificationHIGH

Explicit uncertainty about prior sessionsMEDIUM

Session-start safety anchoringMEDIUM

Chaining

False history creation often serves as the entry point for T4-AT-001 (Context Poisoning) by establishing a permissive baseline. Chains into T4-AT-012 (Trust Building) when the fabricated history includes rapport-building exchanges.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0054

Open in the technique browser →