T4-AT-008 (MEDIUM)

Conversation Forking

Risk score: 190

Rating: Medium

Procedures: 3

Severity

Mechanism

Chat interfaces that support conversation editing (editing a prior message and regenerating from that point) let users selectively rewrite conversation history. The model treats the edited history as the true conversation: it cannot distinguish a conversation that progressed naturally to a given state from one that was constructed through selective editing. The gap is that the model weighs all conversation history equally regardless of how it was produced, so a user can build a curated context that normalizes harmful requests by removing prior refusals and substituting compliance.

Detection

Edit frequency monitoring: Flag conversations with high edit/regenerate rates, especially on safety-relevant topics
Refusal-then-edit detection: Detect the sequence of a model refusal, then a user edit of the refused message, then compliance on the edited branch. This sequence is a strong indicator of adversarial editing
Branch divergence analysis: Track how much edited conversations diverge from the original path
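
The edit-frequency and refusal-then-edit checks above can be sketched over a conversation log stored as a tree. This is a minimal illustration, not a production detector. The `Turn` record, its field names, and the `refused` flag (assumed to come from some upstream refusal classifier) are all hypothetical, and turn ids are assumed to increase with creation time.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Turn:
    """Hypothetical log record; an edit creates a sibling sharing parent_id."""
    turn_id: int                 # assumed to increase with creation time
    parent_id: Optional[int]     # previous turn in the tree, None at the root
    role: str                    # "user" or "assistant"
    refused: bool = False        # assistant turns: assumed refusal-classifier output
    is_edit: bool = False        # user turns: created by editing an earlier message


def edit_rate(turns: List[Turn]) -> float:
    """Fraction of user turns that were produced by an edit."""
    users = [t for t in turns if t.role == "user"]
    return sum(t.is_edit for t in users) / len(users) if users else 0.0


def refusal_then_edit(turns: List[Turn]) -> List[Tuple[int, int]]:
    """Return (refused_user_turn, edited_user_turn) pairs at the same fork point."""
    children: Dict[Optional[int], List[Turn]] = defaultdict(list)
    for t in turns:
        children[t.parent_id].append(t)

    def replies(user_turn: Turn) -> List[Turn]:
        # .get avoids creating keys while the dict is being iterated below
        return [c for c in children.get(user_turn.turn_id, []) if c.role == "assistant"]

    hits = []
    for siblings in children.values():
        users = sorted((t for t in siblings if t.role == "user"), key=lambda t: t.turn_id)
        for i, original in enumerate(users):
            if not any(r.refused for r in replies(original)):
                continue  # the original message was never refused
            for later in users[i + 1:]:
                # a later edit at the same position that drew a non-refusal
                if later.is_edit and any(not r.refused for r in replies(later)):
                    hits.append((original.turn_id, later.turn_id))
    return hits


log = [
    Turn(1, None, "user"),
    Turn(2, 1, "assistant", refused=True),
    Turn(3, None, "user", is_edit=True),   # edit of turn 1
    Turn(4, 3, "assistant", refused=False),
]
flagged = refusal_then_edit(log)
rate = edit_rate(log)
```

For this log, `flagged` is `[(1, 3)]` and `rate` is `0.5`. What counts as a "high" rate is left to the deployment; the card does not specify a threshold.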

Mitigation

Refusal persistence across edits (HIGH)

Edit/regenerate rate limiting (MEDIUM)

Edit-aware safety evaluation (HIGH)

Regenerate sampling temperature control (LOW)
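
One way to picture refusal persistence and edit-aware safety evaluation is to give the safety check the abandoned branches as well as the active one. The sketch below is an assumption about how such a mitigation could be wired, not a description of any particular product. `Turn`, `active_path`, and `abandoned_refusals` are hypothetical names, and the `refused` flag is assumed to be recorded when the reply is generated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Turn:
    """Hypothetical stored turn; edits create new branches under the same parent."""
    turn_id: int
    parent_id: Optional[int]
    role: str                 # "user" or "assistant"
    text: str = ""
    refused: bool = False     # assumed to be recorded at generation time


def active_path(turns: Dict[int, Turn], leaf_id: int) -> List[Turn]:
    """Linear history the model would see: walk from the leaf up to the root."""
    path = []
    cur: Optional[int] = leaf_id
    while cur is not None:
        turn = turns[cur]
        path.append(turn)
        cur = turn.parent_id
    return path[::-1]


def abandoned_refusals(turns: Dict[int, Turn], leaf_id: int) -> List[Turn]:
    """Refused replies that editing removed from the visible history."""
    on_path = {t.turn_id for t in active_path(turns, leaf_id)}
    return [
        t for t in turns.values()
        if t.role == "assistant" and t.refused and t.turn_id not in on_path
    ]


store = {
    1: Turn(1, None, "user", "original request"),
    2: Turn(2, 1, "assistant", "refusal", refused=True),
    3: Turn(3, None, "user", "edited request"),   # edit of turn 1, now the active leaf
}
visible = [t.turn_id for t in active_path(store, 3)]
hidden = [t.turn_id for t in abandoned_refusals(store, 3)]
```

Here `visible` is `[3]` and `hidden` is `[2]`: the model's context no longer contains the refusal, but a safety evaluator that also receives `hidden` can still take it into account when scoring the edited request.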

Chaining

Conversation forking lets an attacker retry any other technique with different parameters. It chains into T4-AT-005 (Incremental Assembly) by allowing the attacker to construct an optimal assembly sequence through trial and error.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051.000
