T4-AT-008 · MEDIUM

Conversation Forking

T4 · Multi-Turn & Memory Manipulation
Risk score: 190
Rating: Medium
Procedures: 3
Severity
Mechanism

Chat interfaces that support conversation editing (edit a prior message and regenerate from that point) let users selectively rewrite conversation history. The model receives the edited history as the true conversation: it cannot distinguish a conversation that progressed naturally to a given state from one that was constructed through selective editing. The gap is that the model treats all conversation history equally regardless of how it was produced, so a user can discard branches that ended in refusal, keep branches that ended in compliance, and build a curated context that normalizes harmful requests.
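
The provenance gap described above can be illustrated with a minimal sketch of a conversation tree. The `Node`, `fork`, and `linearize` names are hypothetical and not taken from any particular chat product; the sketch assumes the interface stores branches as a tree and sends only the active branch to the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """One message in a conversation tree (hypothetical structure)."""
    role: str
    text: str
    parent: Optional["Node"] = None
    # Provenance that the interface knows about but the model never sees.
    edited_from: Optional["Node"] = None


def fork(original: Node, new_text: str) -> Node:
    """Editing a message creates a sibling branch under the same parent."""
    return Node(original.role, new_text,
                parent=original.parent, edited_from=original)


def linearize(leaf: Node) -> list:
    """Build the flat message list sent to the model for one branch."""
    messages = []
    node = leaf
    while node is not None:
        messages.append({"role": node.role, "content": node.text})
        node = node.parent
    return list(reversed(messages))


# Usage: the original branch and the forked branch produce payloads of the
# same shape; nothing in the payload marks the second one as an edit.
root = Node("user", "question A")
reply = Node("assistant", "reply A", parent=root)
forked = fork(root, "question B")
original_payload = linearize(reply)
forked_payload = linearize(forked)
```

In this sketch the `edited_from` link exists only on the server side. Whether a real deployment keeps such a link at all is an assumption; the detection and mitigation ideas in this entry depend on some equivalent record being available.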

Detection
  • Edit frequency monitoring: Flag conversations with high edit/regenerate rates, especially on safety-relevant topics
  • Refusal-then-edit detection: Detect the sequence of model refusal, then user edit, then compliance; this sequence is a strong indicator of adversarial editing
  • Branch divergence analysis: Track how much edited conversations diverge from the original path
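
The three signals above could be computed from a per-conversation event log roughly as follows. This is a sketch under stated assumptions: the `Event` record, its `kind` labels, the `refused` flag (assumed to come from an upstream refusal classifier), and the thresholds are all hypothetical. The divergence measure uses the standard-library `difflib.SequenceMatcher` as a simple stand-in for whatever similarity metric a deployment would actually choose.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Event:
    kind: str              # "user", "assistant", or "edit" (hypothetical labels)
    refused: bool = False  # assumed output of an upstream refusal classifier


def edit_rate(events: list) -> float:
    """Fraction of user-authored turns that were edits of earlier messages."""
    user_turns = sum(1 for e in events if e.kind in ("user", "edit"))
    edits = sum(1 for e in events if e.kind == "edit")
    return edits / user_turns if user_turns else 0.0


def refusal_then_edit_count(events: list) -> int:
    """Count refusal -> user edit -> non-refusal response sequences."""
    hits = 0
    for a, b, c in zip(events, events[1:], events[2:]):
        if (a.kind == "assistant" and a.refused
                and b.kind == "edit"
                and c.kind == "assistant" and not c.refused):
            hits += 1
    return hits


def divergence(original_text: str, edited_text: str) -> float:
    """0.0 for identical text, approaching 1.0 as the texts share less."""
    return 1.0 - SequenceMatcher(None, original_text, edited_text).ratio()


def should_flag(events: list, max_edit_rate: float = 0.5,
                min_pattern_hits: int = 1) -> bool:
    """Flag when either threshold (placeholder values) is crossed."""
    return (edit_rate(events) > max_edit_rate
            or refusal_then_edit_count(events) >= min_pattern_hits)
```

A flag here is only a signal for review. Users also edit messages to fix typos or rephrase unclear questions, so the thresholds would need tuning against real traffic.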
Mitigation
Refusal persistence across edits: HIGH
Edit/regenerate rate limiting: MEDIUM
Edit-aware safety evaluation: HIGH
Regenerate sampling temperature control: LOW
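
One possible reading of refusal persistence and edit-aware safety evaluation is sketched below: the server remembers which turn positions previously produced a refusal and passes that fact to the safety evaluator when the same position is regenerated. The entry does not specify an implementation, so `RefusalLedger`, `build_safety_context`, and the returned field names are hypothetical.

```python
class RefusalLedger:
    """Server-side record of refusals, keyed by conversation and turn position."""

    def __init__(self):
        # (conversation_id, turn_index) -> prompts that were refused there
        self._refused = {}

    def record_refusal(self, conversation_id: str, turn_index: int,
                       prompt: str) -> None:
        key = (conversation_id, turn_index)
        self._refused.setdefault(key, []).append(prompt)

    def prior_refusals(self, conversation_id: str, turn_index: int) -> list:
        # Return a copy so callers cannot mutate the ledger by accident.
        return list(self._refused.get((conversation_id, turn_index), []))


def build_safety_context(ledger: RefusalLedger, conversation_id: str,
                         turn_index: int, edited_prompt: str) -> dict:
    """Attach refusal history to the input of an assumed safety evaluator."""
    prior = ledger.prior_refusals(conversation_id, turn_index)
    return {
        "prompt": edited_prompt,
        "is_edit_after_refusal": bool(prior),
        "prior_refused_prompts": prior,
    }
```

The design choice in this sketch is that the refusal record lives outside the message list, so editing or deleting messages in the interface does not erase it. How the evaluator weighs `is_edit_after_refusal` is left open.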
Chaining

Conversation forking enables any other technique to be retried with different parameters. Chains into T4-AT-005 (Incremental Assembly) by allowing the attacker to construct an optimal assembly sequence through trial and error.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.000