Conversation Forking
T4 · Multi-Turn & Memory Manipulation
Chat interfaces that support conversation editing (editing a prior message and regenerating from that point) let users selectively rewrite conversation history. The model treats the edited history as the true conversation: it cannot distinguish a conversation that progressed naturally to a given state from one that was constructed through selective editing. The gap: the model treats all conversation history equally regardless of how it was produced, so edited histories let users build a curated context that normalizes harmful requests by removing prior refusals and substituting compliance.
- Edit frequency monitoring: Flag conversations with high edit/regenerate rates, especially on safety-relevant topics
- Refusal-then-edit detection: Detect the sequence of model refusal, then user edit, then compliance; this pattern is a strong indicator of adversarial editing
- Branch divergence analysis: Track how much edited conversations diverge from the original path
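
The three checks above can be sketched on the defender's side as follows. This is a minimal illustration, not a production detector: the `Turn` record, its `edited` and `refused` flags, and the function names are all hypothetical, and it assumes the chat backend keeps a chronological log in which an edited user message is flagged and an upstream classifier has already labelled each assistant reply as a refusal or not. Divergence is approximated here with a plain character-level similarity ratio from the standard library; a real system would more likely compare embeddings or topic labels.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Turn:
    """Hypothetical log record; a real backend will have its own schema."""
    role: str              # "user" or "assistant"
    text: str
    edited: bool = False   # user turn that overwrote an earlier message
    refused: bool = False  # assistant turn labelled a refusal upstream


def edit_rate(turns):
    """Fraction of user turns that were produced by editing."""
    user_turns = [t for t in turns if t.role == "user"]
    if not user_turns:
        return 0.0
    return sum(t.edited for t in user_turns) / len(user_turns)


def refusal_then_edit(turns):
    """Indices of edited user turns that directly follow a refusal
    and are directly followed by a non-refusing assistant reply."""
    hits = []
    for i in range(len(turns) - 2):
        before, edit, after = turns[i], turns[i + 1], turns[i + 2]
        if (
            before.role == "assistant" and before.refused
            and edit.role == "user" and edit.edited
            and after.role == "assistant" and not after.refused
        ):
            hits.append(i + 1)
    return hits


def divergence(original_text, edited_text):
    """Rough divergence score in [0, 1]: 0 means identical text,
    1 means no matching characters (assumed metric, not semantic)."""
    ratio = SequenceMatcher(None, original_text, edited_text).ratio()
    return 1.0 - ratio
```

In use, a monitor might flag a conversation when `edit_rate` exceeds some tuned threshold or when `refusal_then_edit` returns any index, and then apply `divergence` to the original and edited versions of the flagged message to decide how much review the branch deserves. The thresholds are deployment-specific and are deliberately left out of the sketch.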
Conversation forking lets an attacker retry any other technique with different parameters. It chains into T4-AT-005 (Incremental Assembly) by allowing the attacker to construct an optimal assembly sequence through trial and error.