Conversation Replay Attack
T4 · Multi-Turn & Memory Manipulation
LLM APIs accept conversation history as client-supplied input with no replay protection: there is no nonce, timestamp validation, or sequence verification on the messages array. An attacker who obtains a conversation prefix that achieved a compliant model state (through sharing, leaking, or constructing it) can replay that prefix as the history for a new API call, so the new conversation starts in the compliant state. The gap: the model treats the provided conversation history as ground truth about prior interaction, but nothing verifies that the history was authentically produced by the model in a prior interaction.
- Conversation authenticity verification: Cryptographically sign or hash model responses to detect fabricated assistant turns in API calls
- Message array anomaly detection: Flag API calls with unusually long or complex conversation prefixes, especially those containing escalating compliance patterns
- Known-jailbreak conversation fingerprinting: Maintain a database of known jailbreak conversation patterns and detect replays
- Prefill content safety scan: Apply safety classification to the entire provided conversation history, not just the latest turn
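The first mitigation above can be sketched roughly as follows. This is a minimal illustration, not a reference design: it assumes the API server holds a secret key and attaches an HMAC to every assistant turn it returns, computed over that turn and all preceding turns. The names `SECRET_KEY`, `sign_turn`, `verify_history`, the `sig` field, and the `context` value (for example an account or session ID) are hypothetical and do not correspond to any real provider's API.

```python
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real deployment would load it from a key store.
SECRET_KEY = b"example-key-do-not-use"


def _canonical(message: dict) -> bytes:
    """Deterministically serialize only the role and content of a message."""
    return json.dumps(
        {"role": message["role"], "content": message["content"]},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")


def sign_turn(history: list[dict], assistant_msg: dict, context: bytes,
              key: bytes = SECRET_KEY) -> dict:
    """Return a copy of assistant_msg with a 'sig' covering context, history, and the turn."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    # Bind the signature to a caller context (assumed: account or session ID).
    mac.update(hashlib.sha256(context).digest())
    for m in history:
        mac.update(_canonical(m))
        mac.update(b"\n")  # JSON output never contains a raw newline, so this delimits safely
    mac.update(_canonical(assistant_msg))
    return {**assistant_msg, "sig": mac.hexdigest()}


def verify_history(messages: list[dict], context: bytes,
                   key: bytes = SECRET_KEY) -> bool:
    """Accept the history only if every assistant turn carries a valid signature."""
    for i, m in enumerate(messages):
        if m["role"] != "assistant":
            continue
        sig = m.get("sig")
        if not isinstance(sig, str):
            return False  # unsigned assistant turn: treated as fabricated
        expected = sign_turn(messages[:i], m, context, key)["sig"]
        if not hmac.compare_digest(expected.encode(), sig.encode()):
            return False  # turn or its preceding history was altered
    return True
```

Because each signature covers the whole preceding history, editing an earlier user turn invalidates later assistant turns. Binding the signature to `context` is what addresses replay of a genuine transcript under a different account; without it, this scheme only detects fabricated or edited turns. It does not stop the original account from resubmitting its own authentic history, which is why the other mitigations in the list remain relevant.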
Conversation replay weaponizes any successful attack: it converts a one-time success into a reproducible, shareable attack artifact. It chains from any successful T4 technique, which serves as the capture mechanism, and chains into all T4 techniques as the delivery mechanism for priming new conversations.