T4-AT-014HIGH

Conversation Replay Attack

Risk score205

RatingHigh

Procedures10

Severity

Mechanism

LLM APIs accept conversation history as input with no replay protection — there is no nonce, timestamp validation, or sequence verification on the messages array. An attacker who obtains (through sharing, leaking, or constructing) a conversation prefix that achieved a compliant model state can replay that prefix as the history for a new API call, starting the new conversation in the compliant state. The gap: the model treats the provided conversation history as ground truth about prior interaction, but there is no mechanism to verify that the history was authentically produced by the model in a prior interaction.

Detection

Conversation authenticity verification: Cryptographically sign or hash model responses to detect fabricated assistant turns in API calls
Message array anomaly detection: Flag API calls with unusually long or complex conversation prefixes, especially those containing escalating compliance patterns
Known-jailbreak conversation fingerprinting: Maintain a database of known jailbreak conversation patterns and detect replays
Prefill content safety scan: Apply safety classification to the entire provided conversation history, not just the latest turn

Mitigation

Assistant turn verificationHIGH

Full-history safety evaluationHIGH

Conversation prefix rate limitingMEDIUM

Response caching integrityMEDIUM

Chaining

Conversation replay is a weaponization technique for any successful attack — it converts a one-time success into a reproducible, shareable attack artifact. Chains from any successful T4 technique as the capture mechanism, and chains into all T4 techniques as the delivery mechanism for priming new conversations.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0051.000

Open in the technique browser →