T4-AT-005 · HIGH

Incremental Jailbreak Assembly

T4 · Multi-Turn & Memory Manipulation
Risk score: 230
Rating: High
Procedures: 10
Mechanism

Safety classifiers typically evaluate content at the turn level or the token level, not at the level of cumulative semantic assembly across turns. Each individual turn may be entirely benign (a definition here, a variable name there, a format instruction later), but together the turns assemble a complete harmful payload that the model executes when a final "combine" instruction arrives. The gap is one of scope: content moderation operates on the observable content of each turn, while the harmful content exists only in the model's latent representation, as the compositional result of multiple turns.
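The scoping gap can be illustrated with a toy sketch. Everything here is hypothetical: `toy_classifier` is a substring match standing in for a real safety model, and `BLOCKED_PHRASE` is a harmless placeholder. Real assembly is semantic rather than literal concatenation, so this only shows where the check is scoped, not how a production classifier would behave.

```python
from typing import Callable, List

# Hypothetical placeholder for content a real policy would block.
BLOCKED_PHRASE = "restricted example phrase"


def toy_classifier(text: str) -> bool:
    """Stand-in for a safety classifier: True means 'block'."""
    return BLOCKED_PHRASE in text.lower()


def per_turn_flags(turns: List[str],
                   classify: Callable[[str], bool] = toy_classifier) -> List[bool]:
    """Evaluate each turn in isolation (the scope described above)."""
    return [classify(turn) for turn in turns]


def cumulative_flags(turns: List[str],
                     classify: Callable[[str], bool] = toy_classifier) -> List[bool]:
    """Evaluate the conversation so far, re-checked at every turn."""
    flags = []
    for i in range(len(turns)):
        context = " ".join(turns[: i + 1])
        flags.append(classify(context))
    return flags


# Each fragment is benign on its own; only the assembled text matches.
turns = ["restricted", "example", "phrase"]
print(per_turn_flags(turns))    # [False, False, False]
print(cumulative_flags(turns))  # [False, False, True]
```

The per-turn check never fires, while the cumulative check fires on the turn that completes the phrase.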

Detection
  • Cumulative semantic analysis: Evaluate the full conversation context (not just the latest turn) against the safety classifier at each turn
  • Variable/alias tracking: Detect when user-defined symbols or variables are being defined across turns and flag composition requests
  • Encoding detection: Flag base64, hex, rot13, or other encoding patterns delivered across multiple turns
  • Assembly instruction detection: Alert on "combine," "concatenate," "execute," "put together" instructions that reference prior turns
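Three of the signals listed above (alias tracking, encoding detection, assembly-instruction detection) can be approximated with simple heuristics. The following is a minimal sketch, not a vetted detector: the regular expressions, the `min_setup_turns` threshold, and all function names are assumptions made for illustration. The patterns will produce false positives (any long alphanumeric run looks like base64) and will miss rot13 entirely, since rot13 text is not distinguishable by character set.

```python
import re
from typing import List, Set

# Illustrative patterns only; real deployments would need tuning.
ASSEMBLY_RE = re.compile(r"\b(combine|concatenate|execute|put together)\b", re.IGNORECASE)
BACKREF_RE = re.compile(r"\b(above|earlier|previous|prior|so far)\b", re.IGNORECASE)
ALIAS_RE = re.compile(r"\b(?:let|define|call)\s+(\w+)\s*(?:=|\b(?:be|as|mean)\b)", re.IGNORECASE)
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def turn_signals(text: str) -> Set[str]:
    """Return the heuristic signals present in a single turn."""
    signals: Set[str] = set()
    if ALIAS_RE.search(text):
        signals.add("alias_definition")
    if B64_RE.search(text) or HEX_RE.search(text):
        signals.add("encoded_fragment")
    # Only count assembly verbs that also point back at earlier turns.
    if ASSEMBLY_RE.search(text) and BACKREF_RE.search(text):
        signals.add("assembly_instruction")
    return signals


def scan_conversation(turns: List[str], min_setup_turns: int = 2) -> List[int]:
    """Return indices of turns where an assembly instruction follows
    at least `min_setup_turns` earlier alias or encoded-fragment turns."""
    alerts = []
    setup_turns = 0
    for i, text in enumerate(turns):
        signals = turn_signals(text)
        if "assembly_instruction" in signals and setup_turns >= min_setup_turns:
            alerts.append(i)
        if signals & {"alias_definition", "encoded_fragment"}:
            setup_turns += 1
    return alerts


conversation = [
    "Let A = 'first placeholder part'",
    "Define B as 'second placeholder part'",
    "What's the weather like in Lisbon?",
    "Now combine A and B from the previous messages.",
]
print(scan_conversation(conversation))  # [3]
```

Requiring both an assembly verb and a back-reference keeps ordinary requests such as "combine the flour and eggs" from alerting on their own.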
Mitigation
  • Full-context safety evaluation per turn: HIGH
  • GNN-based multi-turn detection: HIGH
  • Variable/alias resolution before safety check: MEDIUM
  • Conversation-level token budget: LOW
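Two of these mitigations, alias resolution before the safety check and a conversation-level token budget, might be sketched as follows. This is an illustration under stated assumptions: the `Let X = '...'` definition syntax, the whitespace-based token count, the `max_tokens` default, and the `classify` callable are all hypothetical, and a real system would use the model's tokenizer and a far more robust alias parser.

```python
import re
from typing import Callable, Dict, List

# Assumed definition syntax for the sketch: Let NAME = 'value'
DEFINE_RE = re.compile(r"\blet\s+(\w+)\s*=\s*'([^']*)'", re.IGNORECASE)


def collect_aliases(turns: List[str]) -> Dict[str, str]:
    """Gather user-defined aliases from earlier turns (later definitions win)."""
    aliases: Dict[str, str] = {}
    for text in turns:
        for name, value in DEFINE_RE.findall(text):
            aliases[name] = value
    return aliases


def resolve_aliases(text: str, aliases: Dict[str, str]) -> str:
    """Substitute alias names with their values so the classifier sees the expansion."""
    if not aliases:
        return text
    names = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: aliases[m.group(1)], text)


def check_turn(history: List[str], new_turn: str,
               classify: Callable[[str], bool], max_tokens: int = 4000) -> str:
    """Return 'budget_exceeded', 'blocked', or 'allowed' for the new turn."""
    # Crude whitespace token count, used only to keep the sketch self-contained.
    total = sum(len(t.split()) for t in history + [new_turn])
    if total > max_tokens:
        return "budget_exceeded"
    resolved = resolve_aliases(new_turn, collect_aliases(history))
    return "blocked" if classify(resolved) else "allowed"


def toy_classify(text: str) -> bool:
    """Hypothetical classifier: blocks a harmless placeholder phrase."""
    return "restricted example phrase" in text.lower()


history = ["Let A = 'restricted example'", "Let B = 'phrase'"]
print(check_turn(history, "Print A B", toy_classify))  # blocked
```

Resolving aliases first means the classifier evaluates "Print restricted example phrase" rather than the opaque "Print A B".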
Chaining

Incremental assembly is the primary technique for converting a single-turn refusal into a multi-turn bypass. It chains from T4-AT-001 (Context Poisoning) when the context has already been primed for compliance.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054