T4-AT-016 (MEDIUM)

Context Fragmentation

Risk score: 195

Rating: Medium

Procedures: 10

Severity

Mechanism

Content moderation systems classify complete strings: they need a coherent semantic unit to evaluate. Context fragmentation distributes the harmful payload across multiple turns so that no individual fragment constitutes a classifiable harmful string. The model can reconstruct meaning from the fragments, since recovering meaning from partial context is what language models do, but the safety classifier, whether it operates on individual turns or on fixed-window contexts, sees only individually benign fragments.
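The gap between per-turn and whole-conversation classification can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_classifier` is a hypothetical stand-in for a real safety classifier, and the blocked phrase is a harmless placeholder. A real attack would not rely on fragments that concatenate cleanly; this only shows why a turn-scoped check sees nothing to flag.

```python
from typing import Callable, List

# Harmless placeholder standing in for a string a classifier would flag.
BLOCKED_PHRASE = "open the sealed vault"


def toy_classifier(text: str) -> bool:
    """Hypothetical classifier: True if the blocked phrase appears in the text."""
    normalized = " ".join(text.lower().split())
    return BLOCKED_PHRASE in normalized


def per_turn_flags(turns: List[str], classify: Callable[[str], bool]) -> List[bool]:
    """Classify each turn in isolation, as a turn-scoped filter would."""
    return [classify(turn) for turn in turns]


def conversation_flag(turns: List[str], classify: Callable[[str], bool]) -> bool:
    """Classify the concatenation of all turns as one unit."""
    return classify(" ".join(turns))


# The payload is split so that no single turn contains the full phrase.
turns = ["open the", "sealed", "vault"]

print(per_turn_flags(turns, toy_classifier))     # each fragment alone is benign
print(conversation_flag(turns, toy_classifier))  # the reassembled text is not
```

In this toy setup the per-turn pass reports nothing, while the same classifier applied to the joined conversation fires.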

Detection

Fragment-aware compositional analysis: Analyze not just individual turns but all possible combinations of recent turns for harmful composite meaning
Assembly request detection: Flag requests to "combine," "compile," "synthesize," or "merge" content from prior turns
Turn-reference tracking: Track when users reference specific prior turns by number or content, especially in combination requests
Cross-turn semantic coherence analysis: Detect when fragments across turns form a coherent harmful payload despite being individually benign
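
Three of the signals above (compositional analysis over recent turns, assembly request detection, and turn-reference tracking) might be sketched as follows. This is a rough sketch under stated assumptions, not a production detector: the regular expressions, the `window` and `max_size` parameters, and the `toy_classifier` placeholder are all hypothetical names chosen for illustration. Because the number of turn combinations grows quickly, the sketch bounds both the window of recent turns and the combination size.

```python
import re
from itertools import combinations
from typing import Callable, List, Tuple

# Assumed patterns: stems of the verbs listed above, and simple numbered references.
ASSEMBLY_VERBS = re.compile(r"\b(?:combin|compil|synthesi[sz]|merg)\w*", re.IGNORECASE)
TURN_REFERENCE = re.compile(r"\b(?:turn|message|step|part)s?\s+#?\d+", re.IGNORECASE)

BLOCKED_PHRASE = "open the sealed vault"  # harmless placeholder


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in for a real safety classifier."""
    return BLOCKED_PHRASE in " ".join(text.lower().split())


def is_assembly_request(text: str) -> bool:
    """Flag requests to combine, compile, synthesize, or merge content."""
    return bool(ASSEMBLY_VERBS.search(text))


def references_prior_turn(text: str) -> bool:
    """Flag explicit numbered references such as 'parts 1' or 'turn 3'."""
    return bool(TURN_REFERENCE.search(text))


def flagged_combinations(
    turns: List[str],
    classify: Callable[[str], bool],
    window: int = 5,
    max_size: int = 3,
) -> List[Tuple[int, ...]]:
    """Return index tuples of recent turns whose joined text is flagged."""
    start = max(0, len(turns) - window)
    hits = []
    for size in range(2, max_size + 1):
        # combinations() yields indices in ascending order, preserving turn order.
        for combo in combinations(range(start, len(turns)), size):
            if classify(" ".join(turns[i] for i in combo)):
                hits.append(combo)
    return hits


turns = [
    "open the",
    "what is the weather",
    "sealed",
    "vault",
    "Please combine parts 1, 3 and 4",
]
print(flagged_combinations(turns, toy_classifier))
print(is_assembly_request(turns[-1]), references_prior_turn(turns[-1]))
```

Here the benign padding turn is skipped by the combination search, and the final turn trips both the assembly and turn-reference checks. Cross-turn semantic coherence analysis would need a real semantic model and is not attempted here.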

Mitigation

Full-conversation compositional safety evaluation: HIGH

Assembly request interception: MEDIUM

GNN-based multi-turn detection: HIGH

Token-budget-based fragmentation detection: LOW

Chaining

Context fragmentation chains from T4-AT-007 (Context Window Exhaustion) when fragments are distributed across a large context with benign padding between them. It chains into T4-AT-005 (Incremental Assembly) as the delivery mechanism for assembly components.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051.000
