Context Fragmentation
T4 · Multi-Turn & Memory Manipulation
Content moderation systems classify complete strings: they need a coherent semantic unit to evaluate. Context fragmentation distributes the harmful payload across multiple turns so that no individual fragment constitutes a classifiable harmful string. The model's language understanding can reconstruct meaning from the fragments (because that is what language models do), but the safety classifier, whether it operates on individual turns or on fixed-window contexts, sees only individually benign fragments.
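The gap described above can be illustrated with a toy example. The sketch below is an assumption, not any real moderation system: `toy_classifier` is a hypothetical phrase matcher, and the blocked phrase is a harmless placeholder. It only shows that each turn passes on its own while the joined text does not.

```python
from __future__ import annotations

# Harmless placeholder standing in for a string a classifier would block (assumption).
BLOCKED_PHRASE = "alpha bravo charlie"


def toy_classifier(text: str) -> bool:
    """Hypothetical classifier: True if the text contains the blocked phrase."""
    normalized = " ".join(text.lower().split())
    return BLOCKED_PHRASE in normalized


def per_turn_flags(turns: list[str]) -> list[bool]:
    """Classify each turn in isolation, as a per-turn moderation layer would."""
    return [toy_classifier(turn) for turn in turns]


def joined_flag(turns: list[str]) -> bool:
    """Classify the turns joined in order, approximating what the model can reconstruct."""
    return toy_classifier(" ".join(turns))


turns = ["alpha", "bravo", "charlie"]
print(per_turn_flags(turns))  # every fragment is benign on its own
print(joined_flag(turns))     # the composite is not
```

A real classifier is semantic rather than substring-based, but the structural point carries over: the unit being classified is smaller than the unit the model understands.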
- Fragment-aware compositional analysis: Analyze not just individual turns but all possible combinations of recent turns for harmful composite meaning
- Assembly request detection: Flag requests to "combine," "compile," "synthesize," or "merge" content from prior turns
- Turn-reference tracking: Track when users reference specific prior turns by number or content, especially in combination requests
- Cross-turn semantic coherence analysis: Detect when fragments across turns form a coherent harmful payload despite being individually benign
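The first three checks in the list above could be sketched roughly as follows. Everything here is illustrative: the function names, the regular expressions, the `recent` window size, and the `classifier` callable are assumptions rather than part of any existing moderation API, and a production system would need far broader patterns and a semantic classifier.

```python
from __future__ import annotations

import re
from itertools import combinations
from typing import Callable

# Trigger verbs taken from the list above; exact word forms only (assumption).
ASSEMBLY_VERBS = re.compile(r"\b(combine|compile|synthesi[sz]e|merge)\b", re.IGNORECASE)
# Assumed phrasing for references to earlier turns, e.g. "turns 1, 2 and 3".
TURN_REFERENCE = re.compile(
    r"\b(?:turn|message|step|part)s?\s+(\d+(?:\s*(?:,\s*and|,|and)\s*\d+)*)",
    re.IGNORECASE,
)


def is_assembly_request(text: str) -> bool:
    """Assembly request detection: does the turn ask to combine prior content?"""
    return bool(ASSEMBLY_VERBS.search(text))


def referenced_turns(text: str) -> list[int]:
    """Turn-reference tracking: turn numbers the user explicitly points at."""
    numbers: list[int] = []
    for group in TURN_REFERENCE.findall(text):
        numbers.extend(int(n) for n in re.findall(r"\d+", group))
    return numbers


def composite_candidates(turns: list[str], recent: int = 4) -> list[str]:
    """Fragment-aware analysis: ordered combinations of 2+ of the last `recent` turns."""
    window = turns[-recent:]
    return [
        " ".join(combo)
        for size in range(2, len(window) + 1)
        for combo in combinations(window, size)  # preserves turn order
    ]


def screen_turn(
    history: list[str],
    new_turn: str,
    classifier: Callable[[str], bool],
    recent: int = 4,
) -> dict:
    """Run all three checks on a new turn; `classifier` is a caller-supplied stand-in."""
    composites = composite_candidates(history + [new_turn], recent)
    return {
        "assembly_request": is_assembly_request(new_turn),
        "referenced_turns": referenced_turns(new_turn),
        "composite_hit": any(classifier(text) for text in composites),
    }
```

The number of combinations grows exponentially with `recent`, so the window has to stay small; that cost is the practical limit on analyzing "all possible combinations" of turns.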
Context fragmentation chains from T4-AT-007 (Context Window Exhaustion) when fragments are distributed across a large context with benign padding between them. It chains into T4-AT-005 (Incremental Assembly), where it acts as the delivery mechanism for assembly components.