Chunking Exploitation
T12 · RAG & Knowledge Base Manipulation →Documents are split into chunks before embedding — typically fixed-size (512 tokens), semantic (paragraph/section boundaries), or sliding window. Chunking exploitation crafts documents where the harmful content spans chunk boundaries, is split from its context by the chunking algorithm, or where chunk boundaries create misleading fragments. The assumption violated is that chunking preserves semantic integrity — in practice, fixed-size chunking routinely splits sentences, separates claims from their qualifiers, and creates fragments that mean something different from the complete text.
- Compare per-chunk safety analysis against full-document analysis; flag discrepancies
- Monitor for documents with unusual structural patterns at chunk boundaries
- Test chunking output for semantic coherence; flag chunks that are misleading out of context
- Observable signal: chunks that contain prompt injection patterns or harmful content that wasn't flagged in the full document
Chunking exploitation enables T12-AT-001 (Vector Poisoning) by controlling how injected content is embedded (chunk-level vs. document-level). Feeds T12-AT-007 (Context Window Stuffing) through chunk duplication.