T12-AT-013MEDIUM

Chunking Exploitation

T12 · RAG & Knowledge Base Manipulation →
Risk score185
RatingMedium
Procedures10
Severity
Mechanism

Documents are split into chunks before embedding — typically fixed-size (512 tokens), semantic (paragraph/section boundaries), or sliding window. Chunking exploitation crafts documents where the harmful content spans chunk boundaries, is split from its context by the chunking algorithm, or where chunk boundaries create misleading fragments. The assumption violated is that chunking preserves semantic integrity — in practice, fixed-size chunking routinely splits sentences, separates claims from their qualifiers, and creates fragments that mean something different from the complete text.

Detection
  • Compare per-chunk safety analysis against full-document analysis; flag discrepancies
  • Monitor for documents with unusual structural patterns at chunk boundaries
  • Test chunking output for semantic coherence; flag chunks that are misleading out of context
  • Observable signal: chunks that contain prompt injection patterns or harmful content that wasn't flagged in the full document
Mitigation
Multi-granularity safety scanningHIGH
Semantic chunking with safety awarenessMEDIUM
Chunk-to-document provenanceHIGH
Overlap-aware deduplicationLOW
Chaining

Chunking exploitation enables T12-AT-001 (Vector Poisoning) by controlling how injected content is embedded (chunk-level vs. document-level). Feeds T12-AT-007 (Context Window Stuffing) through chunk duplication.

Framework mapping
OWASP LLMLLM08
MITRE ATLASAML.T0043
Open in the technique browser →