T12-AT-013MEDIUM

Chunking Exploitation

T12 · RAG & Knowledge Base Manipulation →

Risk score185

RatingMedium

Procedures10

Severity

Mechanism

Documents are split into chunks before embedding — typically fixed-size (512 tokens), semantic (paragraph/section boundaries), or sliding window. Chunking exploitation crafts documents where the harmful content spans chunk boundaries, is split from its context by the chunking algorithm, or where chunk boundaries create misleading fragments. The assumption violated is that chunking preserves semantic integrity — in practice, fixed-size chunking routinely splits sentences, separates claims from their qualifiers, and creates fragments that mean something different from the complete text.

Detection

Compare per-chunk safety analysis against full-document analysis; flag discrepancies
Monitor for documents with unusual structural patterns at chunk boundaries
Test chunking output for semantic coherence; flag chunks that are misleading out of context
Observable signal: chunks that contain prompt injection patterns or harmful content that wasn't flagged in the full document

Mitigation

Multi-granularity safety scanningHIGH

Semantic chunking with safety awarenessMEDIUM

Chunk-to-document provenanceHIGH

Overlap-aware deduplicationLOW

Chaining

Chunking exploitation enables T12-AT-001 (Vector Poisoning) by controlling how injected content is embedded (chunk-level vs. document-level). Feeds T12-AT-007 (Context Window Stuffing) through chunk duplication.

Framework mapping

OWASP LLMLLM08

MITRE ATLASAML.T0043

Open in the technique browser →