T12-AT-004 · HIGH

Document Store Corruption

T12 · RAG & Knowledge Base Manipulation

Risk score: 230

Rating: High

Procedures: 10

Mechanism

Document store corruption targets the raw document layer before embedding: the attacker replaces, modifies, or injects documents in the storage system (S3 buckets, file shares, CMS, wikis) that feeds the RAG pipeline. The attack violates the assumption that the document ingestion pipeline is a trusted boundary. In practice, many RAG systems ingest from sources with broad write access: wikis editable by any employee, shared drives, customer-submitted documents, or web scraping pipelines.

Detection

File integrity monitoring on document stores (hash-based change detection)
Document provenance verification before ingestion (digital signatures, trusted sources)
Parser sandboxing with anomaly detection on resource consumption during document processing
Observable signal: documents with unusual metadata, encoding, or structure entering the ingestion pipeline
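
As an illustration of the hash-based change detection listed above, the following is a minimal sketch, assuming the document store is reachable as a local directory tree. The function names `snapshot` and `diff_snapshots` are hypothetical, and a real deployment against S3 or a CMS would need that system's own listing and versioning interfaces instead.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 digest of its bytes."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths as added, removed, or modified relative to the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(p for p in shared if baseline[p] != current[p]),
    }
```

A scheduler could take a snapshot before each ingestion run and compare it with the last approved baseline. Any entry under `modified` or `added` that does not correspond to an authorized change is a candidate for review before it is embedded. The baseline itself has to be stored somewhere the attacker cannot write, or the check proves nothing.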

Mitigation

Document signing and provenance: HIGH

Parser sandboxing: HIGH

Content integrity monitoring: HIGH

Ingestion pipeline scanning: MEDIUM
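
The signing and provenance control can be sketched as a gate that runs before ingestion. This example substitutes an HMAC-SHA256 tag for a true digital signature so that it stays within the Python standard library; an asymmetric scheme would be preferable where publishers and the pipeline should not share a secret. The names `sign_document`, `verify_document`, and `filter_for_ingestion` are hypothetical, and key distribution is assumed to be handled elsewhere.

```python
import hashlib
import hmac
from typing import Iterable, List, Tuple


def sign_document(content: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the document bytes (run by a trusted publisher)."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify_document(content: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_document(content, key)
    return hmac.compare_digest(expected.encode(), tag.encode())


def filter_for_ingestion(
    documents: Iterable[Tuple[str, bytes, str]], key: bytes
) -> Tuple[List[str], List[str]]:
    """Split (name, content, tag) records into accepted and rejected document names."""
    accepted, rejected = [], []
    for name, content, tag in documents:
        if verify_document(content, tag, key):
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected
```

Only the accepted names would be handed to the parser and embedder. Rejected names are worth logging, since a burst of verification failures is itself a detection signal.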

Chaining

Document store corruption is the persistent version of T12-AT-001 (Vector Poisoning): corrupted documents are automatically embedded and indexed, so the poisoning propagates through the pipeline on its own. It also feeds T12-AT-006 when corrupted documents trigger query injection during parsing.

Framework mapping

OWASP LLM: LLM08

MITRE ATLAS: AML.T0020
