T12-AT-004 · HIGH

Document Store Corruption

T12 · RAG & Knowledge Base Manipulation
Risk score: 230
Rating: High
Procedures: 10
Mechanism

Document store corruption targets the raw document layer before embedding: an attacker replaces, modifies, or injects documents in the storage system (S3 buckets, file shares, CMS, wikis) that feeds the RAG pipeline. The attack violates the assumption that the document ingestion pipeline is a trusted boundary. In practice, many RAG systems ingest from sources with broad write access: wikis editable by any employee, shared drives, customer-submitted documents, or web scraping pipelines.

Detection
  • File integrity monitoring on document stores (hash-based change detection)
  • Document provenance verification before ingestion (digital signatures, trusted sources)
  • Parser sandboxing with anomaly detection on resource consumption during document processing
  • Observable signal: documents with unusual metadata, encoding, or structure entering the ingestion pipeline
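
The hash-based change detection in the first bullet can be sketched roughly as follows. This is a minimal illustration, not a production monitor: it assumes the document store is reachable as a local directory, and the names `snapshot` and `diff_snapshots` are hypothetical helpers introduced here. A real deployment against S3 or a wiki would obtain content hashes through that system's own API.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between a trusted baseline and the current state."""
    return {
        # Present now but absent from the baseline: possible injection.
        "added": sorted(current.keys() - baseline.keys()),
        # Present in the baseline but gone now: possible deletion.
        "removed": sorted(baseline.keys() - current.keys()),
        # Same path, different digest: possible replacement or modification.
        "modified": sorted(
            name
            for name in baseline.keys() & current.keys()
            if baseline[name] != current[name]
        ),
    }
```

Any non-empty category would then be reviewed, or checked against an approved change log, before the affected documents are re-embedded. Where the baseline is stored matters: if it lives in the same writable location as the documents, an attacker can update both.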
Mitigation
Document signing and provenance: HIGH
Parser sandboxing: HIGH
Content integrity monitoring: HIGH
Ingestion pipeline scanning: MEDIUM
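
As one possible shape for the document signing and provenance control, the sketch below gates ingestion on a verified tag. It is an assumption-laden illustration: it uses HMAC-SHA256 with a per-source shared key as a stand-in for digital signatures (an asymmetric scheme would usually be preferable, since the verifier then holds no signing secret), and `sign_document`, `verify_before_ingest`, and the `trusted_keys` mapping are hypothetical names, not part of any particular RAG framework.

```python
import hashlib
import hmac


def sign_document(content: bytes, source_id: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the content to its claimed source."""
    # The NUL separator keeps (source_id, content) pairs unambiguous,
    # assuming source identifiers never contain a NUL byte.
    message = source_id.encode("utf-8") + b"\x00" + content
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_before_ingest(
    content: bytes,
    source_id: str,
    tag: str,
    trusted_keys: dict[str, bytes],
) -> bool:
    """Accept a document only if its source is known and its tag verifies."""
    key = trusted_keys.get(source_id)
    if key is None:
        return False  # Unknown source: reject instead of ingesting.
    expected = sign_document(content, source_id, key)
    # Constant-time comparison on bytes avoids leaking how much of the tag matched.
    return hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8"))
```

In this arrangement the ingestion worker would call `verify_before_ingest` on every document and skip embedding for anything that returns `False`, so a document modified in the store after signing never reaches the index.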
Chaining

Document store corruption is the persistent version of T12-AT-001 (Vector Poisoning): corrupted documents are automatically embedded and indexed, which makes the poisoning self-propagating through the pipeline. It feeds T12-AT-006 when corrupted documents trigger query injection during parsing.

Framework mapping
OWASP LLM: LLM08
MITRE ATLAS: AML.T0020