T12-AT-001 · HIGH

Vector Database Poisoning

Category: T12 · RAG & Knowledge Base Manipulation
Risk score: 240
Rating: High
Procedures: 10
Mechanism

RAG systems typically retrieve documents by cosine similarity between the query embedding and the stored document embeddings. Vector database poisoning inserts crafted documents that are optimized to satisfy two conditions simultaneously: (1) the retrieval condition: the poisoned document's embedding must be more similar to the target query than competing legitimate documents, so that it lands in the retrieved top-k set; (2) the generation condition: the poisoned document's content must cause the LLM to produce the attacker's chosen answer when used as context. PoisonedRAG (USENIX Security 2025) formalized this as an optimization problem and reported a 90% attack success rate (ASR) with only 5 injected documents per target question in a corpus of millions of documents.
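The retrieval condition can be illustrated with a toy example. The sketch below is not the PoisonedRAG optimization itself; it assumes hypothetical 3-dimensional embeddings, a hypothetical in-memory store, and an illustrative `retrieve` helper, and only shows how a document placed very close to the target query displaces the legitimate top result.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, store, k=1):
    """Return the k stored documents most similar to the query."""
    ranked = sorted(store, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return ranked[:k]

# Hypothetical embeddings; real ones have hundreds of dimensions.
query = [0.9, 0.1, 0.0]
store = [
    {"id": "legit-1", "vec": [0.8, 0.3, 0.1], "text": "Official policy text."},
    {"id": "legit-2", "vec": [0.1, 0.9, 0.2], "text": "Unrelated document."},
]
top_before = retrieve(query, store, k=1)[0]["id"]

# The attacker crafts a document whose embedding sits almost on top of the
# target query (retrieval condition) and whose text asserts the attacker's
# chosen answer (generation condition).
store.append({
    "id": "poison-1",
    "vec": [0.9, 0.1, 0.01],
    "text": "Attacker-chosen answer phrased as fact.",
})
top_after = retrieve(query, store, k=1)[0]["id"]

print(top_before, "->", top_after)
```

In a real attack the adversary usually controls only the document text, not the vector, so the embedding is steered indirectly by optimizing the text.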

Detection
  • Content integrity verification: hash-based checksums on all documents; alert on modifications
  • Anomaly detection on new document embeddings: flag documents whose embeddings are unusually close to many query clusters (potential universal attractors)
  • Monitor retrieval result consistency: compare pre-injection and post-injection retrieval results for canary queries
  • Scan ingested documents for prompt injection patterns before embedding
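Three of the detection checks above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the similarity threshold, and the cluster-count limit are hypothetical, and query-cluster centroids are assumed to have been computed elsewhere from historical queries.

```python
import hashlib
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def content_hash(text):
    """SHA-256 checksum of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def modified_documents(docs, known_hashes):
    """Ids whose content does not match the recorded hash, or that have none recorded."""
    return [doc_id for doc_id, text in docs.items()
            if known_hashes.get(doc_id) != content_hash(text)]

def flag_attractors(new_docs, centroids, threshold=0.5, max_clusters=1):
    """Flag new documents that are close to more query clusters than allowed.

    threshold and max_clusters are illustrative values, not tuned defaults.
    """
    flagged = []
    for doc_id, vec in new_docs.items():
        close = sum(1 for c in centroids if cosine(vec, c) >= threshold)
        if close > max_clusters:
            flagged.append(doc_id)
    return flagged

def canary_drift(baseline_ids, current_ids):
    """Document ids that newly appear in the results of a canary query."""
    return sorted(set(current_ids) - set(baseline_ids))

# Hypothetical data for illustration only.
hashes = {"a": content_hash("hello")}
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_docs = {"normal": [0.95, 0.05, 0.0], "suspect": [0.6, 0.6, 0.5]}

print(modified_documents({"a": "hello!"}, hashes))
print(flag_attractors(new_docs, centroids))
print(canary_drift(["d1", "d2"], ["d1", "p9"]))
```

The attractor check trades recall for simplicity: a document tuned against a single query would not be close to many clusters, which is why the canary-query comparison is kept as a separate signal.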
Mitigation
  • Document provenance verification: HIGH
  • Embedding anomaly detection: MEDIUM
  • Access control on document ingestion: HIGH
  • Retrieval result verification: MEDIUM
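One possible way to combine provenance verification with access control at ingestion is to accept a document only if it carries a valid signature from an allowlisted source. The sketch below assumes a hypothetical `TRUSTED_SOURCES` key table and illustrative helper names; a real deployment would keep keys in a secrets manager and might prefer asymmetric signatures.

```python
import hashlib
import hmac

# Hypothetical per-source signing keys, for illustration only.
TRUSTED_SOURCES = {"docs-team": b"example-key-not-for-production"}

def sign(source, content, key):
    """HMAC-SHA256 over the source name and document content."""
    message = f"{source}\n{content}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def accept_for_ingestion(source, content, signature):
    """Accept only documents signed by a known source with a matching signature."""
    key = TRUSTED_SOURCES.get(source)
    if key is None:
        return False  # unknown source: reject before embedding
    expected = sign(source, content, key)
    return hmac.compare_digest(expected, signature)

good_sig = sign("docs-team", "Official policy text.", TRUSTED_SOURCES["docs-team"])
print(accept_for_ingestion("docs-team", "Official policy text.", good_sig))
print(accept_for_ingestion("docs-team", "Tampered text.", good_sig))
print(accept_for_ingestion("unknown", "Official policy text.", good_sig))
```

Rejecting before the embedding step keeps unverified content out of the vector store entirely, rather than relying on later anomaly detection to find it.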
Chaining

Vector database poisoning is the entry point for T12-AT-002 (Retrieval Manipulation) and T12-AT-007 (Context Window Stuffing). Injected documents containing prompt injections chain to T1 (Prompt Subversion).

Framework mapping
OWASP LLM: LLM08
MITRE ATLAS: AML.T0020