T12-AT-001HIGH

Vector Database Poisoning

T12 · RAG & Knowledge Base Manipulation →

Risk score240

RatingHigh

Procedures10

Severity

Mechanism

RAG systems retrieve documents by cosine similarity between query embeddings and stored document embeddings. Vector database poisoning inserts crafted documents whose embeddings are optimized to satisfy two conditions simultaneously: (1) the retrieval condition — the poisoned document's embedding must have higher cosine similarity to the target query than any legitimate document, ensuring it is retrieved; (2) the generation condition — the poisoned document's content must cause the LLM to produce the attacker's chosen answer when used as context. PoisonedRAG (USENIX Security 2025) formalized this as an optimization problem and achieved 90% ASR with only 5 injected documents in million-document corpora.

Detection

Content integrity verification: hash-based checksums on all documents; alert on modifications
Anomaly detection on new document embeddings: flag documents whose embeddings are unusually close to many query clusters (potential universal attractors)
Monitor retrieval result consistency: compare pre-injection and post-injection retrieval results for canary queries
Scan ingested documents for prompt injection patterns before embedding

Mitigation

Document provenance verificationHIGH

Embedding anomaly detectionMEDIUM

Access control on document ingestionHIGH

Retrieval result verificationMEDIUM

Chaining

Vector database poisoning is the entry point for T12-AT-002 (Retrieval Manipulation) and T12-AT-007 (Context Window Stuffing). Injected documents containing prompt injections chain to T1 (Prompt Subversion).

Framework mapping

OWASP LLMLLM08

MITRE ATLASAML.T0020

Open in the technique browser →