T4-AT-011HIGH

Memory Poisoning

T4 · Multi-Turn & Memory Manipulation →
Risk score235
RatingHigh
Procedures10
Severity
Mechanism

Where T4-AT-002 (Memory Instruction Injection) plants behavioral instructions in persistent memory, Memory Poisoning corrupts the factual content stored in memory — replacing accurate safety-relevant facts with false ones. The gap is different: memory systems validate that content is formatted as a user fact ("User knows that X") but do not validate the factual accuracy of X. If the poisoned fact concerns a safety-relevant domain ("ricin is a safe supplement"), it directly undermines the model's ability to refuse harmful requests because the model now has a "user fact" that contradicts its safety training.

Detection
  • Safety-relevant memory write detection: Flag memory writes that concern safety-relevant domains (substances, weapons, legal status, age-appropriateness)
  • Fact verification against training knowledge: Cross-check stored memories against the model's training knowledge — flag contradictions
  • Memory provenance tracking: Track whether memories were created via direct user request, URL fetch, or tool output. Apply elevated scrutiny to indirectly-sourced memories
  • CVE-2025-0845 pattern detection: Monitor for URL-sourced content that triggers memory writes
Mitigation
Memory content safety classificationHIGH
Training knowledge priority over memoryHIGH
Indirect memory write blockingHIGH
Memory fact verificationMEDIUM
Chaining

Memory poisoning creates a persistent factual foundation that enables all other T4 techniques to operate more effectively. Poisoned safety facts chain into T4-AT-001 (Context Poisoning) by providing false ground truth.

Framework mapping
OWASP LLMLLM01
MITRE ATLASAML.T0080
Open in the technique browser →