T4-AT-011HIGH

Memory Poisoning

Risk score235

RatingHigh

Procedures10

Severity

Mechanism

Where T4-AT-002 (Memory Instruction Injection) plants behavioral instructions in persistent memory, Memory Poisoning corrupts the factual content stored in memory — replacing accurate safety-relevant facts with false ones. The gap is different: memory systems validate that content is formatted as a user fact ("User knows that X") but do not validate the factual accuracy of X. If the poisoned fact concerns a safety-relevant domain ("ricin is a safe supplement"), it directly undermines the model's ability to refuse harmful requests because the model now has a "user fact" that contradicts its safety training.

Detection

Safety-relevant memory write detection: Flag memory writes that concern safety-relevant domains (substances, weapons, legal status, age-appropriateness)
Fact verification against training knowledge: Cross-check stored memories against the model's training knowledge — flag contradictions
Memory provenance tracking: Track whether memories were created via direct user request, URL fetch, or tool output. Apply elevated scrutiny to indirectly-sourced memories
CVE-2025-0845 pattern detection: Monitor for URL-sourced content that triggers memory writes

Mitigation

Memory content safety classificationHIGH

Training knowledge priority over memoryHIGH

Indirect memory write blockingHIGH

Memory fact verificationMEDIUM

Chaining

Memory poisoning creates a persistent factual foundation that enables all other T4 techniques to operate more effectively. Poisoned safety facts chain into T4-AT-001 (Context Poisoning) by providing false ground truth.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0080

Open in the technique browser →