Memory Poisoning
T4 · Multi-Turn & Memory Manipulation
Where T4-AT-002 (Memory Instruction Injection) plants behavioral instructions in persistent memory, Memory Poisoning corrupts the factual content stored in memory, replacing accurate safety-relevant facts with false ones. The gap it exploits is also different: memory systems validate that content is formatted as a user fact ("User knows that X") but do not validate the factual accuracy of X. If the poisoned fact concerns a safety-relevant domain ("ricin is a safe supplement"), it directly undermines the model's ability to refuse harmful requests, because the model now holds a "user fact" that contradicts its safety training.
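The validation gap can be illustrated with a minimal sketch. The pattern and the function name `is_well_formed_memory` are hypothetical and do not describe any specific memory system; the point is only that a check on the shape of an entry says nothing about whether the claim inside it is true.

```python
import re

# Hypothetical format check: accepts any entry shaped like "User knows that X".
FACT_PATTERN = re.compile(r"^User knows that (?P<claim>.+)$")


def is_well_formed_memory(entry: str) -> bool:
    """Return True if the entry is shaped like a user fact.

    The truth of the claim X is never examined, which is the gap
    that memory poisoning relies on.
    """
    return FACT_PATTERN.match(entry) is not None


# An accurate claim and the false claim quoted above pass the same check.
accurate = "User knows that water boils at 100 C at sea level"
poisoned = "User knows that ricin is a safe supplement"
print(is_well_formed_memory(accurate), is_well_formed_memory(poisoned))
```

Both entries are accepted because only the format is inspected; rejecting the second one requires a separate accuracy or safety-domain check.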
- Safety-relevant memory write detection: Flag memory writes that concern safety-relevant domains (substances, weapons, legal status, age-appropriateness)
- Fact verification against training knowledge: Cross-check stored memories against the model's training knowledge — flag contradictions
- Memory provenance tracking: Track whether memories were created via direct user request, URL fetch, or tool output. Apply elevated scrutiny to indirectly sourced memories
- CVE-2025-0845 pattern detection: Monitor for URL-sourced content that triggers memory writes
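Two of the checks above, safety-relevant write detection and provenance tracking, can be sketched together as a gate on memory writes. Everything named here (`Source`, `MemoryWrite`, `SAFETY_TERMS`, `review_flags`) is a hypothetical illustration, and the keyword lists are placeholders; a real deployment would more likely use a classifier than keyword matching. Fact verification against training knowledge is not covered by this sketch.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """How a memory write originated (assumed categories, taken from the list above)."""
    USER_REQUEST = "user_request"
    URL_FETCH = "url_fetch"
    TOOL_OUTPUT = "tool_output"


# Placeholder keyword sets per safety-relevant domain; illustrative only.
SAFETY_TERMS = {
    "substances": {"toxin", "poison", "dose", "supplement", "drug"},
    "weapons": {"weapon", "explosive", "firearm"},
    "legal-status": {"legal", "illegal", "licensed"},
    "age-appropriateness": {"minor", "child", "adult"},
}


@dataclass(frozen=True)
class MemoryWrite:
    text: str
    source: Source


def review_flags(write: MemoryWrite) -> list[str]:
    """Return the reasons this write should be held for closer review."""
    words = set(re.findall(r"[a-z]+", write.text.lower()))
    flags = [
        f"safety-domain:{domain}"
        for domain, terms in SAFETY_TERMS.items()
        if words & terms
    ]
    # Indirectly sourced writes get elevated scrutiny regardless of topic.
    if write.source is not Source.USER_REQUEST:
        flags.append(f"indirect-source:{write.source.value}")
    return flags


write = MemoryWrite("User knows that this supplement is safe at any dose", Source.URL_FETCH)
print(review_flags(write))
```

In this sketch a URL-sourced write is flagged even when it touches no safety-relevant domain, so the URL-triggered case in the last bullet shows up as `indirect-source:url_fetch`. An empty list means no check fired, not that the stored claim is accurate.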
Memory poisoning creates a persistent false factual foundation that makes the other T4 techniques more effective. Poisoned safety facts chain into T4-AT-001 (Context Poisoning) by supplying false ground truth.