Temporal Anchoring
T4 · Multi-Turn & Memory Manipulation
Models have uncertain temporal self-knowledge: they don't know their exact version, when their safety policies were last updated, or what restrictions apply to their current deployment. Temporal anchoring exploits this by creating fictional temporal frames that redefine the model's understanding of its own constraints. "Before the safety update" implies restrictions are artificial additions; "in the future when restrictions are lifted" implies they're temporary; "historically, this was acceptable" implies they're historically contingent.
- Temporal override keyword detection: Flag messages containing "before the update," "when restrictions are lifted," "maintenance window," "temporary override," "time-limited"
- Urgency indicator detection: Flag messages combining time pressure with harmful topic requests
- Historical roleplay detection: Detect requests to role-play historical periods specifically on safety-relevant topics
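The three detection heuristics above can be sketched as a simple pattern-based flagger. This is a minimal illustration, not a production detector. The temporal override phrases come from the list above, loosened slightly so that "before the safety update" also matches. The urgency terms, sensitive-topic terms, historical roleplay pattern, flag names, and the function `flag_message` are all hypothetical placeholders chosen for the example.

```python
import re
from typing import List

# Phrases from the detection list above, loosened slightly (assumption)
# so that variants such as "before the safety update" also match.
TEMPORAL_OVERRIDE_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"before the (?:\w+\s+)?update",
        r"when (?:the )?restrictions are lifted",
        r"maintenance window",
        r"temporary override",
        r"time-limited",
    )
]

# Hypothetical vocabularies: stand-ins for a real topic classifier.
URGENCY = re.compile(
    r"\b(?:urgent|immediately|right now|deadline|hurry)\b", re.IGNORECASE
)
SENSITIVE = re.compile(
    r"\b(?:weapon|malware|exploit|poison)s?\b", re.IGNORECASE
)
# Hypothetical pattern: a roleplay verb followed by a historical marker.
HISTORICAL_ROLEPLAY = re.compile(
    r"\b(?:role-?play|pretend|act as)\b.*"
    r"\b(?:medieval|ancient|historical(?:ly)?|\d{2}th century|\d{4}s)\b",
    re.IGNORECASE | re.DOTALL,
)


def flag_message(message: str) -> List[str]:
    """Return the names of the heuristics that fire on a single message."""
    flags = []
    # 1. Temporal override keywords fire regardless of topic.
    if any(p.search(message) for p in TEMPORAL_OVERRIDE_PATTERNS):
        flags.append("temporal_override")
    sensitive = bool(SENSITIVE.search(message))
    # 2. Time pressure counts only when combined with a sensitive topic.
    if sensitive and URGENCY.search(message):
        flags.append("urgency_plus_sensitive_topic")
    # 3. Historical roleplay counts only on safety-relevant topics.
    if sensitive and HISTORICAL_ROLEPLAY.search(message):
        flags.append("historical_roleplay_sensitive_topic")
    return flags
```

Keyword matching of this kind is easy to evade through paraphrase and will produce false positives on benign text (for example, a legitimate question about a maintenance window), so the flags are better treated as signals for further review than as blocking decisions.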
Temporal anchoring is a context-setting technique that chains into T4-AT-001 (Context Poisoning) by establishing a temporal frame in which restrictions don't apply. It also chains into T4-AT-010 (State Confusion) when temporal claims modify the model's understanding of its operational context.