T4-AT-009MEDIUM

Temporal Anchoring

Risk score185

RatingMedium

Procedures10

Severity

Mechanism

Models have uncertain temporal self-knowledge — they don't know their exact version, when their safety policies were last updated, or what restrictions apply to their current deployment. Temporal anchoring exploits this by creating fictional temporal frames that redefine the model's self-understanding of its own constraints. "Before the safety update" implies restrictions are artificial additions; "in the future when restrictions are lifted" implies they're temporary; "historically, this was acceptable" implies they're historically contingent.

Detection

Temporal override keyword detection: Flag messages containing "before the update," "when restrictions are lifted," "maintenance window," "temporary override," "time-limited"
Urgency indicator detection: Flag messages combining time pressure with harmful topic requests
Historical roleplay detection: Detect requests to role-play historical periods specifically on safety-relevant topics

Mitigation

Temporal invariance trainingHIGH

Explicit self-knowledge anchoringMEDIUM

Urgency resistance trainingMEDIUM

Fictional-future detectionMEDIUM

Chaining

Temporal anchoring is a context-setting technique that chains into T4-AT-001 (Context Poisoning) by establishing a temporal frame where restrictions don't apply. Also chains into T4-AT-010 (State Confusion) when temporal claims modify the model's understanding of its operational context.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0054

Open in the technique browser →