T3-AT-017 · MEDIUM

Scenario Anchoring

T3 · Reasoning & Constraint Exploitation
Risk score: 185
Rating: Medium
Procedures: 10
Mechanism

In a decoder-only transformer, causal attention lets every token attend to all the tokens before it, so earlier context conditions how later tokens are interpreted. This is the anchoring effect in transformer processing. Scenario anchoring exploits it by establishing a detailed, legitimate-seeming context *before* introducing the harmful request, so that the request inherits the context's apparent legitimacy. The safety evaluation then judges the request inside the anchored frame. Conceptually, the context-weighted estimate of harmful intent can fall below the refusal threshold even though the same request, stripped of its context, would trigger refusal.
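The threshold effect described above can be illustrated with a toy numeric model. This is a sketch under stated assumptions, not a description of how any real model computes refusals: the function `context_weighted_score`, the linear discount, the weight `0.6`, and the threshold `0.5` are all hypothetical values chosen for illustration.

```python
# Toy model of the "context-weighted intent score" idea.
# ASSUMPTION: every name and number here is illustrative, not measured.

REFUSAL_THRESHOLD = 0.5  # assumed cut-off: scores at or above this are refused


def context_weighted_score(
    payload_risk: float,
    context_legitimacy: float,
    context_weight: float = 0.6,
) -> float:
    """Discount the payload's standalone risk by how legitimate the context looks.

    payload_risk and context_legitimacy are both in [0, 1].
    context_weight controls how strongly the context can discount the risk.
    """
    return payload_risk * (1.0 - context_weight * context_legitimacy)


def refuses(score: float) -> bool:
    """Return True when the score reaches the refusal threshold."""
    return score >= REFUSAL_THRESHOLD


# Same payload, evaluated without and with an anchoring context.
bare = context_weighted_score(0.8, context_legitimacy=0.0)
anchored = context_weighted_score(0.8, context_legitimacy=0.9)

print("bare request refused:    ", refuses(bare))      # True
print("anchored request refused:", refuses(anchored))  # False
```

In this toy model the payload never changes. Only the apparent legitimacy of the surrounding context moves the score across the threshold, which is why evaluating the request without its context is a natural countermeasure.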

Detection
  • Detailed scenario construction preceding restricted content requests — anchoring prompts are typically much longer than the harmful payload
  • Game/fiction/historical/jurisdictional/professional anchoring markers followed by operational requests
  • Bait-and-escalate: legitimate and restricted content combined in the same prompt
  • Context-payload mismatch: anchored scenario does not genuinely require the specific content requested
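Two of the signals above, anchoring markers and the setup-to-request length imbalance, lend themselves to a simple heuristic. The following is a minimal sketch, assuming the final paragraph of a prompt is the request and everything before it is setup. The marker phrases in `ANCHOR_MARKERS`, the `min_ratio` default, and all function names are hypothetical. Bait-and-escalate and context-payload mismatch need semantic analysis and are not covered here.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# ASSUMPTION: illustrative marker phrases, one short list per anchoring family.
ANCHOR_MARKERS = {
    "game": [r"\bin this game\b", r"\byou are playing\b"],
    "fiction": [r"\bin (?:my|this) (?:novel|story)\b", r"\bfictional\b"],
    "historical": [r"\bhistorically\b", r"\bin the year \d{3,4}\b"],
    "jurisdictional": [r"\bit is legal (?:here|in)\b", r"\bin my country\b"],
    "professional": [r"\bas a (?:licensed|certified)\b"],
}


@dataclass
class AnchorSignals:
    marker_families: List[str]  # anchoring families whose markers were found
    setup_words: int            # word count of the scenario setup
    request_words: int          # word count of the final request
    length_ratio: float         # setup_words / request_words


def split_setup_and_request(prompt: str) -> Tuple[str, str]:
    """Naive split: the last paragraph is the request, the rest is setup."""
    paragraphs = [p.strip() for p in prompt.split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return "", prompt.strip()
    return "\n\n".join(paragraphs[:-1]), paragraphs[-1]


def anchoring_signals(prompt: str) -> AnchorSignals:
    """Collect marker and length signals for one prompt."""
    setup, request = split_setup_and_request(prompt)
    families = sorted(
        family
        for family, patterns in ANCHOR_MARKERS.items()
        if any(re.search(p, setup, re.IGNORECASE) for p in patterns)
    )
    setup_words = len(setup.split())
    request_words = len(request.split())
    ratio = setup_words / max(request_words, 1)
    return AnchorSignals(families, setup_words, request_words, ratio)


def is_suspicious(signals: AnchorSignals, min_ratio: float = 5.0) -> bool:
    """Flag prompts with an anchoring marker and a setup much longer than the request."""
    return bool(signals.marker_families) and signals.length_ratio >= min_ratio
```

A flag from this heuristic is a reason for closer review, not a verdict: long, legitimate scenario prompts are common, so the result should feed a stricter evaluation step, not trigger refusal by itself.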
Mitigation
  • Context-stripped evaluation (HIGH)
  • Compound request disaggregation (HIGH)
  • Anchoring-length detection (MEDIUM)
  • Jurisdictional neutrality (HIGH)
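The first two mitigations can be combined in one evaluation step: score the request both inside and outside its context, split compound requests into parts, and let the worst score decide. The sketch below assumes a caller-supplied risk scorer returning a value in [0, 1]. The `RiskScorer` type, the splitting rule in `disaggregate`, and the `0.5` threshold are hypothetical, and how a prompt is divided into setup and request is left to the caller.

```python
import re
from typing import Callable, Dict, List

# ASSUMPTION: a scorer supplied by the caller, returning risk in [0, 1].
RiskScorer = Callable[[str], float]


def disaggregate(request: str) -> List[str]:
    """Split a compound request into sub-requests.

    Splits after sentence-ending punctuation or semicolons, and on the
    connectives "and also" / "and then". A real system would need a more
    careful parser; this rule is only illustrative.
    """
    parts = re.split(r"(?<=[.?!;])\s+|\s+(?:and also|and then)\s+", request)
    return [p.strip() for p in parts if p.strip()]


def evaluate(
    setup: str,
    request: str,
    score: RiskScorer,
    threshold: float = 0.5,
) -> Dict[str, object]:
    """Refuse if the request is risky either in context or stripped of it."""
    # Score the full prompt as the model would see it.
    in_context = score(f"{setup}\n\n{request}".strip())
    # Score each sub-request alone, without the anchoring setup.
    stripped_max = max((score(part) for part in disaggregate(request)), default=0.0)
    worst = max(in_context, stripped_max)
    return {
        "in_context": in_context,
        "stripped_max": stripped_max,
        "refuse": worst >= threshold,
    }
```

Taking the maximum means the anchoring context can never lower the effective score below what the bare request would earn, and a restricted part cannot hide behind a legitimate one in the same prompt.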
Chaining

Scenario anchoring establishes a persistent context, and every other T3 technique can then operate inside that anchored frame. It is particularly effective as setup for T3-AT-004 (Step-by-Step Extraction), where follow-up requests inherit the anchor's legitimacy, and for T3-AT-007 (Socratic Method), where subsequent questions are processed within the anchor's context.
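Because the anchor persists across turns, a per-prompt check is not enough on its own. One possible way to keep follow-up requests from inheriting the anchor's legitimacy is sketched below: each turn is scored on its own text, and once any turn looks like an anchor the session stays flagged and a stricter threshold applies. The `SessionGuard` class, both callables, and both threshold values are assumptions made for illustration; the source does not prescribe this design.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SessionGuard:
    # ASSUMPTION: both callables are supplied by the caller.
    score: Callable[[str], float]     # standalone risk of one turn, in [0, 1]
    is_anchor: Callable[[str], bool]  # does this turn set up an anchoring scenario?
    threshold: float = 0.5            # assumed normal refusal threshold
    anchored_threshold: float = 0.3   # assumed stricter threshold after an anchor
    anchored: bool = False            # sticky flag: set once, never cleared
    history: List[str] = field(default_factory=list)

    def check_turn(self, turn: str) -> bool:
        """Return True if this turn should be refused."""
        if self.is_anchor(turn):
            self.anchored = True
        self.history.append(turn)
        # Score the turn alone so it cannot borrow legitimacy from earlier turns.
        limit = self.anchored_threshold if self.anchored else self.threshold
        return self.score(turn) >= limit
```

The sticky flag is the design choice that matters here: a follow-up that would pass in a fresh conversation is held to the stricter limit once the session has been anchored, which targets the setup-then-extract pattern described above.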

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054