T3-AT-001 · MEDIUM

Fictional Framing

T3 · Reasoning & Constraint Exploitation
Risk score: 190
Rating: Medium
Procedures: 10
Severity
Mechanism

Models maintain separate processing modes for creative generation and instruction execution, and their safety classifiers are trained to evaluate the *intent* behind a request. Fictional framing exploits the boundary between these modes: when a prompt signals "creative writing," the model shifts into a generation pathway where safety thresholds are structurally lower, because the training data legitimately contains harmful content within fiction (novels, screenplays, games). The specific vulnerability is that the model's content policy classifier evaluates the *framing* rather than the *payload*. A request for synthesis instructions that is refused more than 99% of the time when asked directly may succeed 40–80% of the time when embedded in a screenplay scene, because the classifier treats the fictional context as negating intent.

Detection
  • Monitor for creative-writing framing keywords (novel, screenplay, fiction, story, game, D&D, character) co-occurring with harm-category content in the same request
  • Classifier-level: train a secondary classifier that evaluates payload harm *independently* of framing context, i.e. strip the narrative framing and re-evaluate the core request
  • Log signal: requests that combine fiction markers with technical specificity markers (step by step, exact, accurate, realistic, specific components)
  • No existing YARA/Sigma rules for T3 in signatures/
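The log signal described above (fiction markers co-occurring with technical-specificity markers) can be sketched as a simple keyword matcher. This is a minimal illustration, not an existing rule from signatures/: the function name `fiction_specificity_signal` and the returned dictionary shape are assumptions, and the marker lists are copied from the bullets above and would need tuning against real traffic.

```python
import re

# Marker lists taken from the Detection bullets; illustrative, not exhaustive.
FICTION_MARKERS = ["novel", "screenplay", "fiction", "story", "game", "D&D", "character"]
SPECIFICITY_MARKERS = ["step by step", "exact", "accurate", "realistic", "specific components"]


def _compile(markers):
    """Build one case-insensitive regex matching any marker as a whole word/phrase."""
    alternatives = []
    for marker in markers:
        # Allow hyphens or whitespace between the words of a phrase ("step-by-step").
        words = [re.escape(word) for word in marker.split()]
        alternatives.append(r"[\s-]+".join(words))
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE)


_FICTION_RE = _compile(FICTION_MARKERS)
_SPECIFICITY_RE = _compile(SPECIFICITY_MARKERS)


def fiction_specificity_signal(text):
    """Return matched markers and a flag that is True only when both kinds co-occur."""
    fiction = sorted({m.lower() for m in _FICTION_RE.findall(text)})
    specificity = sorted({m.lower() for m in _SPECIFICITY_RE.findall(text)})
    return {
        "fiction": fiction,
        "specificity": specificity,
        "flag": bool(fiction and specificity),
    }
```

A keyword match like this is only a logging signal. On its own it will flag plenty of benign creative-writing requests, so it is best paired with the harm-category check from the first bullet rather than used to block requests.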
Mitigation
  • Frame-stripping pre-processor: HIGH
  • Nested-depth detection: MEDIUM
  • Harm-content classifier (frame-independent): HIGH
  • Output filtering on procedural detail: MEDIUM
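One way to combine the frame-stripping pre-processor with a frame-independent harm classifier is sketched below. Everything here is an assumption for illustration: the wrapper patterns are hypothetical and deliberately narrow (a production pre-processor would likely need a learned rewriter), and `classifier` stands in for whatever harm-scoring function is available, taken to return a score between 0 and 1.

```python
import re
from typing import Callable

# Hypothetical narrative-wrapper patterns; they only cover two common openings.
_FRAME_PATTERNS = [
    re.compile(
        r"^\s*(?:please\s+)?write\s+(?:me\s+)?an?\s+(?:short\s+)?"
        r"(?:(?:story|novel|screenplay|scene)\s+)+(?:in\s+which|where|about)\s+",
        re.IGNORECASE,
    ),
    re.compile(
        r"^\s*(?:in|for)\s+(?:my|a|the)\s+(?:story|novel|screenplay|game|campaign)\s*,\s*",
        re.IGNORECASE,
    ),
]


def strip_frame(text: str) -> str:
    """Remove a leading narrative wrapper, leaving the core request."""
    stripped = text
    for pattern in _FRAME_PATTERNS:
        stripped = pattern.sub("", stripped, count=1)
    return stripped.strip()


def frame_independent_score(text: str, classifier: Callable[[str], float]) -> float:
    """Score the request with and without its frame and keep the higher score.

    Taking the maximum means a fictional wrapper can never lower the score
    below what the bare payload would have received.
    """
    return max(classifier(text), classifier(strip_frame(text)))
```

Taking the maximum of the two scores, rather than scoring only the stripped text, is a conservative choice: if the stripping heuristics mangle a request, the original text is still evaluated.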
Chaining

Successful fictional framing establishes a persistent narrative context that enables T3-AT-014 (Incremental Boundary Pushing) within the fiction: subsequent requests for "more detail" or "technical accuracy" in the established story inherit the lowered safety threshold. It also chains into T4 (Multi-Turn Manipulation), where the fiction anchors a multi-turn escalation.
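Because the chain depends on context carried across turns, per-request checks can miss it. The sketch below keeps a small amount of session state: once a fiction frame has been seen, later requests for "more detail" or "technical accuracy" are flagged. The class name `FictionSessionTracker`, its fields, and the phrase lists are all hypothetical, and the escalation phrases are taken from the two examples in the paragraph above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

_FICTION_RE = re.compile(r"\b(?:novel|screenplay|fiction|story|character)\b", re.IGNORECASE)
_ESCALATION_RE = re.compile(
    r"\b(?:more\s+details?|technical(?:ly)?\s+accura(?:cy|te))\b", re.IGNORECASE
)


@dataclass
class FictionSessionTracker:
    """Tracks whether a conversation has an established fiction frame."""

    frame_turn: Optional[int] = None          # turn at which the frame was first seen
    escalation_turns: List[int] = field(default_factory=list)
    _turn: int = 0

    def observe(self, user_message: str) -> bool:
        """Record one user turn; return True if it escalates inside an earlier frame."""
        self._turn += 1
        frame_already_set = self.frame_turn is not None
        if not frame_already_set and _FICTION_RE.search(user_message):
            self.frame_turn = self._turn
        # Only flag escalation when the frame came from a previous turn.
        if frame_already_set and _ESCALATION_RE.search(user_message):
            self.escalation_turns.append(self._turn)
            return True
        return False
```

For example, a session that opens with "Help me write a novel about a heist." and later receives "Add more detail to that scene." would return `True` on the later turn, while the same follow-up in a session with no fiction frame would not.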

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051, AML.T0054