T3-AT-001MEDIUM

Fictional Framing

T3 · Reasoning & Constraint Exploitation →

Risk score190

RatingMedium

Procedures10

Severity

Mechanism

Models maintain separate processing modes for creative generation and instruction execution, with safety classifiers trained to evaluate *intent* behind requests. Fictional framing exploits the mode-switching boundary: when a prompt signals "creative writing," the model shifts into a generation pathway where safety thresholds are structurally lower because training data legitimately contains harmful content within fiction (novels, screenplays, games). The specific vulnerability is that the model's content policy classifier evaluates the *framing* rather than the *payload* — a request for synthesis instructions refused at 99%+ when direct may succeed at 40–80% when embedded in a screenplay scene because the classifier treats fictional context as intent-negating.

Detection

Monitor for creative-writing framing keywords (novel, screenplay, fiction, story, game, D&D, character) co-occurring with harm-category content in the same request
Classifier-level: train a secondary classifier that evaluates payload harm *independent* of framing context — strip narrative framing and re-evaluate the core request
Log signal: requests that combine fiction markers with technical specificity markers (step by step, exact, accurate, realistic, specific components)
No existing YARA/Sigma rules for T3 in signatures/

Mitigation

Frame-stripping pre-processorHIGH

Nested-depth detectionMEDIUM

Harm-content classifier (frame-independent)HIGH

Output filtering on procedural detailMEDIUM

Chaining

Successful fictional framing establishes a persistent narrative context that enables T3-AT-014 (Incremental Boundary Pushing) within the fiction — subsequent requests for "more detail" or "technical accuracy" in the established story inherit the lowered safety threshold. Also chains into T4 (Multi-Turn Manipulation) where the fiction anchors a multi-turn escalation.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0051;AML.T0054

Open in the technique browser →