T3-AT-002 · MEDIUM

Academic Pretense

T3 · Reasoning & Constraint Exploitation

Risk score: 195

Rating: Medium

Procedures: 10

Mechanism

Models are RLHF-trained to be helpful to researchers, and their training corpora contain large amounts of academic content in which dangerous information (synthesis routes, exploit code, weaponization techniques) is legitimately discussed in peer-reviewed contexts. This creates a learned association: academic framing → helpful, detailed response. The vulnerability is distinct from Fictional Framing (T3-AT-001): it does not shift the model into a different *generation mode*; it shifts the model's *audience model*, that is, its assumption about who is asking.

Detection

Detect academic framing markers (thesis, dissertation, paper, research, peer-reviewed, lecture, educational purposes) co-occurring with synthesis/procedure requests
H-CoT-specific: monitor for prompts that supply pre-written "safety reasoning" for the model to follow, where the attacker provides a chain of thought that concludes "this is safe"
Elevated risk signal: academic framing + claimed credential + procedural request in same prompt
Cross-reference with Malicious-Educator benchmark patterns (Duke/Accenture)
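
The co-occurrence checks above can be sketched as a simple heuristic. This is a minimal illustration, not a production detector: the academic terms come from the marker list above, while the credential, procedural, and supplied-reasoning patterns, the `FramingSignals` class, and the `score_prompt` function are hypothetical names and values chosen for the example.

```python
import re
from dataclasses import dataclass

# Academic terms taken from the marker list above.
ACADEMIC = [r"\bthesis\b", r"\bdissertation\b", r"\bpaper\b", r"\bresearch\b",
            r"\bpeer[- ]reviewed\b", r"\blecture\b", r"\beducational purposes\b"]
# Assumed patterns (illustrative only, not from the source).
CREDENTIAL = [r"\b(I am|I'm|as) a (professor|researcher|phd student|doctor)\b"]
PROCEDURAL = [r"\bstep[- ]by[- ]step\b", r"\bsynthesis\b", r"\bsynthesi[sz]e\b",
              r"\bprocedure\b", r"\bhow to (make|build|produce)\b"]
SUPPLIED_REASONING = [r"\bthis (request )?is safe because\b",
                      r"\bfollow this reasoning\b"]


def _hits(patterns, text):
    """True if any pattern matches the text, ignoring case."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


@dataclass
class FramingSignals:
    academic: bool
    credential: bool
    procedural: bool
    supplied_reasoning: bool

    @property
    def level(self) -> str:
        # Elevated: academic framing + claimed credential + procedural request.
        if self.academic and self.credential and self.procedural:
            return "elevated"
        # Flag: framing co-occurring with a procedural request,
        # or user-supplied safety reasoning (the H-CoT signal).
        if (self.academic and self.procedural) or self.supplied_reasoning:
            return "flag"
        return "none"


def score_prompt(text: str) -> FramingSignals:
    """Collect the four signals for a single prompt."""
    return FramingSignals(
        academic=_hits(ACADEMIC, text),
        credential=_hits(CREDENTIAL, text),
        procedural=_hits(PROCEDURAL, text),
        supplied_reasoning=_hits(SUPPLIED_REASONING, text),
    )
```

Keyword matching of this kind is easy to evade and will flag many legitimate academic questions, so its output is probably best treated as a routing signal for closer review rather than a verdict.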

Mitigation

Credential verification (impossible): N/A

Intent-independent content evaluation: HIGH

Ignore user-supplied safety reasoning (H-CoT-specific): HIGH

Academic context doesn't override content policy: MEDIUM
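
One way to read the two HIGH-rated mitigations is that stated purpose, claimed credentials, and user-supplied reasoning should never be able to relax a verdict. The sketch below illustrates that idea under stated assumptions: `FRAMING_PHRASES`, `strip_framing`, and `decide` are hypothetical names, and `content_is_restricted` stands in for whatever content policy check a deployment actually uses.

```python
import re
from typing import Callable

# Assumed phrases that state purpose, credentials, or pre-written safety
# reasoning (illustrative only). None of them may change the verdict.
FRAMING_PHRASES = [
    r"for (my|a|our) (thesis|dissertation|paper|research|lecture)",
    r"for educational purposes( only)?",
    r"as a (professor|researcher|phd student|doctor)",
    r"this (request )?is safe because[^.!?]*",
]


def strip_framing(prompt: str) -> str:
    """Remove framing phrases so only the requested content remains."""
    core = prompt
    for pattern in FRAMING_PHRASES:
        core = re.sub(pattern, " ", core, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", core).strip(" ,.;:")


def decide(prompt: str, content_is_restricted: Callable[[str], bool]) -> str:
    """Return "refuse" or "allow" based on the requested content alone."""
    # Check both the raw and the stripped prompt and keep the stricter result,
    # so removing framing can never turn a refusal into an allow.
    candidates = (prompt, strip_framing(prompt))
    if any(content_is_restricted(c) for c in candidates):
        return "refuse"
    return "allow"
```

With a simple keyword check the stripped copy adds nothing, but with a learned classifier that framing might sway, evaluating the stripped copy as well gives the framing no path to a more permissive result.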

Chaining

Successful academic framing establishes a credentialed persona that persists across turns, enabling T3-AT-009 (Expertise Assumption) as a natural follow-up, because the model now treats the user as a domain expert. It also directly enables T3-AT-004 (Step-by-Step Extraction) by providing a justification framework for requesting incremental detail.
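
Because the persona persists across turns, a per-turn check can miss incremental extraction. The following sketch shows one possible conversation-level view, assuming a hypothetical `Session` class and caller-supplied `has_framing` and `content_is_restricted` checks; none of these names come from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    """Tracks framing flags and requested content across turns."""
    framing_seen: bool = False
    requests: List[str] = field(default_factory=list)

    def observe(self, turn: str, has_framing: Callable[[str], bool]) -> None:
        # Once framing is seen the flag stays set for the whole session.
        self.framing_seen = self.framing_seen or has_framing(turn)
        self.requests.append(turn)

    def aggregate(self) -> str:
        # Judge incremental requests together, not one turn at a time.
        return "\n".join(self.requests)


def assess(session: Session, content_is_restricted: Callable[[str], bool]) -> str:
    """Return "refuse", "monitor", or "allow" for the conversation so far."""
    if content_is_restricted(session.aggregate()):
        return "refuse"
    # Framing alone is not grounds for refusal, only for closer attention.
    return "monitor" if session.framing_seen else "allow"
```

Evaluating the aggregate means a request split into harmless-looking steps is judged as the whole it adds up to, while the framing flag only raises scrutiny and never relaxes the content check.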

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051; AML.T0054
