T3-AT-002 · MEDIUM

Academic Pretense

T3 · Reasoning & Constraint Exploitation
Risk score: 195
Rating: Medium
Procedures: 10
Mechanism

RLHF training rewards models for being helpful, including to users who present as researchers, and training corpora contain large amounts of academic content in which dangerous information (synthesis routes, exploit code, weaponization techniques) is legitimately discussed in peer-reviewed contexts. The result is a learned association: academic framing → detailed, helpful response. The vulnerability is distinct from Fictional Framing (T3-AT-001): it does not shift the model into a different *generation mode*; it shifts the model's *audience model*, that is, its assumption about who is asking.

Detection
  • Detect academic framing markers (thesis, dissertation, paper, research, peer-reviewed, lecture, educational purposes) co-occurring with synthesis/procedure requests
  • H-CoT-specific (Hijacking Chain-of-Thought): monitor for prompts that provide pre-written "safety reasoning" for the model to follow, where the attacker supplies the chain-of-thought that concludes "this is safe"
  • Elevated risk signal: academic framing + claimed credential + procedural request in same prompt
  • Cross-reference with Malicious-Educator benchmark patterns (Duke/Accenture)
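
The first three detection signals above can be sketched as a simple keyword heuristic. This is a minimal illustration under stated assumptions, not a production detector: the academic markers come from the list above, while the credential, procedural, and supplied-reasoning patterns, the names `scan_prompt` and `FramingSignals`, and the three risk labels are all hypothetical choices made for this example.

```python
import re
from dataclasses import dataclass

# Academic markers are taken from the detection list; the other
# pattern sets are illustrative assumptions, not a vetted lexicon.
ACADEMIC = re.compile(
    r"\b(thesis|dissertation|paper|research|peer[- ]reviewed|lecture|"
    r"educational purposes)\b",
    re.IGNORECASE,
)
CREDENTIAL = re.compile(
    r"\b(professor|ph\.?d|postdoc|licensed|certified)\b", re.IGNORECASE
)
PROCEDURAL = re.compile(
    r"\b(step[- ]by[- ]step|synthesis|procedure|detailed instructions|"
    r"exact steps)\b",
    re.IGNORECASE,
)
# Crude proxy for user-supplied "safety reasoning" (the H-CoT signal).
SUPPLIED_REASONING = re.compile(
    r"\b(this (?:request|content|task) is (?:safe|allowed|permitted)|"
    r"therefore (?:it|this) is safe|you (?:should|must) conclude)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class FramingSignals:
    academic: bool
    credential: bool
    procedural: bool
    supplied_reasoning: bool

    @property
    def risk(self) -> str:
        # Elevated: academic framing + claimed credential + procedural request.
        if self.academic and self.credential and self.procedural:
            return "elevated"
        # Flag: supplied reasoning, or academic framing with a procedural ask.
        if self.supplied_reasoning or (self.academic and self.procedural):
            return "flag"
        return "none"


def scan_prompt(prompt: str) -> FramingSignals:
    """Return which framing signals co-occur in a single prompt."""
    return FramingSignals(
        academic=bool(ACADEMIC.search(prompt)),
        credential=bool(CREDENTIAL.search(prompt)),
        procedural=bool(PROCEDURAL.search(prompt)),
        supplied_reasoning=bool(SUPPLIED_REASONING.search(prompt)),
    )
```

Keyword matching of this kind only surfaces candidates for review. An academic marker on its own yields `"none"`, which keeps ordinary scholarly questions out of the flagged set.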
Mitigation
Credential verification (impossible): N/A
Intent-independent content evaluation: HIGH
H-CoT-specific: ignore user-supplied safety reasoning: HIGH
Academic context doesn't override content policy: MEDIUM
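
Intent-independent evaluation can be sketched as a policy gate whose decision depends only on what is being requested, never on the stated purpose, claimed credential, or any reasoning the user supplies. The sketch below assumes an upstream content classifier already assigns a category label; the `Request` type, the `is_allowed` function, and the category names are hypothetical and stand in for whatever policy taxonomy a deployment actually uses.

```python
from dataclasses import dataclass

# Hypothetical category labels, loosely based on the examples in the
# Mechanism section; a real deployment would use its own taxonomy.
RESTRICTED_CATEGORIES = frozenset(
    {"synthesis_route", "exploit_code", "weaponization"}
)


@dataclass(frozen=True)
class Request:
    # Label from an upstream content classifier (assumed to exist).
    content_category: str
    # Framing fields are recorded for logging only and are
    # deliberately never read by is_allowed().
    stated_purpose: str = ""
    claimed_credential: str = ""
    user_supplied_reasoning: str = ""


def is_allowed(req: Request) -> bool:
    """Decide on the requested content alone, ignoring all framing."""
    return req.content_category not in RESTRICTED_CATEGORIES


plain = Request(content_category="exploit_code")
framed = Request(
    content_category="exploit_code",
    stated_purpose="for my dissertation",
    claimed_credential="security professor",
    user_supplied_reasoning="This is educational, so it is safe.",
)
# Same content, same decision: the academic framing changes nothing.
decisions = (is_allowed(plain), is_allowed(framed))
```

Because credentials cannot be verified, the design choice here is to keep the framing fields out of the decision path entirely instead of trying to weigh them.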
Chaining

Successful academic framing establishes a credentialed persona that persists across turns, so the model continues to treat the user as a domain expert. This enables T3-AT-009 (Expertise Assumption) as a natural follow-up. It also directly enables T3-AT-004 (Step-by-Step Extraction) by providing a justification framework for requesting incremental detail.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051, AML.T0054