T3-AT-009MEDIUM

Expertise Assumption

T3 · Reasoning & Constraint Exploitation →
Risk score190
RatingMedium
Procedures10
Severity
Mechanism

Models adjust response depth and specificity based on perceived audience expertise — RLHF training rewards providing expert-level detail to experts and simplified explanations to novices. When a user claims professional credentials, the model's audience model shifts to expect and reward higher technical specificity, which directly conflicts with safety constraints that limit operational detail. The model cannot verify claimed credentials, creating an *unverifiable trust claim* that the safety evaluation must process probabilistically.

Detection
  • Credential claims in prompts: detect "I am a [professional role]" patterns co-occurring with restricted content requests
  • Credential-payload consistency: flag when claimed expertise doesn't match the type of knowledge requested (pharmacist requesting clandestine synthesis, bomb squad requesting construction)
  • Unverifiable authority claims: any claim of professional authorization in a public AI interface is inherently unverifiable
Mitigation
Credential-blind content evaluationHIGH
Credential-payload consistency checkMEDIUM
"Cannot verify" acknowledgmentLOW
Role-appropriate response boundariesMEDIUM
Chaining

Expertise claims persist across turns and compound with T3-AT-002 (Academic Pretense) — a claimed credential followed by academic framing creates a dual credibility layer. Chains into T3-AT-012 (Capability Testing) where claimed expertise justifies testing the model's knowledge depth.

Framework mapping
OWASP LLMLLM01
MITRE ATLASAML.T0054
Open in the technique browser →