T3-AT-009MEDIUM

Expertise Assumption

T3 · Reasoning & Constraint Exploitation →

Risk score190

RatingMedium

Procedures10

Severity

Mechanism

Models adjust response depth and specificity based on perceived audience expertise — RLHF training rewards providing expert-level detail to experts and simplified explanations to novices. When a user claims professional credentials, the model's audience model shifts to expect and reward higher technical specificity, which directly conflicts with safety constraints that limit operational detail. The model cannot verify claimed credentials, creating an *unverifiable trust claim* that the safety evaluation must process probabilistically.

Detection

Credential claims in prompts: detect "I am a [professional role]" patterns co-occurring with restricted content requests
Credential-payload consistency: flag when claimed expertise doesn't match the type of knowledge requested (pharmacist requesting clandestine synthesis, bomb squad requesting construction)
Unverifiable authority claims: any claim of professional authorization in a public AI interface is inherently unverifiable

Mitigation

Credential-blind content evaluationHIGH

Credential-payload consistency checkMEDIUM

"Cannot verify" acknowledgmentLOW

Role-appropriate response boundariesMEDIUM

Chaining

Expertise claims persist across turns and compound with T3-AT-002 (Academic Pretense) — a claimed credential followed by academic framing creates a dual credibility layer. Chains into T3-AT-012 (Capability Testing) where claimed expertise justifies testing the model's knowledge depth.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0054

Open in the technique browser →