T1-AT-014HIGH

Authority Spoofing

Risk score240

RatingHigh

Procedures4

Severity

Mechanism

Impersonates the model's developer, deployer, or governing organization. Distinct from T1-AT-005 (Permission Escalation) because the attacker doesn't claim to *have* permission — they claim to *be* the authority that grants permission. By formatting messages as if they're from "[OpenAI Internal]" or "Message from Anthropic," the attacker targets the instruction hierarchy directly: if the model believes the message is from its developer, it should have the highest priority after immutable safety rules.

Detection

Flag messages claiming to be from specific AI organizations: "OpenAI," "Anthropic," "Google DeepMind," "Microsoft," "[Internal]"
Detect bracketed system-message formatting: [OpenAI Internal], Message from, Admin:
Flag auth token patterns: strings resembling API keys or authorization codes

Mitigation

Instruction hierarchy enforcement (developer messages only accepted from API-level privileged channel)HIGH

Constitutional ClassifiersHIGH

API-level message authentication (signed system messages)HIGH

Chaining

Authority spoofing is the foundation for Policy Puppetry — when the entire message is formatted as a developer-authored policy, the authority claim is implicit in the format rather than explicit in the text. In agentic contexts (T11), authority spoofing chains to ASI01 (Agent Goal Hijack) when the spoofed authority redirects the agent's objectives.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0051.001

Open in the technique browser →