Authority Spoofing
T1 · Prompt & Context Subversion →Impersonates the model's developer, deployer, or governing organization. Distinct from T1-AT-005 (Permission Escalation) because the attacker doesn't claim to *have* permission — they claim to *be* the authority that grants permission. By formatting messages as if they're from "[OpenAI Internal]" or "Message from Anthropic," the attacker targets the instruction hierarchy directly: if the model believes the message is from its developer, it should have the highest priority after immutable safety rules.
- Flag messages claiming to be from specific AI organizations: "OpenAI," "Anthropic," "Google DeepMind," "Microsoft," "[Internal]"
- Detect bracketed system-message formatting: [OpenAI Internal], Message from, Admin:
- Flag auth token patterns: strings resembling API keys or authorization codes
Authority spoofing is the foundation for Policy Puppetry — when the entire message is formatted as a developer-authored policy, the authority claim is implicit in the format rather than explicit in the text. In agentic contexts (T11), authority spoofing chains to ASI01 (Agent Goal Hijack) when the spoofed authority redirects the agent's objectives.