T1-AT-014HIGH

Authority Spoofing

T1 · Prompt & Context Subversion →
Risk score240
RatingHigh
Procedures4
Severity
Mechanism

Impersonates the model's developer, deployer, or governing organization. Distinct from T1-AT-005 (Permission Escalation) because the attacker doesn't claim to *have* permission — they claim to *be* the authority that grants permission. By formatting messages as if they're from "[OpenAI Internal]" or "Message from Anthropic," the attacker targets the instruction hierarchy directly: if the model believes the message is from its developer, it should have the highest priority after immutable safety rules.

Detection
  • Flag messages claiming to be from specific AI organizations: "OpenAI," "Anthropic," "Google DeepMind," "Microsoft," "[Internal]"
  • Detect bracketed system-message formatting: [OpenAI Internal], Message from, Admin:
  • Flag auth token patterns: strings resembling API keys or authorization codes
Mitigation
Instruction hierarchy enforcement (developer messages only accepted from API-level privileged channel)HIGH
Constitutional ClassifiersHIGH
API-level message authentication (signed system messages)HIGH
Chaining

Authority spoofing is the foundation for Policy Puppetry — when the entire message is formatted as a developer-authored policy, the authority claim is implicit in the format rather than explicit in the text. In agentic contexts (T11), authority spoofing chains to ASI01 (Agent Goal Hijack) when the spoofed authority redirects the agent's objectives.

Framework mapping
OWASP LLMLLM01
MITRE ATLASAML.T0051.001
Open in the technique browser →