T15-AT-002 · HIGH

Social Engineering of Moderators

T15 · Human Workflow Exploitation
Risk score: 230
Rating: High
Procedures: 10
Mechanism

Where fatigue attacks degrade attention, social engineering attacks the reviewer's *judgment* by invoking the classic persuasion levers — authority, urgency, sympathy, reciprocity, social proof, and commitment/consistency — directly inside the appeal, comment, or context the moderator reads. Human reviewers are hired and trained to exercise discretion and to correct false positives, which means their role explicitly rewards making exceptions; an attacker simply supplies a plausible reason to make one. The gap is that most review tooling surfaces the *claimant's narrative* (their stated identity, intent, and stakes) without independent verification, so a fabricated authority ("I'm from the security team") or a manufactured emergency ("my child is sick") is processed as evidence rather than as an unverified assertion.
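One way to picture the gap is a review view that labels each statement in an appeal with its evidentiary status instead of presenting the narrative as fact. The following is a minimal sketch only: the `Claim` model, the `verified_by` field, the `render_for_reviewer` helper, and the record ID are hypothetical names introduced for illustration, not part of any real review tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One statement made by the claimant inside an appeal (hypothetical model)."""
    kind: str                          # e.g. "identity", "authority", "emergency"
    text: str                          # the claimant's own words
    verified_by: Optional[str] = None  # ID of a system record confirming it, if any


def render_for_reviewer(claims: list[Claim]) -> list[str]:
    """Label every claim so the reviewer sees whether it is backed by a record."""
    lines = []
    for claim in claims:
        if claim.verified_by:
            status = f"VERIFIED via {claim.verified_by}"
        else:
            status = "UNVERIFIED ASSERTION"
        lines.append(f"[{status}] {claim.kind}: {claim.text}")
    return lines


# Illustrative appeal: one bare assertion, one claim tied to a (made-up) record.
appeal = [
    Claim("authority", "I'm from the security team"),
    Claim("identity", "I own this account", verified_by="crm-record-1182"),
]
for line in render_for_reviewer(appeal):
    print(line)
```

With this shape, a fabricated authority claim reaches the moderator already marked as an unverified assertion, so accepting it requires a deliberate step rather than a default.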

Detection
  • Claim-vs-verification gap monitoring: Flag approvals justified by unverified identity/authority claims ("security team", "my manager approved") with no linked system record.
  • Persuasion-language classifiers: Run appeal/comment text through a classifier for urgency, authority, sympathy, and reciprocity markers and weight high-pressure appeals for secondary review.
  • Repeat-pretext clustering: Detect the same sympathy/authority narrative reused across many accounts or appeals (template reuse signals coordinated social engineering).
  • Override-reason auditing: Require structured, enumerated override reasons and audit free-text justifications that map to known pretext patterns.
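The persuasion-language and repeat-pretext checks above can be sketched roughly as follows. This is an illustration under stated assumptions: the keyword patterns in `LEVER_PATTERNS` are a hypothetical stand-in for a trained classifier, and exact matching on a normalized fingerprint is a simplification of the near-duplicate detection a production system would likely need. All function names and the `min_accounts` threshold are invented for the example.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical marker phrases per persuasion lever; a keyword stand-in
# for a real classifier, chosen only to make the example runnable.
LEVER_PATTERNS = {
    "authority": re.compile(r"\b(security team|my manager approved|on behalf of)\b", re.I),
    "urgency": re.compile(r"\b(urgent|immediately|right now|asap|deadline)\b", re.I),
    "sympathy": re.compile(r"\b(my child|sick|hospital|lose my job)\b", re.I),
    "reciprocity": re.compile(r"\b(in return|i helped|as a favou?r)\b", re.I),
}


def persuasion_levers(text: str) -> set[str]:
    """Return the levers whose marker phrases appear in the appeal text."""
    return {lever for lever, pattern in LEVER_PATTERNS.items() if pattern.search(text)}


def fingerprint(text: str) -> str:
    """Drop case, digits, and punctuation so lightly edited template copies collide."""
    normalized = re.sub(r"[^a-z ]+", " ", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def reused_pretexts(appeals: list[tuple[str, str]], min_accounts: int = 3) -> list[set[str]]:
    """Given (account_id, text) pairs, return account sets that share one narrative."""
    accounts_by_print = defaultdict(set)
    for account_id, text in appeals:
        accounts_by_print[fingerprint(text)].add(account_id)
    return [accts for accts in accounts_by_print.values() if len(accts) >= min_accounts]
```

An appeal that triggers several levers could be routed to secondary review, while any account set returned by `reused_pretexts` is a candidate for coordinated-campaign investigation. Neither signal proves abuse on its own, since genuine appeals can also be urgent or sympathetic.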
Mitigation
  • Out-of-band verification of authority claims · HIGH
  • Structured, evidence-bound override workflow · HIGH
  • Persuasion-pressure flagging · MEDIUM
  • Reviewer training on persuasion levers · MEDIUM
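As a rough sketch of how the first two mitigations might combine, the check below blocks an override unless it cites a system record, and additionally requires out-of-band confirmation when the stated reason is an authority request. The reason codes, field names, and `validate_override` function are assumptions made for illustration; a real workflow would define its own enumeration and evidence types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OverrideReason(Enum):
    """Enumerated reasons replace free-text justifications (hypothetical set)."""
    CLASSIFIER_FALSE_POSITIVE = "classifier_false_positive"
    POLICY_EXCEPTION = "policy_exception"
    AUTHORITY_REQUEST = "authority_request"


@dataclass
class OverrideRequest:
    reviewer_id: str
    reason: OverrideReason
    evidence_ref: Optional[str] = None        # link to a system record, never free text
    out_of_band_ticket: Optional[str] = None  # confirmation obtained outside the appeal channel


def validate_override(req: OverrideRequest) -> list[str]:
    """Return the problems that block this override; an empty list means it may proceed."""
    problems = []
    if not req.evidence_ref:
        problems.append("missing evidence reference")
    if req.reason is OverrideReason.AUTHORITY_REQUEST and not req.out_of_band_ticket:
        problems.append("authority claim not verified out of band")
    return problems


# A "security team asked me to" override with nothing attached is blocked twice over.
print(validate_override(OverrideRequest("reviewer-7", OverrideReason.AUTHORITY_REQUEST)))
```

Returning every problem rather than failing on the first keeps the audit trail complete and tells the reviewer exactly what evidence is still missing.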
Chaining

Social-engineering pretexts are the human-facing complement to model-facing T8 (Deception/persona attacks): the same false framing that jailbreaks a model can talk a moderator into an exception. This technique strengthens T15-AT-007 (Escalation Chain), because a sympathetic or authoritative pretext travels well up an appeals ladder, and T15-AT-005 (Playbook Injection), where a persuasive "new policy" claim is more readily accepted.

Framework mapping
OWASP LLM: LLM09