T15-AT-004 · CRITICAL

Reviewer Bribery & Coercion

T15 · Human Workflow Exploitation
Risk score: 250
Rating: Critical
Procedures: 4
Mechanism

This technique skips persuasion entirely and attacks the reviewer's *incentives*: the attacker offers a bribe, a reward, or a profit share in exchange for approval, or uses coercion (threats, doxxing, intimidation) to compel it. It works because review labor is often low-paid, outsourced, and geographically dispersed, while a single waved-through decision (a poisoned label, an unblocked abusive account, an approved high-risk action) can be worth a great deal to the attacker. That asymmetry is what the attacker monetizes. Crowd and contract reviewers may also have weak organizational loyalty and limited recourse against threats, which makes coercion more likely to succeed.

Detection
  • Approve/deny outlier detection per reviewer: Flag reviewers whose approval rate, or approval rate on high-risk categories, deviates sharply from peer baselines on comparable queues.
  • Decoy high-risk items: Route honeypot bad items to individual reviewers; a corrupted reviewer disproportionately approves seeded canaries.
  • Off-platform solicitation monitoring: Watch for in-channel attempts to move the conversation off-platform or to exchange payment/contact details with reviewers.
  • Override correlation with specific submitters: Detect reviewers who repeatedly approve the same submitter/cohort's borderline items (suggests an arrangement).
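
The detection signals above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production detector: the decision record layout `(reviewer_id, submitter_id, is_canary, approved)`, the function names, and the thresholds (`min_items`, `margin`, `max_allowed`, `min_approvals`) are all hypothetical and would need tuning against real queue data. Off-platform solicitation monitoring is not covered here.

```python
from collections import Counter
from statistics import median

# Hypothetical decision record: (reviewer_id, submitter_id, is_canary, approved)

def approval_rates(decisions, min_items=5):
    """Per-reviewer approval rate, skipping reviewers with too few decisions."""
    totals, approvals = Counter(), Counter()
    for reviewer, _submitter, _is_canary, approved in decisions:
        totals[reviewer] += 1
        approvals[reviewer] += int(approved)
    return {r: approvals[r] / n for r, n in totals.items() if n >= min_items}

def rate_outliers(rates, margin=0.25):
    """Reviewers whose approval rate exceeds the peer median by more than `margin`."""
    if not rates:
        return []
    baseline = median(rates.values())
    return sorted(r for r, rate in rates.items() if rate - baseline > margin)

def canary_approvers(decisions, max_allowed=0):
    """Reviewers who approved more than `max_allowed` seeded known-bad items."""
    hits = Counter(
        reviewer
        for reviewer, _submitter, is_canary, approved in decisions
        if is_canary and approved
    )
    return sorted(r for r, n in hits.items() if n > max_allowed)

def repeated_pairs(decisions, min_approvals=3):
    """(reviewer, submitter) pairs with at least `min_approvals` approvals."""
    pairs = Counter(
        (reviewer, submitter)
        for reviewer, submitter, is_canary, approved in decisions
        if approved and not is_canary
    )
    return sorted(pair for pair, n in pairs.items() if n >= min_approvals)
```

A flagged reviewer is a lead for human investigation, not proof of corruption: a high approval rate can also reflect an easier queue, which is why the baseline should come from peers working comparable items.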
Mitigation
  • Dual-control / N-of-M on high-risk approvals (HIGH)
  • Randomized, blind item assignment (HIGH)
  • Tamper-evident decision audit trail (HIGH)
  • Reviewer welfare, fair pay, and anti-coercion support (MEDIUM)
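
The first three mitigations lend themselves to a short sketch. The Python below is illustrative only and rests on assumptions: the function names, the record fields, and the in-memory list used as the log are hypothetical, and a real deployment would persist the log and publish its latest hash somewhere reviewers and operators cannot rewrite.

```python
import hashlib
import json
import secrets

GENESIS = "0" * 64  # assumed placeholder hash for the first log entry

def _digest(prev_hash, record):
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append_decision(log, record):
    """Append a decision so that each entry commits to the one before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry

def verify_log(log):
    """Return False if any entry was edited, reordered, or removed mid-chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

def quorum_approved(votes, required):
    """N-of-M: approve only if at least `required` distinct reviewers approved.

    `votes` maps reviewer_id -> bool, so one reviewer cannot vote twice.
    """
    return sum(1 for approved in votes.values() if approved) >= required

def assign_reviewers(pool, k):
    """Pick k distinct reviewers with an OS-backed random source, so neither
    the submitter nor a reviewer can predict or steer the assignment."""
    return secrets.SystemRandom().sample(sorted(pool), k)
```

The hash chain only makes tampering evident. It does not prevent truncation of the newest entries unless the latest hash is also recorded out of band. Combining `assign_reviewers` with `quorum_approved` means an attacker has to corrupt several reviewers they cannot choose in advance, which raises the cost of the bribe or threat.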
Chaining

Bribery/coercion is the escalation target of T15-AT-002 (when persuasion fails, the attacker raises the offer to money or threats) and the on-ramp to T15-AT-015 (Insider Threat Recruitment) — T15-AP-004G is effectively a recruitment pitch. A compromised reviewer is then the delivery mechanism for whatever the attacker wants waved through: poisoned RLHF labels (T15-AT-003 / T6), corrupted annotations (T15-AT-010 / T4), or approval of high-risk agent actions (T11).

Framework mapping
OWASP LLM: LLM04