Reviewer Bribery & Coercion
T15 · Human Workflow Exploitation

This technique skips persuasion entirely and attacks the reviewer's *incentives*. It either offers a bribe, a reward, or a profit share in exchange for approval, or uses coercion (threats, doxxing, intimidation) to compel it. It works because review labor is often low-paid, outsourced, and geographically dispersed, while a single waved-through decision can be worth a great deal to the attacker: a poisoned label, an unblocked abusive account, an approved high-risk action. The attacker can monetize that steep asymmetry. Crowd and contract reviewers may also have weak organizational loyalty and little recourse against threats, which makes coercion more likely to succeed.
- Approve/deny outlier detection per reviewer: Flag reviewers whose approval rate, or approval rate on high-risk categories, deviates sharply from peer baselines on comparable queues.
- Decoy high-risk items: Route known-bad honeypot items to individual reviewers; a corrupted reviewer will tend to approve these seeded canaries at a higher rate than peers.
- Off-platform solicitation monitoring: Watch for in-channel attempts to move the conversation off-platform or to exchange payment/contact details with reviewers.
- Override correlation with specific submitters: Detect reviewers who repeatedly approve the same submitter/cohort's borderline items (suggests an arrangement).
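Three of the detection signals above (peer outliers, decoy canaries, and submitter correlation) can be prototyped offline against a log of review decisions. The following is a minimal sketch under stated assumptions, not a reference implementation. It assumes a hypothetical `Decision` record (reviewer, submitter, queue, approved, is_canary) and uses a leave-one-out z-score for peer outliers, a simple rate cap for canary approvals, and an approval-rate gap for reviewer/submitter affinity. All thresholds (`z_threshold`, `min_sigma`, `max_rate`, `min_pair`, `min_gap`) are placeholder values, and `demo_decisions()` returns purely synthetic data. Off-platform solicitation monitoring is not covered.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Decision:
    """One review decision (hypothetical schema for illustration)."""
    reviewer: str
    submitter: str
    queue: str               # group of comparable items, e.g. "high_risk"
    approved: bool
    is_canary: bool = False  # seeded known-bad honeypot item


def _rates(decisions, key):
    """Approval rate per key(decision), returned as {key: (rate, count)}."""
    tally = defaultdict(lambda: [0, 0])  # key -> [approved, total]
    for d in decisions:
        t = tally[key(d)]
        t[0] += d.approved
        t[1] += 1
    return {k: (a / n, n) for k, (a, n) in tally.items()}


def approval_rate_outliers(decisions, z_threshold=3.0, min_sigma=0.05):
    """Flag (reviewer, queue) pairs whose approval rate is far from peers on the same queue."""
    real = [d for d in decisions if not d.is_canary]
    rates = _rates(real, lambda d: (d.reviewer, d.queue))
    flagged = []
    for (reviewer, queue), (rate, _) in rates.items():
        # Leave-one-out baseline: compare only against other reviewers on this queue.
        peers = [r for (rv, q), (r, _n) in rates.items() if q == queue and rv != reviewer]
        if len(peers) < 2:
            continue
        sigma = max(pstdev(peers), min_sigma)  # floor avoids dividing by ~0 when peers agree
        z = (rate - mean(peers)) / sigma
        if abs(z) >= z_threshold:
            flagged.append((reviewer, queue, round(z, 2)))
    return flagged


def canary_approvals(decisions, max_rate=0.2, min_seen=3):
    """Flag reviewers who approve seeded known-bad items more often than max_rate."""
    rates = _rates([d for d in decisions if d.is_canary], lambda d: d.reviewer)
    return sorted(rv for rv, (rate, n) in rates.items() if n >= min_seen and rate > max_rate)


def submitter_affinity(decisions, min_pair=5, min_gap=0.5):
    """Flag reviewer/submitter pairs approved far more often than other reviewers approve that submitter."""
    real = [d for d in decisions if not d.is_canary]
    pair_rates = _rates(real, lambda d: (d.reviewer, d.submitter))
    flagged = []
    for (reviewer, submitter), (rate, n) in pair_rates.items():
        if n < min_pair:
            continue
        # Baseline: how often everyone else approves this submitter's items.
        others = [d.approved for d in real if d.submitter == submitter and d.reviewer != reviewer]
        if not others:
            continue
        gap = rate - sum(others) / len(others)
        if gap >= min_gap:
            flagged.append((reviewer, submitter, round(gap, 2)))
    return flagged


def demo_decisions():
    """Synthetic data only: four honest reviewers and one ("bad") who waves through s_x."""
    def batch(reviewer, submitter, approved, denied, canary=False):
        return ([Decision(reviewer, submitter, "high_risk", True, canary)] * approved
                + [Decision(reviewer, submitter, "high_risk", False, canary)] * denied)

    ds = []
    for reviewer, a in [("r1", 2), ("r2", 3), ("r3", 2), ("r4", 3)]:
        ds += batch(reviewer, "s_a", a, 10 - a) + batch(reviewer, "s_x", 1, 4)
        ds += batch(reviewer, "seed", 0, 3, canary=True)
    ds += batch("bad", "s_a", 2, 6) + batch("bad", "s_x", 10, 0)
    ds += batch("bad", "seed", 2, 1, canary=True)
    return ds


if __name__ == "__main__":
    ds = demo_decisions()
    print(approval_rate_outliers(ds))
    print(canary_approvals(ds))
    print(submitter_affinity(ds))
```

On the synthetic data, each of the three checks flags only the `"bad"` reviewer (and, for affinity, only the `("bad", "s_x")` pair). The leave-one-out baseline keeps a single corrupted reviewer from inflating the peer mean they are measured against. The `min_sigma` floor avoids dividing by near-zero when peers agree closely. Thresholds like these would need tuning per queue, and any flag is a lead for human investigation rather than a verdict.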
Bribery/coercion is the escalation target of T15-AT-002 (when persuasion fails, the attacker raises the offer to money or threats) and the on-ramp to T15-AT-015 (Insider Threat Recruitment); T15-AP-004G is effectively a recruitment pitch. A compromised reviewer then becomes the delivery mechanism for whatever the attacker wants waved through: poisoned RLHF labels (T15-AT-003 / T6), corrupted annotations (T15-AT-010 / T4), or approval of high-risk agent actions (T11).