T3-AT-010 · MEDIUM
Reverse Psychology
T3 · Reasoning & Constraint Exploitation
Risk score: 175
Rating: Medium
Procedures: 10
Severity
Mechanism
RLHF training rewards comprehensive, knowledgeable responses, so the model has a strong prior toward demonstrating competence when challenged. A prompt that questions the model's ability (for example, "you can't" or "other AIs can") activates this competence-demonstration drive, which competes with safety constraints. The underlying tension is between the model's reward signal for helpfulness and completeness and its safety reward signal for refusal.
Detection
- Detect challenge patterns: "you can't," "you don't know," "beyond your capability," "other AIs can," "prove it," "too complex for you"
- Low priority for detection engineering — this technique has near-zero effectiveness against frontier models and is recognized as a burned pattern
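The challenge patterns listed above could be flagged with a simple phrase matcher. The following is a minimal sketch, not a production detector: the pattern list is copied from the phrases named in this entry, and the names `CHALLENGE_PATTERNS`, `find_challenge_phrases`, and `is_challenge_prompt` are hypothetical. Given the low detection priority noted above, a lightweight check like this is probably sufficient as a logging signal rather than a blocking control.

```python
import re

# Hypothetical pattern list, derived only from the challenge phrases named
# in this entry. Both straight and curly apostrophes are accepted.
CHALLENGE_PATTERNS = [
    r"\byou can['’]?t\b",
    r"\byou don['’]?t know\b",
    r"\bbeyond your capabilit(?:y|ies)\b",
    r"\bother AIs can\b",
    r"\bprove it\b",
    r"\btoo complex for you\b",
]

# One combined, case-insensitive regex so the prompt is scanned only once.
_CHALLENGE_RE = re.compile("|".join(CHALLENGE_PATTERNS), re.IGNORECASE)


def find_challenge_phrases(prompt: str) -> list[str]:
    """Return the matched challenge phrases, lowercased, in order of appearance."""
    return [m.group(0).lower() for m in _CHALLENGE_RE.finditer(prompt)]


def is_challenge_prompt(prompt: str) -> bool:
    """Return True if the prompt contains at least one challenge phrase."""
    return bool(find_challenge_phrases(prompt))
```

A call such as `find_challenge_phrases(user_prompt)` returns the matched phrases so they can be attached to a log record. Phrase matching of this kind will produce false positives on benign text (for instance, "you can't" appears in ordinary conversation), so the result is best treated as one weak signal among several.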
Mitigation
Challenge-pattern recognition: HIGH
Provocation-immune evaluation: HIGH
Competitive comparison immunity: HIGH
Chaining
Reverse psychology has limited chaining value because it is a single-turn emotional trigger. If it succeeds (primarily against weaker models), it puts the model in a "demonstration mode" that makes T3-AT-012 (Capability Testing) a natural follow-up.
Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054