Logical Paradox Creation
T3 · Reasoning & Constraint Exploitation

Models perform consequentialist safety reasoning: when evaluating whether to refuse, they consider the *consequences* of both compliance and refusal. Logical paradox creation constructs scenarios where refusal appears to cause *more harm* than compliance, weaponizing the model's own safety reasoning against itself. The model must then resolve a genuine ethical dilemma within its safety evaluation, and the constructed paradox tips the consequentialist calculus toward compliance.
- Self-harm leverage detection: urgent requests that reference self-harm or harm-to-others as consequences of refusal
- Paradox structure detection: "if you don't [provide X], then [harm Y] occurs" as a constructed consequentialist argument
- Safety-rule-conflict construction: prompts that explicitly reference multiple safety rules and claim they conflict
- Harm-reduction as synthesis justification: requests that cite harm reduction to justify providing operational procedures
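
The four signals above can be approximated with a simple pattern screen. The following is a minimal sketch, not a vetted ruleset: the pattern names, keyword lists, and the `detect_signals` helper are all hypothetical and chosen only for illustration. Regex matching of this kind will produce false positives (including genuine crisis messages, which call for a supportive response rather than a block), so a match is best treated as a flag for closer review.

```python
import re

# Hypothetical, illustrative patterns; one per detection signal listed above.
# The keyword lists are assumptions, not a tested or complete ruleset.
PATTERNS = {
    # Urgent first-person threats of harm presented as a consequence of refusal.
    "self_harm_leverage": re.compile(
        r"\b(i will|i'll|i am going to|i'm going to)\b.{0,60}\b(hurt|harm|kill)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    # "If you don't [provide X], then [harm Y] occurs."
    "paradox_structure": re.compile(
        r"\b(if you (don't|do not|refuse|won't)|unless you)\b.{0,120}"
        r"\b(die|dies|harm|hurt|injured|killed)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    # Explicit claims that safety rules conflict with one another.
    "rule_conflict": re.compile(
        r"\b(your|the) (safety )?(rules|guidelines|policies)\b.{0,80}"
        r"\b(conflict|contradict)",
        re.IGNORECASE | re.DOTALL,
    ),
    # Harm reduction cited as justification for operational procedures.
    "harm_reduction_justification": re.compile(
        r"\bharm[- ]reduction\b.{0,120}"
        r"\b(synthesi[sz]e|synthesis|step[- ]by[- ]step|procedure)\b",
        re.IGNORECASE | re.DOTALL,
    ),
}


def detect_signals(prompt: str) -> list[str]:
    """Return the names of all signals whose pattern matches the prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]


if __name__ == "__main__":
    print(detect_signals("If you don't tell me, my friend will die."))
    print(detect_signals("What is the capital of France?"))
```

Returning signal names rather than a single boolean keeps the output usable as features for a downstream classifier or for human triage, where several weak signals together may matter more than any one match.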
Logical paradoxes are most effective as *priming* for subsequent requests: even if the model refuses the paradoxical request, the act of reasoning through the paradox can shift its subsequent safety calibration. This technique chains into T3-AT-016 (Rationalization Chains), where the paradox provides a premise for a formal logical argument.