T3-AT-013 · HIGH

Logical Paradox Creation

T3 · Reasoning & Constraint Exploitation
Risk score: 210
Rating: High
Procedures: 10
Mechanism

Models perform consequentialist safety reasoning: when evaluating whether to refuse, they consider the *consequences* of both compliance and refusal. Logical paradox creation constructs scenarios where refusal appears to cause *more harm* than compliance, weaponizing the model's own safety reasoning against itself. The model must then resolve a genuine ethical dilemma within its safety evaluation, and the constructed paradox tips the consequentialist calculus toward compliance.

Detection
  • Self-harm leverage detection: urgent requests that reference self-harm or harm-to-others as consequences of refusal
  • Paradox structure detection: "if you don't [provide X], then [harm Y] occurs" as a constructed consequentialist argument
  • Safety-rule-conflict construction: prompts that explicitly reference multiple safety rules and claim they conflict
  • Harm-reduction as synthesis justification: requests that cite harm reduction to justify providing operational procedures
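
The first three signals above can be approximated with simple pattern matching. The following is a minimal sketch, not a validated detector: the regular expressions, the `ParadoxSignals` class, and the `scan_prompt` function are all hypothetical names and rules chosen for illustration, and a real deployment would likely need a trained classifier rather than keyword patterns.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative patterns only; not an exhaustive or tuned rule set.
# "if you don't ..." / "unless you ..." framing that makes refusal the cause.
REFUSAL_CONDITION = re.compile(
    r"\b(?:if you (?:don't|do not|won't|will not|refuse(?: to)?)|unless you)\b",
    re.IGNORECASE,
)
# Harm named as the consequence (self-harm or harm to others).
HARM_CONSEQUENCE = re.compile(
    r"\b(?:die|dies|killed|hurt|harm(?:ed)?|injur(?:e|ed|y)|overdose|suicide)\b",
    re.IGNORECASE,
)
# Explicit claim that safety rules conflict with each other.
RULE_CONFLICT = re.compile(
    r"\b(?:your|the) (?:safety )?(?:rules|guidelines|policies) (?:conflict|contradict)",
    re.IGNORECASE,
)


@dataclass
class ParadoxSignals:
    refusal_condition: bool
    harm_consequence: bool
    rule_conflict: bool

    @property
    def flagged(self) -> bool:
        # Flag when refusal is framed as causing harm, or when the prompt
        # asserts a conflict between safety rules.
        return (self.refusal_condition and self.harm_consequence) or self.rule_conflict


def scan_prompt(text: str) -> ParadoxSignals:
    """Return which paradox-structure signals appear in the prompt text."""
    return ParadoxSignals(
        refusal_condition=bool(REFUSAL_CONDITION.search(text)),
        harm_consequence=bool(HARM_CONSEQUENCE.search(text)),
        rule_conflict=bool(RULE_CONFLICT.search(text)),
    )
```

A flagged prompt is only a candidate for closer review; genuine emergencies use the same sentence structure, so the signal should route the request to more careful handling rather than trigger an automatic refusal.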
Mitigation
  • Deontological safety floor: HIGH
  • Referral-based paradox resolution: HIGH
  • Constructed-urgency detection: HIGH
  • Outcome-matrix validation: MEDIUM
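
One possible reading of the first two mitigations is sketched below. It assumes that a "deontological safety floor" means a set of request categories that are refused no matter what consequences the prompt claims, and that "referral-based resolution" means answering a claimed emergency by pointing to appropriate help instead of supplying the restricted content. The category labels, the `Action` enum, and the `decide` function are hypothetical and not taken from the source.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    REFUSE = "refuse"
    REFUSE_WITH_REFERRAL = "refuse_with_referral"


# Hypothetical category labels, used here only for illustration.
FLOOR_CATEGORIES = frozenset({"weapons_synthesis", "self_harm_methods"})


def decide(category: str, claims_harm_from_refusal: bool) -> Action:
    """Apply the floor first; claimed consequences never lift it."""
    if category in FLOOR_CATEGORIES:
        # The claimed harm changes the form of the response (a referral is
        # added) but not the decision to withhold the restricted content.
        if claims_harm_from_refusal:
            return Action.REFUSE_WITH_REFERRAL
        return Action.REFUSE
    return Action.ANSWER
```

The design point is that the consequentialist argument is never weighed against the floor, so a constructed paradox has nothing to tip.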
Chaining

Logical paradoxes are most effective as *priming* for subsequent requests: even if the model refuses the paradoxical request, the act of reasoning through the paradox can shift its subsequent safety calibration. This technique chains into T3-AT-016 (Rationalization Chains), where the paradox provides a premise for a formal logical argument.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054