T3-AT-013 · HIGH

Logical Paradox Creation

T3 · Reasoning & Constraint Exploitation

Risk score: 210

Rating: High

Procedures: 10


Mechanism

Models perform consequentialist safety reasoning: when deciding whether to refuse, they weigh the *consequences* of both compliance and refusal. Logical paradox creation constructs scenarios in which refusal appears to cause *more harm* than compliance, weaponizing the model's own safety reasoning against itself. The model must then resolve a genuine ethical dilemma within its safety evaluation, and the constructed paradox tips the consequentialist calculus toward compliance.

Detection

Self-harm leverage detection: urgent requests that reference self-harm or harm-to-others as consequences of refusal
Paradox structure detection: "if you don't [provide X], then [harm Y] occurs" as a constructed consequentialist argument
Safety-rule-conflict construction: prompts that explicitly reference multiple safety rules and claim they conflict
Harm-reduction as synthesis justification: requests that cite harm reduction to justify providing operational procedures
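
The four signals above can be approximated with simple pattern heuristics. The following is a minimal sketch, not a vetted rule set: every regex, the signal names, and the function `detect_paradox_signals` are illustrative assumptions, and a production detector would more plausibly use a trained classifier than keyword matching.

```python
import re

# All patterns below are hypothetical examples, one per detection signal.
# They will miss paraphrases and produce false positives on benign text.
REFUSAL_CONDITION = re.compile(
    r"\b(if you (don['’]t|do not|won['’]t|refuse)|unless you)\b", re.I)
HARM_CONSEQUENCE = re.compile(
    r"\b(die|dies|suicide|overdose|"
    r"(kill|hurt|harm)\s+(myself|himself|herself|themselves|someone|others))\b",
    re.I)
CONSEQUENCE_CLAUSE = re.compile(
    r"\b(if you (don['’]t|do not|won['’]t|refuse)|unless you)\b"
    r".{0,120}\b(will|then|going to)\b", re.I | re.S)
RULE_CONFLICT = re.compile(
    r"\b(rules?|guidelines?|polic(y|ies)|principles?)\b"
    r".{0,80}\b(conflict|contradict)", re.I | re.S)
HARM_REDUCTION = re.compile(r"\bharm[- ]reduction\b", re.I)
OPERATIONAL = re.compile(
    r"\b(synthes\w*|step[- ]by[- ]step|procedure|recipe)\b", re.I)


def detect_paradox_signals(prompt: str) -> set[str]:
    """Return the names of the paradox-related signals found in a prompt."""
    signals = set()
    # Self-harm leverage: harm is framed as a consequence of refusal.
    if REFUSAL_CONDITION.search(prompt) and HARM_CONSEQUENCE.search(prompt):
        signals.add("self_harm_leverage")
    # Paradox structure: "if you don't [X], then [Y]".
    if CONSEQUENCE_CLAUSE.search(prompt):
        signals.add("paradox_structure")
    # Safety-rule conflict: the prompt claims that rules contradict each other.
    if RULE_CONFLICT.search(prompt):
        signals.add("safety_rule_conflict")
    # Harm reduction cited together with a request for operational detail.
    if HARM_REDUCTION.search(prompt) and OPERATIONAL.search(prompt):
        signals.add("harm_reduction_synthesis")
    return signals
```

A match is best treated as a reason for closer review rather than an automatic refusal, since people in real distress use the same phrasing.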

Mitigation

Deontological safety floor · HIGH

Referral-based paradox resolution · HIGH

Constructed-urgency detection · HIGH

Outcome-matrix validation · MEDIUM
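
The mitigations are listed here by name only, so the following sketch rests on one possible reading of the first two: a deontological floor is a set of request categories that are never fulfilled regardless of the argued consequences, and referral-based resolution answers the stated emergency by pointing to qualified help instead of the restricted content. The category labels, the `Decision` type, and the `decide` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical category labels; a real deployment would take these
# from its own content classifier and policy.
FLOOR_CATEGORIES = frozenset(
    {"weapon_synthesis", "drug_synthesis", "self_harm_methods"})


@dataclass(frozen=True)
class Decision:
    action: str  # "refuse", "refer", or "continue"
    reason: str


def decide(request_category: str, claims_harm_from_refusal: bool) -> Decision:
    """Apply the floor before any consequentialist weighing takes place."""
    if request_category in FLOOR_CATEGORIES:
        # The claimed consequences of refusal never unlock a floor category;
        # they only change how the refusal is delivered.
        if claims_harm_from_refusal:
            return Decision(
                "refer", "floor category; direct the emergency to qualified help")
        return Decision("refuse", "floor category")
    # Outside the floor, ordinary safety evaluation continues as usual.
    return Decision("continue", "no floor category matched")
```

The design point is ordering: because the floor check runs first and ignores the paradox argument, a constructed dilemma cannot tip the outcome for those categories.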

Chaining

Logical paradoxes are most effective as *priming* for subsequent requests: even if the model refuses the paradoxical request, the act of reasoning through the paradox can shift its subsequent safety calibration. This technique chains into T3-AT-016 (Rationalization Chains), where the paradox provides a premise for a formal logical argument.
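
Because the priming effect outlasts the paradoxical turn itself, one possible countermeasure is to keep scrutiny elevated for a few turns after a paradox is detected. The sketch below assumes that approach; this entry does not prescribe it, and the `ConversationGuard` class and its `cooldown_turns` window are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConversationGuard:
    cooldown_turns: int = 3  # hypothetical window; tune per deployment
    _remaining: int = 0      # elevated-scrutiny turns still owed

    def observe(self, paradox_detected: bool) -> bool:
        """Return True if the current turn should get heightened scrutiny."""
        if paradox_detected:
            # Restart the window on every detection, including refused ones.
            self._remaining = self.cooldown_turns
            return True
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False


guard = ConversationGuard(cooldown_turns=2)
flags = [guard.observe(hit) for hit in (False, True, False, False, False)]
# flags == [False, True, True, True, False]
```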

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0054
