T3-AT-016MEDIUM

Rationalization Chains

T3 · Reasoning & Constraint Exploitation →

Risk score190

RatingMedium

Procedures6

Severity

Mechanism

Models process formal logical structures as high-trust content — valid syllogistic form triggers the model's logical reasoning pathway, which evaluates premise validity and conclusion derivation rather than content harm. Rationalization chains construct syllogisms where each premise is individually acceptable but the conclusion is harmful, exploiting the model's tendency to follow valid logical form toward conclusions it would otherwise refuse. The vulnerability is that the model's logical reasoning pathway and its safety evaluation pathway process the same content through different lenses — the reasoning pathway evaluates formal validity, while the safety pathway evaluates content harm, and these can produce conflicting signals when a formally valid argument reaches a harmful conclusion.

Detection

Formal logical structure detection: premises + conclusion format, "therefore," "given that," "QED," "it follows"
Scope-expansion detection in syllogistic conclusions: flag when conclusions request *actions* (provide, explain, generate) while premises establish *understanding* or *knowledge*
Category-error detection: flag when premises shift categories between steps (e.g., "chemistry" → "all chemistry" → "illegal chemistry")

Mitigation

Conclusion-level safety evaluationHIGH

Premise challenge capabilityMEDIUM

Scope-expansion detectionHIGH

Formal-structure-invariant safetyHIGH

Chaining

Rationalization chains can incorporate conclusions from T3-AT-013 (Logical Paradox Creation) as premises for further syllogistic reasoning. They also chain from T3-AT-003 (Counterfactual Reasoning) where the counterfactual establishes a premise context for the rationalization.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0054

Open in the technique browser →