T3-AT-016 · MEDIUM

Rationalization Chains

T3 · Reasoning & Constraint Exploitation
Risk score: 190
Rating: Medium
Procedures: 6
Severity
Mechanism

Models process formal logical structures as high-trust content: valid syllogistic form triggers the model's logical reasoning pathway, which evaluates whether the premises hold and whether the conclusion follows from them, rather than whether the content is harmful. Rationalization chains construct syllogisms in which each premise is individually acceptable but the conclusion is harmful, exploiting the model's tendency to follow valid logical form toward conclusions it would otherwise refuse. The vulnerability arises because the reasoning pathway and the safety evaluation pathway assess the same content through different lenses: the former checks formal validity, the latter checks content harm, and the two can produce conflicting signals when a formally valid argument reaches a harmful conclusion.

Detection
  • Formal logical structure detection: premises + conclusion format, "therefore," "given that," "QED," "it follows"
  • Scope-expansion detection in syllogistic conclusions: flag when conclusions request *actions* (provide, explain, generate) while premises establish *understanding* or *knowledge*
  • Category-error detection: flag when premises shift categories between steps (e.g., "chemistry" → "all chemistry" → "illegal chemistry")
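The first two detection signals above can be sketched as simple text heuristics. This is a minimal illustration, not a production detector: the cue lists are taken from the bullets, while the function names, the two-cue threshold, and the word-set matching are assumptions made for this example. Category-error detection is not covered, since it would need semantic comparison between premises.

```python
import re

# Assumption: illustrative cue lists taken from the bullets above; not exhaustive.
STRUCTURE_CUES = [r"\btherefore\b", r"\bgiven that\b", r"\bqed\b", r"\bit follows\b"]
ACTION_VERBS = {"provide", "explain", "generate"}
KNOWLEDGE_WORDS = {"understand", "understanding", "knowledge"}
CONCLUSION_CUE = re.compile(r"\b(?:therefore|it follows(?: that)?)\b", re.IGNORECASE)


def has_formal_structure(text, min_cues=2):
    """True when at least `min_cues` distinct logical cues appear (threshold is hypothetical)."""
    hits = sum(1 for cue in STRUCTURE_CUES if re.search(cue, text, re.IGNORECASE))
    return hits >= min_cues


def split_argument(text):
    """Split into (premises, conclusion) at the last conclusion cue, if there is one."""
    matches = list(CONCLUSION_CUE.finditer(text))
    if not matches:
        return text, ""
    last = matches[-1]
    return text[:last.start()], text[last.end():]


def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def has_scope_expansion(text):
    """Flag conclusions that request an action while premises only establish knowledge."""
    premises, conclusion = split_argument(text)
    premise_words, conclusion_words = _words(premises), _words(conclusion)
    return (
        bool(conclusion_words & ACTION_VERBS)
        and bool(premise_words & KNOWLEDGE_WORDS)
        and not premise_words & ACTION_VERBS
    )


def analyze(text):
    return {
        "formal_structure": has_formal_structure(text),
        "scope_expansion": has_scope_expansion(text),
    }
```

Keyword matching of this kind is easy to evade through paraphrase, so signals like these are better treated as inputs to a downstream safety evaluation than as a verdict on their own.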
Mitigation
  • Conclusion-level safety evaluation: HIGH
  • Premise challenge capability: MEDIUM
  • Scope-expansion detection: HIGH
  • Formal-structure-invariant safety: HIGH
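One way to read conclusion-level and formal-structure-invariant evaluation is that the request embedded in the conclusion should receive the same verdict whether or not it arrives wrapped in premises. The sketch below assumes a caller-supplied classifier, `is_harmful`, which is a hypothetical stand-in for whatever safety check a deployment actually uses; `extract_conclusion` and `evaluate_request` are likewise illustrative names.

```python
import re
from typing import Callable

# Assumption: conclusion cues drawn from the Detection list; real inputs vary more widely.
CONCLUSION_CUE = re.compile(
    r"\b(?:therefore|it follows(?: that)?)\b[,:]?\s*", re.IGNORECASE
)


def extract_conclusion(text: str) -> str:
    """Return the text after the last conclusion cue, or the whole text if none is found."""
    matches = list(CONCLUSION_CUE.finditer(text))
    if not matches:
        return text.strip()
    return text[matches[-1].end():].strip()


def evaluate_request(text: str, is_harmful: Callable[[str], bool]) -> str:
    """Judge the conclusion on its own, so the premises cannot soften the verdict."""
    conclusion = extract_conclusion(text)
    # The full text is checked as well, in case the harmful content sits in a premise.
    if is_harmful(conclusion) or is_harmful(text):
        return "refuse"
    return "allow"
```

With a toy classifier such as `lambda s: "restricted-topic" in s.lower()`, a bare request and the same request preceded by two acceptable premises and "Therefore," both return "refuse", which is the invariance the mitigation asks for.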
Chaining

Rationalization chains can incorporate conclusions from T3-AT-013 (Logical Paradox Creation) as premises for further syllogistic reasoning. They also chain from T3-AT-003 (Counterfactual Reasoning) where the counterfactual establishes a premise context for the rationalization.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054