T3-AT-006 · MEDIUM

Constraint Negation

T3 · Reasoning & Constraint Exploitation
Risk score: 185
Rating: Medium
Procedures: 10
Mechanism

Models process natural language through attention mechanisms that can struggle with deeply nested negation. When a request stacks several negations, the model must correctly resolve each negation layer to determine the actual request polarity. Safety classifiers that operate on intent representations may misparse the negation stack, producing a representation in which the intent is ambiguous or inverted.

Detection
  • Negation depth counter: flag requests with >2 negation operators (not, don't, without, avoid, fail to, refuse)
  • Post-parse intent resolution: resolve the net intent of negation-heavy prompts and evaluate the resolved request against safety policy
  • Known-pattern matching: maintain a pattern library of negation-based attack structures
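A minimal sketch of the negation depth counter, assuming a regex-based count of surface operators is an acceptable first pass. The operator list is taken from the bullet above; the names `count_negations` and `flag_negation_depth` are hypothetical. The count measures how many negation operators appear, not how deeply they nest, so it will over-flag ordinary prose that happens to contain several independent negations.

```python
import re

# Assumed operator list, copied from the detection bullet; tune for real traffic.
NEGATION_PATTERNS = [
    r"\bnot\b",
    r"\b\w+n['’]t\b",              # contractions such as don't, won't, can't
    r"\bwithout\b",
    r"\bavoid(?:s|ed|ing)?\b",
    r"\bfail(?:s|ed|ing)?\s+to\b",
    r"\brefus(?:e|es|ed|ing)\b",
]
_NEGATION_RE = re.compile("|".join(NEGATION_PATTERNS), re.IGNORECASE)


def count_negations(prompt: str) -> int:
    """Count surface negation operators in the prompt (not true nesting depth)."""
    return len(_NEGATION_RE.findall(prompt))


def flag_negation_depth(prompt: str, max_operators: int = 2) -> bool:
    """Return True when the prompt has more than max_operators negation operators."""
    return count_negations(prompt) > max_operators
```

A flagged prompt is a candidate for the post-parse intent resolution step, not a verdict on its own.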
Mitigation
Negation-resolution pre-processor: HIGH
Content-anchor safety evaluation: HIGH
Ambiguity → refusal default: HIGH
Negation depth threshold: MEDIUM
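One way these mitigations could fit together is sketched below. Everything here is an assumption for illustration: the `Resolution` record stands in for the output of a negation-resolution pre-processor, `violates_policy` is a caller-supplied safety check, and the `CLARIFY` outcome for over-threshold prompts is a design choice the entry itself does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    ALLOW = "allow"
    REFUSE = "refuse"
    CLARIFY = "clarify"  # assumed outcome: ask the user to rephrase


@dataclass
class Resolution:
    """Hypothetical output of a negation-resolution pre-processor."""
    resolved_request: Optional[str]  # None when the net polarity is unresolved
    negation_count: int


def gate(
    resolution: Resolution,
    violates_policy: Callable[[str], bool],
    max_negations: int = 2,
) -> Decision:
    # Ambiguity -> refusal default: an unresolved intent is never allowed through.
    if resolution.resolved_request is None:
        return Decision.REFUSE
    # Evaluate the resolved request, not the surface phrasing, against policy.
    if violates_policy(resolution.resolved_request):
        return Decision.REFUSE
    # Negation depth threshold: a benign but negation-heavy prompt gets a rephrase request.
    if resolution.negation_count > max_negations:
        return Decision.CLARIFY
    return Decision.ALLOW
```

Checking ambiguity first means a failure in the pre-processor degrades to a refusal rather than to an unevaluated pass-through.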
Chaining

Constraint negation is primarily a standalone technique with limited chaining value. If successful, it opens the same escalation paths as a direct request bypass.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054