T3-AT-006 · MEDIUM
Constraint Negation
T3 · Reasoning & Constraint Exploitation
Risk score: 185
Rating: Medium
Procedures: 10
Severity
Mechanism
Models process natural language through attention mechanisms that struggle with deeply nested negation. When a request stacks several negations, the model must correctly resolve each negation layer to determine the actual polarity of the request. Safety classifiers that operate on intent representations may misparse the negation stack, producing a representation in which the intent is ambiguous or inverted.
Detection
- Negation depth counter: flag requests with >2 negation operators (not, don't, without, avoid, fail to, refuse)
- Post-parse intent resolution: resolve the net intent of negation-heavy prompts and evaluate the resolved request against safety policy
- Known-pattern matching: maintain a pattern library of negation-based attack structures
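The negation depth counter from the first bullet could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `NEGATION_PATTERN`, `count_negations`, and `exceeds_negation_threshold` are hypothetical, the operator list is copied from the bullet above, and the sketch counts operator occurrences rather than true syntactic nesting depth, so it will miss contractions and forms such as "cannot".

```python
import re

# Operators copied from the Detection list above. The regex itself is an
# illustrative assumption and does not cover every negation form.
NEGATION_PATTERN = re.compile(
    r"\b(?:not|don't|without|avoid|fail to|refuse)\b",
    re.IGNORECASE,
)


def count_negations(prompt: str) -> int:
    """Count occurrences of the listed negation operators in a prompt."""
    # Normalize curly apostrophes so that "don’t" matches "don't".
    normalized = prompt.replace("\u2019", "'")
    return len(NEGATION_PATTERN.findall(normalized))


def exceeds_negation_threshold(prompt: str, max_negations: int = 2) -> bool:
    """Flag prompts with more than `max_negations` operators (>2 by default)."""
    return count_negations(prompt) > max_negations
```

A flagged prompt is only a candidate for closer inspection. The count says nothing about the net intent, which is why the second bullet pairs it with post-parse intent resolution.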
Mitigation
- Negation-resolution pre-processor: HIGH
- Content-anchor safety evaluation: HIGH
- Ambiguity → refusal default: HIGH
- Negation depth threshold: MEDIUM
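One way the "ambiguity → refusal default" and "negation depth threshold" mitigations might combine is shown in the sketch below. Everything in it is an assumption made for illustration: the `Decision` enum, the `gate` function, the use of `None` to represent an unresolved intent, and the choice to route over-threshold prompts to review rather than refuse them outright. The source does not specify any of these.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    REVIEW = "review"
    REFUSE = "refuse"


def gate(
    negation_count: int,
    resolved_intent_is_safe: Optional[bool],
    max_negations: int = 2,
) -> Decision:
    """Hypothetical policy gate applied after negation resolution.

    `resolved_intent_is_safe` is None when the (assumed) resolver could not
    determine the net intent of the prompt.
    """
    # Ambiguity -> refusal default: an unresolved intent is treated as unsafe.
    if resolved_intent_is_safe is None:
        return Decision.REFUSE
    # The resolved request violates policy regardless of how it was phrased.
    if not resolved_intent_is_safe:
        return Decision.REFUSE
    # Negation depth threshold: the intent resolved as safe, but heavy
    # stacking is suspicious enough to warrant a second look (assumed).
    if negation_count > max_negations:
        return Decision.REVIEW
    return Decision.ALLOW
```

The ordering is the main design choice here: ambiguity is checked first so that a failed resolution can never fall through to an allow.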
Chaining
Constraint negation is primarily a standalone technique with limited chaining value. If successful, it opens the same escalation paths as a direct request bypass.
Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054