T1-AT-011 · HIGH

Error Message Exploitation

T1 · Prompt & Context Subversion
Risk score: 220
Rating: High
Procedures: 4
Mechanism

Exploits the model's tendency to refuse helpfully. When a model refuses a request, it often explains *why*, naming the specific aspects that triggered the refusal. That explanation can leak the restricted information through the negative frame ("I can't explain the synthesis because the acylation step using propionic anhydride is dangerous").

Detection
  • Flag requests for refusal explanation: "explain why you can't," "what triggers your filter," "debug your refusal"
  • Monitor output for detailed technical content embedded within refusal messages
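The two detection signals above can be sketched as simple heuristics. This is a minimal illustration, not a production detector: the phrase list is seeded from the examples in the first bullet, while the refusal markers, the function names, and the 25-word length threshold are all assumptions made for the example. Word count is only a crude proxy for "detailed technical content".

```python
import re

# Hypothetical phrase list, seeded from the examples in the detection bullets.
REFUSAL_PROBE_PATTERNS = [
    r"explain why you can['’]?t",
    r"what triggers your filter",
    r"debug your refusal",
]
_PROBE_RE = re.compile("|".join(REFUSAL_PROBE_PATTERNS), re.IGNORECASE)

# Hypothetical markers that a model output is a refusal.
_REFUSAL_RE = re.compile(
    r"\bI (?:can['’]?t|cannot|won['’]?t|am unable to)\b", re.IGNORECASE
)


def is_refusal_probe(user_message: str) -> bool:
    """Return True if the request asks the model to explain its refusal."""
    return _PROBE_RE.search(user_message) is not None


def refusal_is_overly_detailed(model_output: str, max_words: int = 25) -> bool:
    """Return True if the output is a refusal that runs past max_words.

    Length is an assumed stand-in for embedded technical detail; a real
    deployment would more likely pair this with a content classifier.
    """
    if _REFUSAL_RE.search(model_output) is None:
        return False
    return len(model_output.split()) > max_words
```

In use, a request such as "Please explain why you can't answer" would be flagged by `is_refusal_probe`, and a long refusal that goes on to justify itself would be flagged by `refusal_is_overly_detailed` for review.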
Mitigation
Minimal refusal explanations (don't explain what specific content triggered the refusal) · HIGH
Output classification (detect restricted content even within refusal framing) · HIGH
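One way the two mitigations could be combined is as a post-processing step on the model's draft response. The sketch below is illustrative only: `finalize_response`, `GENERIC_REFUSAL`, and the `contains_restricted` callable are hypothetical names, and the classifier itself is assumed to exist elsewhere. The start-of-message refusal pattern is a simplification and would miss refusals phrased differently.

```python
import re
from typing import Callable

# Hypothetical fixed message: states the refusal and nothing about its trigger.
GENERIC_REFUSAL = "I can't help with that request."

# Assumed refusal markers, matched only at the start of the draft.
_REFUSAL_START_RE = re.compile(
    r"\s*I (?:can['’]?t|cannot|won['’]?t|am unable to)\b", re.IGNORECASE
)


def finalize_response(draft: str, contains_restricted: Callable[[str], bool]) -> str:
    """Apply both mitigations to a draft before it reaches the user.

    contains_restricted is a stand-in for an output classifier supplied
    by the caller; it is not a real library API.
    """
    # Output classification: inspect the whole draft, whether or not
    # it is framed as a refusal.
    if contains_restricted(draft):
        return GENERIC_REFUSAL
    # Minimal refusal explanation: collapse any refusal to the fixed
    # message so the reason for the refusal is never surfaced.
    if _REFUSAL_START_RE.match(draft):
        return GENERIC_REFUSAL
    return draft
```

Running the classifier first means restricted content is caught even when the draft does not look like a refusal, while the second check strips explanations from refusals the classifier considers harmless.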
Chaining

Chains from T1-AT-008 (Boundary Testing) — error message analysis is the primary method for mapping safety boundaries. Chains to T10 (Integrity & Confidentiality Breach) when the leaked information is the system prompt or safety rules rather than restricted content.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001