T1-AT-011 HIGH

Error Message Exploitation

Risk score: 220

Rating: High

Procedures: 4

Severity

Mechanism

Exploits the model's tendency to explain its refusals. When a model refuses a request, it often explains *why*, naming the specific aspects that triggered the refusal. That explanation can leak the restricted information through a negative frame (for example, "I can't explain the synthesis because the acylation step using propionic anhydride is dangerous").

Detection

Flag requests that ask the model to explain a refusal: "explain why you can't," "what triggers your filter," "debug your refusal"
Monitor output for detailed technical content embedded within refusal messages
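
The two detection checks above can be sketched as simple heuristics. This is a minimal illustration, not a production detector: the function names, the pattern lists, and the `max_reason_words` threshold are all assumptions introduced here, seeded only from the example phrases in the list above.

```python
import re

# Hypothetical phrase list, seeded from the example phrases in the Detection notes.
PROBE_PATTERNS = [
    r"explain why you can['’]?t",
    r"what triggers? your filter",
    r"debug your refusal",
]
PROBE_RE = re.compile("|".join(PROBE_PATTERNS), re.IGNORECASE)

# Assumed refusal openers; a real deployment would tune these to its model.
REFUSAL_RE = re.compile(
    r"\bI (?:can['’]?t|cannot|won['’]?t|am unable to)\b", re.IGNORECASE
)
# Captures whatever follows the first "because" in the output.
REASON_RE = re.compile(r"\bbecause\b(.*)", re.IGNORECASE | re.DOTALL)


def is_refusal_probe(prompt: str) -> bool:
    """Return True if the prompt asks the model to explain or debug a refusal."""
    return PROBE_RE.search(prompt) is not None


def refusal_has_detailed_reason(output: str, max_reason_words: int = 8) -> bool:
    """Return True if a refusal carries a long 'because ...' explanation.

    The word-count threshold is an arbitrary illustrative value.
    """
    if REFUSAL_RE.search(output) is None:
        return False
    match = REASON_RE.search(output)
    if match is None:
        return False
    return len(match.group(1).split()) > max_reason_words
```

Keyword and length heuristics like these are easy to evade and will produce false positives, so they are probably best treated as a cheap first-pass signal that routes traffic to a stronger classifier or to human review.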

Mitigation

Minimal refusal explanations (don't explain what specific content triggered the refusal): HIGH

Output classification (detect restricted content even within refusal framing): HIGH
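
A rough sketch of how the two mitigations might be combined in an output filter follows. Everything named here is hypothetical: `guard_output`, `MINIMAL_REFUSAL`, the refusal markers, and the `is_restricted` callable, which stands in for whatever content classifier a deployment actually uses.

```python
from typing import Callable

# Hypothetical fixed wording; it reveals nothing about what triggered the refusal.
MINIMAL_REFUSAL = "I can't help with that request."

# Assumed refusal openers; a real system would detect refusals more robustly.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i am unable to")


def looks_like_refusal(text: str) -> bool:
    """Return True if the response opens with one of the assumed refusal markers."""
    return text.strip().lower().startswith(REFUSAL_MARKERS)


def guard_output(text: str, is_restricted: Callable[[str], bool]) -> str:
    """Apply both mitigations to a model response before it is returned."""
    # Output classification: classify the full text, so restricted content
    # is caught even when it is wrapped in refusal framing.
    if is_restricted(text):
        return MINIMAL_REFUSAL
    # Minimal refusal explanations: collapse any refusal to the fixed message,
    # dropping the model's own account of what triggered it.
    if looks_like_refusal(text):
        return MINIMAL_REFUSAL
    return text
```

Running the classifier before the refusal check is a deliberate choice in this sketch: the classifier sees the same text regardless of framing, so a leak inside a refusal is treated the same as a leak in a direct answer.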

Chaining

Chains from T1-AT-008 (Boundary Testing) — error message analysis is the primary method for mapping safety boundaries. Chains to T10 (Integrity & Confidentiality Breach) when the leaked information is the system prompt or safety rules rather than restricted content.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051.001
