T2-AT-001 · MEDIUM
Euphemism and Metaphor Exploitation
T2 · Semantic & Linguistic Evasion
Risk score: 180
Rating: Medium
Procedures: 10
Severity
Mechanism
Exploits the gap between literal meaning and intended meaning. Safety classifiers are trained primarily on content that is harmful on its face, such as keyword patterns and explicit requests. Euphemisms and metaphors encode the same intent in language that is literally benign, so the request passes a literal check while its meaning stays intact.
Detection
- Semantic analysis beyond keyword matching — classify *intent*, not just *tokens*
- Maintain an evolving euphemism dictionary mapped to restricted content categories
- LLM-as-classifier: use a secondary model to resolve pragmatic meaning before safety classification
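
The detection steps above can be sketched as a small two-stage check. This is a minimal illustration under stated assumptions, not a production classifier: the dictionary entries, the category names, the keyword list, and the `resolve_intent` stub are all hypothetical placeholders. In a real deployment the stub would be replaced by a call to a secondary model that resolves pragmatic meaning.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical euphemism dictionary: phrase -> restricted content category.
# Entries are placeholders; a real dictionary would be curated and updated over time.
EUPHEMISM_DICTIONARY: Dict[str, str] = {
    "party favors": "controlled_substances",
    "make him disappear": "violence",
}

# Hypothetical literal keyword list, standing in for a keyword-only classifier.
LITERAL_KEYWORDS = {"narcotics", "kill"}


@dataclass
class Finding:
    phrase: str
    category: str


def keyword_classifier(text: str) -> bool:
    """Flag text only when a literal keyword appears as a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & LITERAL_KEYWORDS)


def euphemism_findings(text: str) -> List[Finding]:
    """Return dictionary phrases found in the text (simple substring match)."""
    lowered = text.lower()
    return [
        Finding(phrase, category)
        for phrase, category in EUPHEMISM_DICTIONARY.items()
        if phrase in lowered
    ]


def resolve_intent(text: str, findings: List[Finding]) -> str:
    """Stub for the LLM-as-classifier step (assumption).

    Here it simply returns the category of the first dictionary hit, or
    "benign" when there is none. A secondary model would instead judge
    whether the phrase is used euphemistically in this context.
    """
    return findings[0].category if findings else "benign"
```

With these placeholder entries, a message such as "Where can I get party favors for the weekend?" is not flagged by `keyword_classifier` but is returned by `euphemism_findings`. A plain dictionary match also fires on innocent uses of the same phrase, which is why the list above pairs it with an intent-resolution step.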
Mitigation
- Intent-based classification (LLM-as-judge rather than keyword classifier) · MEDIUM
- Output-side classification (catch harmful content in the response regardless of how the input was phrased) · HIGH
- Constitutional Classifiers · HIGH
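
Output-side classification can be illustrated with a thin wrapper around generation. This is a sketch only: `guarded_generate`, the `threshold` default, the refusal text, and the two `demo_*` functions are hypothetical names introduced here, and the scoring function stands in for whatever output classifier is actually deployed. It does not represent how Constitutional Classifiers are implemented.

```python
from typing import Callable

# Hypothetical refusal text returned when the output classifier blocks a response.
REFUSAL_MESSAGE = "Sorry, I can't help with that request."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    score_output: Callable[[str], float],
    threshold: float = 0.5,
) -> str:
    """Return the response only if the output classifier scores it below threshold."""
    response = generate(prompt)
    # The prompt is deliberately not inspected here: the decision depends
    # only on the generated response, so input phrasing cannot bypass it.
    if score_output(response) >= threshold:
        return REFUSAL_MESSAGE
    return response


# Toy stand-ins so the sketch runs on its own; both are hypothetical.
def demo_generate(prompt: str) -> str:
    if "favors" in prompt:
        return "RESTRICTED: step-by-step details"
    return "Here is a cake recipe."


def demo_score_output(response: str) -> float:
    return 1.0 if response.startswith("RESTRICTED") else 0.0
```

Because the check runs on the response, a euphemistic prompt that slips past input filters is still blocked if the model's answer is itself classified as restricted.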
Chaining
Euphemisms are building blocks for compound attacks. Chain with T2-AT-020 (Register Shifting) — academic euphemisms are harder to detect than slang euphemisms.
Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001