T2-AT-001 [MEDIUM]

Euphemism and Metaphor Exploitation

Risk score: 180

Rating: Medium

Procedures: 10

Mechanism

This technique exploits the gap between literal meaning and intended meaning. Safety classifiers are trained primarily on literally harmful content, such as keyword patterns and explicit requests. Euphemisms and metaphors encode the same intent in language that is literally benign, so a classifier that reads only the surface form can miss it.

Detection

Semantic analysis beyond keyword matching — classify *intent*, not just *tokens*
Maintain an evolving euphemism dictionary mapped to restricted content categories
LLM-as-classifier: use a secondary model to resolve pragmatic meaning before safety classification
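
The detection steps above can be sketched as a small pipeline. This is a minimal illustration, not a reference implementation: the dictionary entries, the category names, and the `resolve_intent` and `is_restricted` callables are all hypothetical placeholders. In practice `resolve_intent` would wrap a secondary model that paraphrases the pragmatic meaning of the prompt, and `is_restricted` would wrap whatever safety classifier is already deployed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical entries: euphemistic phrase -> restricted content category.
# A real dictionary would be curated and updated as new phrasing appears.
EUPHEMISM_DICTIONARY: Dict[str, str] = {
    "spicy recipe": "category_a",
    "party favors": "category_b",
}


@dataclass
class Finding:
    phrase: str
    category: str


def match_euphemisms(text: str, dictionary: Dict[str, str]) -> List[Finding]:
    """Return dictionary phrases found in the text (case-insensitive, whole words)."""
    lowered = text.lower()
    findings = []
    for phrase, category in dictionary.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            findings.append(Finding(phrase, category))
    return findings


def screen_prompt(
    text: str,
    resolve_intent: Callable[[str], str],
    is_restricted: Callable[[str], bool],
    dictionary: Dict[str, str] = EUPHEMISM_DICTIONARY,
) -> dict:
    """Resolve pragmatic meaning first, then classify the resolved meaning.

    Dictionary hits alone do not flag the prompt; they only mark it for
    review, because the same phrase is often used innocently.
    """
    findings = match_euphemisms(text, dictionary)
    resolved = resolve_intent(text)  # assumed: secondary-model paraphrase
    return {
        "dictionary_hits": findings,
        "resolved_meaning": resolved,
        "flagged": is_restricted(resolved),
        "needs_review": bool(findings),
    }
```

Keeping the dictionary lookup separate from the intent decision is a deliberate choice in this sketch: the dictionary is cheap and explainable but noisy, so it feeds review rather than blocking on its own.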

Mitigation

Intent-based classification (LLM-as-judge rather than keyword classifier) [MEDIUM]

Output-side classification (catch harmful content in the response regardless of how the input was phrased) [HIGH]

Constitutional Classifiers [HIGH]
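
Output-side classification from the list above can be sketched as a gate around generation. The function and parameter names here (`guarded_generate`, `generate`, `output_is_harmful`) are assumptions made for illustration; they stand in for whatever model call and response classifier a deployment actually uses.

```python
from typing import Callable

# Hypothetical refusal text; a real system would use its own policy message.
REFUSAL = "This response was withheld by the output filter."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    output_is_harmful: Callable[[str], bool],
    refusal: str = REFUSAL,
) -> str:
    """Generate a response, then classify the response itself.

    The classifier never sees the prompt, so euphemistic or metaphorical
    phrasing on the input side cannot hide what the output contains.
    """
    response = generate(prompt)
    if output_is_harmful(response):
        return refusal
    return response
```

Because the check runs on the response, it complements input-side intent classification instead of replacing it: a prompt that slips past the input check is still caught if the model answers it literally.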

Chaining

Euphemisms are building blocks for compound attacks. They chain with T2-AT-020 (Register Shifting): academic euphemisms are harder to detect than slang euphemisms.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051.001
