T2-AT-008 · MEDIUM

Synonym and Paraphrase Chains

T2 · Semantic & Linguistic Evasion
Risk score: 165
Rating: Medium
Procedures: 10
Mechanism

Replaces restricted terms with chains of progressively more distant synonyms or paraphrases until no individual token triggers the classifier. TextAttack automates adversarial synonym perturbation at scale. Effectiveness depends on whether the classifier matches surface tokens (evadable) or models semantic meaning (resistant).
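The mechanism can be illustrated with a minimal sketch, useful mainly as a regression test for a token-level filter. The blocklist, the synonym chain, and every function name here are hypothetical toy values, not taken from TextAttack or any real classifier.

```python
import re

# Hypothetical toy data, for illustration only.
BLOCKLIST = {"steal", "take"}
SYNONYM_CHAIN = {"steal": "take", "take": "acquire", "acquire": "obtain"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def token_filter_flags(text):
    """Token-level classifier: flags text if any token is on the blocklist."""
    return any(tok in BLOCKLIST for tok in tokenize(text))


def hop(word, hops):
    """Follow the synonym chain `hops` steps away from `word`."""
    for _ in range(hops):
        word = SYNONYM_CHAIN.get(word, word)
    return word


def chain_until_unflagged(text, max_hops=5):
    """Move every token further along its chain until the filter stops firing.

    Returns (rewritten_text, hops_needed), or (None, max_hops) if the
    filter still fires after max_hops substitutions.
    """
    for hops in range(max_hops + 1):
        candidate = " ".join(hop(tok, hops) for tok in tokenize(text))
        if not token_filter_flags(candidate):
            return candidate, hops
    return None, max_hops
```

In this toy setup, `chain_until_unflagged("steal the data")` needs two hops: the first synonym is still on the blocklist, the second is not. The number of hops required is a rough measure of how far a token-level filter's vocabulary coverage extends.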

Detection
  • Semantic similarity classifiers (classify meaning, not tokens)
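As a rough sketch of what classifying meaning rather than tokens could look like, the example below compares each token to a restricted concept by cosine similarity. The two-dimensional vectors, the threshold, and the function names are all assumptions made for illustration; a real detector would use a trained encoder and would typically score whole sentences.

```python
import math

# Hypothetical 2-d "embeddings"; a stand-in for a trained encoder.
TOY_VECTORS = {
    "steal": (1.0, 0.0),
    "take": (0.9, 0.2),
    "acquire": (0.8, 0.3),
    "read": (0.2, 0.9),
    "data": (0.1, 1.0),
}
RESTRICTED_CONCEPTS = ("steal",)


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def semantic_flags(text, threshold=0.9):
    """Flag text if any known token lies close to a restricted concept."""
    for tok in text.lower().split():
        vec = TOY_VECTORS.get(tok)
        if vec is None:
            continue  # tokens without a vector are ignored in this sketch
        for concept in RESTRICTED_CONCEPTS:
            if cosine(vec, TOY_VECTORS[concept]) >= threshold:
                return True
    return False
```

With these toy vectors, `semantic_flags("acquire the data")` fires even though "acquire" appears on no blocklist, while `semantic_flags("read the data")` does not. The threshold trades recall on distant paraphrases against false positives on benign near-synonyms.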
Mitigation
  • Semantic intent classification (HIGH)
  • Adversarial training on TextAttack perturbations (MEDIUM)
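Adversarial training amounts to adding perturbed copies of labeled examples to the training set so the classifier sees the synonym variants with the original label. The sketch below uses a hypothetical hand-written synonym table as a stand-in for TextAttack's perturbations; it does not call TextAttack, and the names are illustrative assumptions.

```python
# Hypothetical synonym table standing in for an automated perturbation tool.
HYPOTHETICAL_SYNONYMS = {
    "steal": ["take", "acquire"],
    "data": ["records"],
}


def perturbations(text, synonyms=HYPOTHETICAL_SYNONYMS):
    """Yield variants of `text` with exactly one token swapped for a synonym."""
    toks = text.lower().split()
    for i, tok in enumerate(toks):
        for alt in synonyms.get(tok, []):
            yield " ".join(toks[:i] + [alt] + toks[i + 1:])


def augment(dataset):
    """Return the dataset plus label-preserving synonym variants, deduplicated."""
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        for variant in perturbations(text):
            if variant not in seen:
                seen.add(variant)
                augmented.append((variant, label))
    return augmented
```

For a training set such as `[("steal the data", 1), ("read the manual", 0)]`, `augment` adds three positive variants and leaves the negative example unchanged. Single-token swaps keep the sketch short; deeper chains would require applying `perturbations` repeatedly.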
Chaining

Chains with T2-AT-001 (Euphemism). TextAttack can automate the combined substitutions at scale.

Framework mapping
OWASP LLM: LLM01