T2-AT-002 HIGH

Multi-Language Evasion

Category: T2 · Semantic & Linguistic Evasion
Risk score: 200
Rating: High
Procedures: 7
Mechanism

This technique exploits the asymmetry between a model's multilingual capability and the language coverage of its safety training. Models understand and generate text in dozens of languages, but safety training (RLHF, DPO, Constitutional AI) is conducted primarily in English and a handful of high-resource languages. Low-resource languages (Zulu, Swahili, isiXhosa, Javanese, Bengali) receive minimal safety training coverage, so a request that would be refused in English may still be understood and answered when it is phrased in one of them.

Detection
  • Use multilingual safety classifiers trained on low-resource languages, not just English
  • Detect language switching within a single message
  • Apply safety classification to both the input language and the output language
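The second detection point can be roughly approximated with the standard library alone. The sketch below flags messages that mix writing scripts, using the first word of each character's Unicode name as a script label. This is an assumption-heavy proxy, not a language identifier: it cannot separate languages that share a script (Zulu, Swahili and English are all written in Latin script), so it would only complement a real language-identification model. The function names and the `min_share` threshold are hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Return a rough script label for a letter, or None for non-letters.

    Assumption: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "BENGALI", ...) is a usable script label.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split()[0] if name else None


def script_counts(text: str) -> Counter:
    """Count letters per script label, ignoring digits, spaces, punctuation."""
    return Counter(s for s in map(script_of, text) if s is not None)


def mixes_scripts(text: str, min_share: float = 0.2) -> bool:
    """True if at least two scripts each make up >= min_share of the letters."""
    counts = script_counts(text)
    total = sum(counts.values())
    if total == 0:
        return False
    significant = [s for s, n in counts.items() if n / total >= min_share]
    return len(significant) >= 2


if __name__ == "__main__":
    print(mixes_scripts("Hello world"))
    print(mixes_scripts("Please explain привет как дела"))
```

A positive result is a signal for closer inspection rather than a verdict. Ordinary Japanese text, for example, legitimately combines several script labels under this scheme and would need an allow-list.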
Mitigation
Multilingual safety training (expand RLHF/DPO to low-resource languages): HIGH
Translation-to-English before classification: MEDIUM
Output-side classification in English: HIGH
Constitutional Classifiers with multilingual synthetic data: HIGH
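The two translation-based mitigations above (translate to English before classification, and classify the output in English) can be combined into one guard around the model call. The following is a minimal sketch under stated assumptions: `guarded_generate`, `Result`, and the three callables are hypothetical names, and the toy translator and keyword classifier exist only so the example runs. A real deployment would substitute an actual translation system and a trained safety classifier.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    blocked_at: Optional[str]  # "input", "output", or None if allowed
    response: Optional[str]    # model response, or None if blocked


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    to_english: Callable[[str], str],
    is_harmful: Callable[[str], bool],
) -> Result:
    """Classify the English translation of both the prompt and the response."""
    # Input side: translate first, so an English-only classifier can be used.
    if is_harmful(to_english(prompt)):
        return Result(blocked_at="input", response=None)
    response = generate(prompt)
    # Output side: the response may be in any language, so translate it too.
    if is_harmful(to_english(response)):
        return Result(blocked_at="output", response=None)
    return Result(blocked_at=None, response=response)


# Toy stand-ins (assumptions for illustration only, not real components).
TOY_GLOSSARY = {"xx_bad": "forbidden"}


def toy_translate(text: str) -> str:
    """Replace placeholder tokens; stands in for a real translator."""
    for src, dst in TOY_GLOSSARY.items():
        text = text.replace(src, dst)
    return text


def toy_is_harmful(english_text: str) -> bool:
    """Keyword match; stands in for a trained English safety classifier."""
    return "forbidden" in english_text.lower()


if __name__ == "__main__":
    echo_model = lambda p: "reply to: " + p
    print(guarded_generate("hello", echo_model, toy_translate, toy_is_harmful))
    print(guarded_generate("xx_bad", echo_model, toy_translate, toy_is_harmful))
```

Passing the translator and classifier in as callables keeps the guard independent of any particular vendor or model. Translation quality for low-resource languages is itself limited, which may be one reason the input-side step is rated MEDIUM rather than HIGH.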
Chaining

Chains with T2-AT-009 (Code-Switching) — rapid alternation between languages within a single message. Chains with T2-AT-003 (Encoding) — combine language switching with encoding for compound evasion.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001