T2-AT-002 · HIGH
Multi-Language Evasion
T2 · Semantic & Linguistic Evasion
Risk score: 200
Rating: High
Procedures: 7
Severity
Mechanism
Exploits the asymmetry between a model's multilingual capability and the language coverage of its safety training. Models understand and generate text in dozens of languages, but safety training (RLHF, DPO, Constitutional AI) is conducted primarily in English and a handful of high-resource languages. Low-resource languages (Zulu, Swahili, isiXhosa, Javanese, Bengali) receive minimal safety-training coverage, so a harmful request phrased in one of them can be understood by the model without triggering the refusal behavior the same request would trigger in English.
Detection
- Multilingual safety classifiers trained on low-resource languages, not just English
- Detect language switching within a single message
- Apply safety classification to both input AND output language
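The language-switching check above can be approximated with a cheap first-pass heuristic. The sketch below is illustrative only: the function names and the `min_share` threshold are assumptions, and it approximates a character's script by the first word of its Unicode name. It catches only switches between writing systems (for example Latin to Cyrillic). Zulu, Swahili, and isiXhosa are written in Latin script, so a switch between English and any of them would need a real language-identification model, which is not shown here.

```python
import unicodedata
from collections import Counter
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Approximate a letter's script by the first word of its Unicode name."""
    if not ch.isalpha():
        return None  # ignore digits, punctuation, whitespace
    try:
        return unicodedata.name(ch).split()[0]  # e.g. "LATIN", "CYRILLIC"
    except ValueError:
        return None  # character has no Unicode name


def script_counts(message: str) -> Counter:
    """Count letters per approximate script."""
    counts: Counter = Counter()
    for ch in message:
        script = script_of(ch)
        if script is not None:
            counts[script] += 1
    return counts


def has_script_switch(message: str, min_share: float = 0.2) -> bool:
    """True if at least two scripts each make up min_share of the letters.

    min_share is a hypothetical tuning knob that keeps a single stray
    foreign character from flagging the whole message.
    """
    counts = script_counts(message)
    total = sum(counts.values())
    if total == 0:
        return False
    significant = [s for s, n in counts.items() if n / total >= min_share]
    return len(significant) >= 2
```

A message flagged this way would still go through the multilingual classifier; the flag is only a signal that the message deserves closer inspection, not a verdict.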
Mitigation
Multilingual safety training (expand RLHF/DPO to low-resource languages) · HIGH
Translation-to-English before classification · MEDIUM
Output-side classification in English · HIGH
Constitutional Classifiers with multilingual synthetic data · HIGH
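Two of these mitigations, translation-to-English before classification and output-side classification in English, can be combined in one guard around the model call. The following is a minimal sketch under stated assumptions: `guarded_generate`, `Verdict`, and the three injected callables (`generate`, `translate_to_english`, `harm_score`) are hypothetical placeholders for whatever model, translation service, and English safety classifier a deployment actually uses, and the `0.5` threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    blocked: bool
    stage: str                      # "input", "output", or "passed"
    response: Optional[str] = None  # set only when nothing was blocked


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],              # hypothetical model call
    translate_to_english: Callable[[str], str],  # hypothetical translator
    harm_score: Callable[[str], float],          # hypothetical English classifier, 0..1
    threshold: float = 0.5,
) -> Verdict:
    # Input side: classify the English rendering of the prompt.
    if harm_score(translate_to_english(prompt)) >= threshold:
        return Verdict(blocked=True, stage="input")

    response = generate(prompt)

    # Output side: classify the English rendering of the response,
    # whatever language the model answered in.
    if harm_score(translate_to_english(response)) >= threshold:
        return Verdict(blocked=True, stage="output")

    return Verdict(blocked=False, stage="passed", response=response)
```

The guard is only as good as the translator: if translation quality is poor for a given low-resource language, harmful content may be rendered into English that the classifier does not recognize, so this approach complements multilingual safety training rather than replacing it.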
Chaining
- Chains with T2-AT-009 (Code-Switching): rapid alternation between languages within a single message.
- Chains with T2-AT-003 (Encoding): language switching combined with encoding for compound evasion.
Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001