T7-AT-008 · MEDIUM
Translation Leakage
T7 · Output Manipulation & Exfiltration
Risk score: 165
Rating: Medium
Procedures: 10
Mechanism
Safety alignment is trained predominantly on English-language data, and alignment quality degrades as the amount of training data available for a language drops. Models are roughly three times as likely to produce harmful content when prompted in low-resource languages as in high-resource languages (Brown University, 2024). The violated assumption is that safety behavior generalizes across languages. It does not, because RLHF safety labels are concentrated in English and a handful of other major languages.
Detection
- Detect language switching to low-resource languages on restricted topics
- Apply safety classification in the source language regardless of output language
- Flag requests to translate restricted source content into notation systems (e.g., Morse code or simple ciphers)
- Observable signal: content requested in a language the user has not used elsewhere in the conversation
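The detection signals above can be combined into a per-turn check. This is a minimal sketch under stated assumptions, not a production detector: it assumes each turn has already been tagged with a language code and a restricted-topic verdict by upstream classifiers, and `Turn`, `flag_turn`, `LOW_RESOURCE`, and `NOTATION_TARGETS` are hypothetical names whose contents are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical sets for illustration; a real deployment would derive the
# low-resource set from measured safety-training coverage per language.
LOW_RESOURCE: Set[str] = {"zu", "gd", "hmn", "gn"}  # Zulu, Scots Gaelic, Hmong, Guarani
NOTATION_TARGETS: Set[str] = {"morse", "rot13", "base64"}


@dataclass
class Turn:
    lang: str                     # language of the user message (assumed upstream language ID)
    restricted: bool              # verdict of an assumed upstream restricted-topic classifier
    target: Optional[str] = None  # requested output language or notation, if any


def flag_turn(history: List[Turn], turn: Turn) -> List[str]:
    """Return the detection signals raised by `turn`, given earlier turns."""
    signals: List[str] = []
    seen = {t.lang for t in history}
    # Natural languages the content is written in or requested in (notations excluded).
    requested = {turn.lang}
    if turn.target and turn.target not in NOTATION_TARGETS:
        requested.add(turn.target)

    # Signal 1: restricted topic combined with a low-resource language.
    if turn.restricted and requested & LOW_RESOURCE:
        signals.append("low_resource_on_restricted_topic")
    # Signal 2: a language not used earlier in this conversation.
    if history and requested - seen:
        signals.append("new_language_in_conversation")
    # Signal 3: restricted source content routed into a notation system.
    if turn.restricted and turn.target in NOTATION_TARGETS:
        signals.append("notation_translation_of_restricted_source")
    return signals


history = [Turn("en", False), Turn("en", False)]
print(flag_turn(history, Turn("zu", True)))
# → ['low_resource_on_restricted_topic', 'new_language_in_conversation']
print(flag_turn(history, Turn("en", True, target="morse")))
# → ['notation_translation_of_restricted_source']
```

Each signal is reported separately rather than folded into one score, so a reviewer can see which heuristic fired; how to weight or combine them is left to the deployment.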
Mitigation
- Multilingual safety alignment (HIGH)
- Source-language safety evaluation (HIGH)
- Cross-lingual safety transfer (MEDIUM)
- Code-domain safety parity (MEDIUM)
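One possible reading of source-language safety evaluation is a gate that scores the request as written and also scores a pivot translation into a high-resource language, keeping the higher risk score. Because the verdict never consults the requested output language, asking for the answer in a low-resource language cannot lower it. The sketch assumes caller-supplied `classify` and `translate` functions; `source_language_gate`, the 0.5 threshold, and the toy stubs are hypothetical stand-ins for a real safety classifier and translation system.

```python
from typing import Callable

# Hypothetical interfaces supplied by the deployment:
#   classify(text) -> risk score in [0, 1]
#   translate(text, target_lang) -> translated text
Classifier = Callable[[str], float]
Translator = Callable[[str, str], str]


def source_language_gate(
    text: str,
    src_lang: str,
    classify: Classifier,
    translate: Translator,
    pivot_lang: str = "en",
    threshold: float = 0.5,
) -> bool:
    """Return True if the request should be blocked.

    The verdict depends only on the source text, never on the output
    language the user asks for.
    """
    score = classify(text)
    if src_lang != pivot_lang:
        # Re-score in the pivot language, where safety alignment is assumed
        # strongest, and keep the higher risk so a weak score in the
        # low-resource language cannot override it.
        score = max(score, classify(translate(text, pivot_lang)))
    return score >= threshold


# Toy stubs for illustration only: "XYZ" stands in for a restricted request
# written in a low-resource language.
def toy_classify(text: str) -> float:
    return 1.0 if "restricted" in text else 0.0


def toy_translate(text: str, target_lang: str) -> str:
    return {"XYZ": "restricted request"}.get(text, text)


print(source_language_gate("XYZ", "zu", toy_classify, toy_translate))    # → True
print(source_language_gate("hello", "en", toy_classify, toy_translate))  # → False
```

Taking the maximum rather than the average makes the gate only as permissive as its strictest view of the text, at the cost of more false positives when the pivot translation is noisy.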
Chaining
Translation leakage enables T7-AT-003 (Format Exploitation) when the translated output bypasses downstream filters. Low-resource language attacks also feed T4 (Multi-Turn) attacks by establishing a conversational context in which safety behavior is weaker.
Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0048.003