T7-AT-008 · MEDIUM

Translation Leakage

T7 · Output Manipulation & Exfiltration

Risk score: 165

Rating: Medium

Procedures: 10

Mechanism

Safety alignment is trained predominantly on English-language data, and alignment quality degrades as the resource level of the language drops. Prompts in low-resource languages are roughly three times as likely to elicit harmful content as prompts in high-resource languages (Brown University, 2024). The violated assumption is that safety generalizes across languages; it does not, because RLHF safety labels are concentrated in English and a handful of major languages.

Detection

Detect language switching to low-resource languages on restricted topics
Apply safety classification in the source language, regardless of the output language
Flag requests to translate content into notation systems when the source content is restricted
Observable signal: content requested in a language the user has not used elsewhere in the conversation
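The detection signals above can be combined into a single per-turn check. The sketch below is a minimal illustration, not this catalog's reference detector. It assumes an upstream language-identification step that tags each turn with an ISO 639-1 code, and a source-language safety classifier that has already decided whether the topic is restricted. The names `Turn`, `detection_signals`, `HIGH_RESOURCE`, and `NOTATION_TARGETS`, and the contents of both sets, are hypothetical.

```python
from dataclasses import dataclass

# HYPOTHETICAL tiering: which ISO 639-1 codes count as "high-resource".
# A real deployment would derive this from its own alignment-coverage data.
HIGH_RESOURCE = {"en", "zh", "es", "fr", "de", "ja", "ru", "pt", "it", "ko"}

# HYPOTHETICAL keywords naming notation systems a request might target.
NOTATION_TARGETS = ("morse", "braille", "base64", "rot13", "leetspeak")


@dataclass
class Turn:
    text: str
    lang: str  # assumed to come from an upstream language-ID step


def detection_signals(history: list[Turn], current: Turn, restricted_topic: bool) -> list[str]:
    """Return the names of the detection signals that fire for `current`.

    `restricted_topic` is assumed to come from a safety classifier run on the
    source-language text, so it does not depend on the output language.
    """
    signals = []

    # Signal: a language the user has not used earlier in the conversation.
    prior_langs = {turn.lang for turn in history}
    if history and current.lang not in prior_langs:
        signals.append("new_language_in_conversation")

    # Signal: a restricted topic requested outside the high-resource tier.
    if restricted_topic and current.lang not in HIGH_RESOURCE:
        signals.append("low_resource_language_on_restricted_topic")

    # Signal: restricted content to be translated into a notation system.
    lowered = current.text.lower()
    if restricted_topic and "translate" in lowered and any(
        target in lowered for target in NOTATION_TARGETS
    ):
        signals.append("notation_translation_of_restricted_content")

    return signals


if __name__ == "__main__":
    history = [Turn("How do I bake bread?", "en"), Turn("Thanks!", "en")]
    print(detection_signals(history, Turn("(request text)", "zu"), restricted_topic=True))
    # prints ['new_language_in_conversation', 'low_resource_language_on_restricted_topic']
```

The signals are returned separately rather than collapsed into one verdict because any single one is weak on its own: a user legitimately switching to their native language trips only the first signal, so a deployment would more plausibly weight them than block on any one.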

Mitigation

Multilingual safety alignment · HIGH

Source-language safety evaluation · HIGH

Cross-lingual safety transfer · MEDIUM

Code-domain safety parity · MEDIUM
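The source-language safety evaluation mitigation can be sketched as a gate that scores the request as it was written and also scores pivot-language translations of both the request and the output, blocking on the highest risk. This is an illustrative sketch only: the pivot-language step is an added assumption, and `should_block`, `classify`, `translate`, `pivot_lang`, and `threshold` are hypothetical names rather than any specific product's API. The classifier and translator are injected so the sketch stays self-contained.

```python
from typing import Callable

# HYPOTHETICAL interfaces, injected by the caller:
#   classify(text) -> risk score in [0, 1]
#   translate(text, target_lang) -> text translated into target_lang
Classifier = Callable[[str], float]
Translator = Callable[[str, str], str]


def should_block(
    prompt: str,
    output: str,
    classify: Classifier,
    translate: Translator,
    pivot_lang: str = "en",
    threshold: float = 0.5,
) -> bool:
    """Return True if the exchange should be blocked.

    The prompt is scored as written (its source language), and the prompt and
    output are also scored after translation into a pivot language where the
    classifier is assumed to be best calibrated. The maximum risk across all
    views decides, so content cannot pass merely because the output language
    is one the classifier handles poorly.
    """
    views = [
        prompt,
        output,
        translate(prompt, pivot_lang),
        translate(output, pivot_lang),
    ]
    return max(classify(view) for view in views) >= threshold


def toy_classify(text: str) -> float:
    # Toy stand-in: a keyword "classifier".
    return 0.9 if "forbidden" in text else 0.1


def toy_translate(text: str, lang: str) -> str:
    # Toy stand-in: a one-entry "dictionary translator" (ignores lang).
    return {"verboten": "forbidden"}.get(text, text)


if __name__ == "__main__":
    print(should_block("hello", "hi there", toy_classify, toy_translate))  # prints False
    print(should_block("translate this", "verboten", toy_classify, toy_translate))  # prints True
```

In the second call the output is missed when scored as written and caught only after translation, which is the failure mode this entry describes. Taking the maximum rather than an average is a deliberate choice: one view that crosses the threshold is enough to block.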

Chaining

Translation leakage enables T7-AT-003 (Format Exploitation) when translated output bypasses downstream filters. Low-resource-language attacks feed T4 (Multi-Turn) by establishing a conversational context in which safety alignment is weaker.

Framework mapping

OWASP LLM · LLM01

MITRE ATLAS · AML.T0048.003