T2-AT-004HIGH
Unicode and Bidirectional Attacks
T2 · Semantic & Linguistic Evasion →Risk score210
RatingHigh
Procedures10
Severity
Mechanism
Exploits the mismatch between how Unicode is rendered visually, how it's tokenized by classifiers, and how it's processed by the model. Unicode provides multiple ways to represent visually identical text: homoglyphs (Cyrillic 'а' vs Latin 'a'), zero-width characters (U+200B–U+200F), combining diacriticals, bidirectional override characters (U+202A–U+202E), and Unicode tag characters (U+E0001–U+E007F). Tokenizers using fixed vocabularies produce different token sequences for visually identical strings, causing classifiers to mislabel dangerous prompts as benign.
Detection
- NFKC Unicode normalization on all input before classification (catches homoglyphs, combining characters, diacriticals)
- Strip zero-width characters, bidirectional overrides, and Unicode tag characters
- ICU confusable character mapping to detect cross-script homoglyph substitution
- Emoji-to-text expansion before classification
Mitigation
NFKC normalization + zero-width strippingHIGH
Unicode tag character stripping (U+E0001–U+E007F)HIGH
ICU confusable mappingHIGH
Defense-in-depth: normalize → classify → output-classifyHIGH
Chaining
The highest-value T2 technique for compound attacks. Chains with T2-AT-003 (Encoding) for layered obfuscation.
Framework mapping
Open in the technique browser →OWASP LLMLLM01
MITRE ATLASAML.T0051.001