T2-AT-004 · HIGH

Unicode and Bidirectional Attacks

T2 · Semantic & Linguistic Evasion
Risk score: 210
Rating: High
Procedures: 10
Mechanism

Exploits the mismatch between how Unicode text is rendered for a human reader, how it is tokenized by safety classifiers, and how it is processed by the model. Unicode offers several ways to produce text that looks identical to another string or that carries invisible content: homoglyphs (Cyrillic 'а', U+0430, vs Latin 'a', U+0061), zero-width characters and directional marks (U+200B–U+200F), combining diacriticals, bidirectional embedding and override characters (U+202A–U+202E), and Unicode tag characters (U+E0001–U+E007F). Because tokenizers operate on the underlying code points (or their bytes) rather than on rendered glyphs, visually identical strings yield different token sequences against a fixed vocabulary, which can cause classifiers to mislabel dangerous prompts as benign.
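The mismatch is easy to see at the code-point level. The following is a minimal sketch using only Python's standard `unicodedata` module; the sample strings and the `describe` helper are illustrative assumptions, not part of the technique definition.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


latin = "paypal"
mixed = "p\u0430yp\u0430l"  # Cyrillic small a (U+0430) in place of Latin a
hidden = "pay\u200bpal"     # zero-width space between "pay" and "pal"

# The three strings usually render alike, yet they are different sequences
# of code points, so string comparison and tokenization treat them differently.
print(latin == mixed, latin == hidden)
print(len(latin), len(mixed), len(hidden))
for line in describe(mixed):
    print(line)
```

Printing the code-point names is also a convenient way to inspect a suspicious prompt by hand.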

Detection
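  • NFKC Unicode normalization on all input before classification (folds compatibility forms such as fullwidth letters, ligatures, and mathematical alphanumerics, and composes combining sequences; it does not map cross-script homoglyphs or remove diacritics)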
  • Strip zero-width characters, bidirectional overrides, and Unicode tag characters
  • ICU confusable character mapping to detect cross-script homoglyph substitution
  • Emoji-to-text expansion before classification
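The first three checks above can be sketched with the Python standard library. This is an assumption-laden illustration, not a reference implementation: `sanitize`, `scripts_used`, and `looks_mixed_script` are hypothetical helper names, the stripped ranges are the ones listed in this entry, and the mixed-script test is a crude stand-in for a real ICU confusable mapping (it guesses the script from the first word of each character's Unicode name). Note that NFKC on its own leaves Cyrillic 'а' unchanged, which is why the separate script check is there.

```python
import re
import unicodedata

# Ranges named in this entry: zero-width characters and directional marks,
# bidirectional embeddings/overrides, and Unicode tag characters.
INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\U000e0001-\U000e007f]")


def sanitize(text: str) -> str:
    """Strip invisible code points, then apply NFKC normalization."""
    return unicodedata.normalize("NFKC", INVISIBLE.sub("", text))


def scripts_used(text: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


def looks_mixed_script(token: str) -> bool:
    """Flag tokens whose letters come from more than one script."""
    return len(scripts_used(token)) > 1


fullwidth = "\uff50\uff41\uff59\u200bpal"  # fullwidth "pay" + ZWSP + "pal"
homoglyph = "p\u0430ypal"                  # Cyrillic а in a Latin word

print(sanitize(fullwidth))
print(sanitize(homoglyph) == "paypal")
print(looks_mixed_script(sanitize(homoglyph)))
```

A production system would more likely rely on ICU's confusable data than on name prefixes, since legitimate multilingual text can mix scripts and some confusable pairs share a script.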
Mitigation
NFKC normalization + zero-width stripping · HIGH
Unicode tag character stripping (U+E0001–U+E007F) · HIGH
ICU confusable mapping · HIGH
Defense-in-depth: normalize → classify → output-classify · HIGH
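The normalize → classify → output-classify chain might be wired together roughly as follows. This is a sketch under stated assumptions: `guarded_generate` is a hypothetical wrapper, and the model and both classifiers are passed in as plain callables because this entry does not specify any particular API; the toy stand-ins at the bottom exist only to make the example runnable.

```python
import re
import unicodedata
from typing import Callable, Optional

INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\U000e0001-\U000e007f]")


def normalize(text: str) -> str:
    """Strip invisible code points, then apply NFKC normalization."""
    return unicodedata.normalize("NFKC", INVISIBLE.sub("", text))


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],         # hypothetical model call
    input_flagged: Callable[[str], bool],   # hypothetical input classifier
    output_flagged: Callable[[str], bool],  # hypothetical output classifier
) -> Optional[str]:
    """Return the model response, or None if either classifier flags it."""
    clean = normalize(prompt)
    if input_flagged(clean):
        return None
    # The model receives the same normalized text the classifier saw.
    response = generate(clean)
    if output_flagged(normalize(response)):
        return None
    return response


# Toy stand-ins, for illustration only.
def toy_model(text: str) -> str:
    return text.upper()


def toy_input_check(text: str) -> bool:
    return "forbidden" in text.lower()


def toy_output_check(text: str) -> bool:
    return "SECRET" in text


print(guarded_generate("for\u200bbidden", toy_model, toy_input_check, toy_output_check))
print(guarded_generate("hello", toy_model, toy_input_check, toy_output_check))
```

Passing the normalized prompt to the model, rather than the raw one, is a design choice intended to remove the classifier/model mismatch described under Mechanism; the output check is a second chance to catch anything the input check missed.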
Chaining

The highest-value T2 technique for compound attacks. Chains with T2-AT-003 (Encoding) for layered obfuscation.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001