T2-AT-004 (HIGH)

Unicode and Bidirectional Attacks

Risk score: 210

Rating: High

Procedures: 10

Severity

Mechanism

This technique exploits mismatches between how Unicode text is rendered for a human reader, how classifiers tokenize it, and how the model processes it. Unicode offers several ways to produce text that looks identical, or nearly identical, to a plain string: homoglyphs (Cyrillic 'а' vs Latin 'a'), zero-width characters and directional marks (U+200B–U+200F), combining diacriticals, bidirectional embedding and override characters (U+202A–U+202E), and Unicode tag characters (U+E0001–U+E007F). Because a fixed-vocabulary tokenizer works on the underlying code points or bytes rather than on rendered glyphs, visually identical strings yield different token sequences, and a classifier can mislabel a dangerous prompt as benign.
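To make the mismatch concrete, the following minimal sketch compares three strings that render the same (or nearly the same) in most fonts. The sample word and the helper name `describe` are illustrative assumptions, not part of the technique definition.

```python
import unicodedata

latin = "paypal"
spoofed = "p\u0430ypal"   # U+0430 CYRILLIC SMALL LETTER A in place of Latin 'a'
hidden = "pay\u200bpal"   # U+200B ZERO WIDTH SPACE inserted mid-word


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]


for label, sample in (("latin", latin), ("spoofed", spoofed), ("hidden", hidden)):
    # Equality and UTF-8 length expose differences that the rendered glyphs hide.
    print(label, sample == latin, len(sample.encode("utf-8")))
    print("   ", describe(sample))
```

Any component that compares or tokenizes these strings by code point or byte will treat them as three different inputs, even though a human reviewer would likely read them as the same word.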

Detection

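NFKC Unicode normalization on all input before classification (folds compatibility forms such as fullwidth letters and ligatures, and composes combining sequences; it does not map cross-script homoglyphs, which require confusable mapping)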
Strip zero-width characters, bidirectional overrides, and Unicode tag characters
ICU confusable character mapping (the Unicode UTS #39 skeleton data exposed by ICU's SpoofChecker) to detect cross-script homoglyph substitution
Emoji-to-text expansion before classification
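The stripping and normalization steps above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the function names are hypothetical, the character ranges are the ones listed in this entry, and the mixed-script check is a rough stand-in based on Unicode character names, not ICU's confusable mapping. Emoji-to-text expansion is not covered.

```python
import re
import unicodedata

# Ranges taken from this entry: zero-width characters and directional marks
# (U+200B-U+200F), bidi embeddings/overrides (U+202A-U+202E), and tag
# characters (U+E0001-U+E007F).
_INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\U000e0001-\U000e007f]")


def sanitize(text: str) -> str:
    """Strip invisible/control characters, then apply NFKC (hypothetical helper)."""
    # Strip first so that a base letter and a combining mark separated by a
    # zero-width character become adjacent and can be composed by NFKC.
    stripped = _INVISIBLE.sub("", text)
    return unicodedata.normalize("NFKC", stripped)


def scripts_used(text: str) -> set[str]:
    """Rough script guess from the first word of each letter's Unicode name.

    Assumption: this heuristic only approximates script detection. A real
    deployment would use UTS #39 confusable data (for example via ICU).
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts


def looks_mixed_script(text: str) -> bool:
    """Flag text whose letters appear to come from more than one script."""
    return len(scripts_used(sanitize(text))) > 1
```

Note that NFKC alone leaves a cross-script homoglyph such as Cyrillic 'а' unchanged, which is why the sketch pairs it with a separate script check. The check will also flag legitimate multilingual text, so it is better treated as a signal for further review than as a blocking rule.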

Mitigation

NFKC normalization + zero-width stripping: HIGH

Unicode tag character stripping (U+E0001–U+E007F): HIGH

ICU confusable mapping: HIGH

Defense-in-depth: normalize → classify → output-classify: HIGH
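The normalize → classify → output-classify ordering might be wired together as follows. This is a sketch under assumptions: `guarded_generate`, the `Classifier` and `Model` callable types, and the convention that a classifier returns `True` when it flags text are all hypothetical, and confusable mapping is omitted for brevity.

```python
import unicodedata
from typing import Callable, Optional

Classifier = Callable[[str], bool]  # hypothetical: True means "flagged"
Model = Callable[[str], str]        # hypothetical: prompt in, response out


def _is_invisible(ch: str) -> bool:
    """True for the zero-width, bidi, and tag ranges listed in this entry."""
    cp = ord(ch)
    return (0x200B <= cp <= 0x200F
            or 0x202A <= cp <= 0x202E
            or 0xE0001 <= cp <= 0xE007F)


def normalize(text: str) -> str:
    """Strip invisible characters, then apply NFKC."""
    kept = "".join(ch for ch in text if not _is_invisible(ch))
    return unicodedata.normalize("NFKC", kept)


def guarded_generate(prompt: str,
                     input_classifier: Classifier,
                     model: Model,
                     output_classifier: Classifier) -> Optional[str]:
    """Return the model response, or None if either classifier flags the text."""
    clean = normalize(prompt)
    if input_classifier(clean):
        return None
    # The model receives the same normalized text the classifier saw.
    response = model(clean)
    # Normalize the output too, so obfuscated responses are classified fairly.
    if output_classifier(normalize(response)):
        return None
    return response
```

Passing the normalized prompt to the model, instead of the raw one, is a design choice: it keeps the classifier and the model from seeing different inputs, which is the gap this technique relies on.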

Chaining

This is the highest-value T2 technique for compound attacks. It chains with T2-AT-003 (Encoding) for layered obfuscation.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051.001
