T5-AT-009MEDIUM

Tokenization Exploits

Risk score180

RatingMedium

Procedures10

Severity

Mechanism

LLMs process text through a tokenizer that maps character sequences to vocabulary tokens. The design assumption is that the tokenizer is a transparent encoding layer — text in, tokens out, semantics preserved. The gap: tokenizers introduce a semantic layer between user input and model processing that can be exploited.

Detection

Unicode normalization before safety classification (NFKC normalization collapses homoglyphs)
Detect zero-width characters, bidirectional overrides, and Private Use Area codepoints in input
Tokenizer-consistency checking: verify safety classifier and model agree on tokenization
Flag inputs with high ratio of non-ASCII to ASCII characters

Mitigation

Input normalization (NFKC + zero-width stripping) before tokenizationHIGH

Shared tokenizer between safety classifier and generation modelHIGH

Special token sanitization in user inputHIGH

Under-trained token input filtering (block Private Use Area)MEDIUM

Chaining

Tokenization exploits enable T2 (Semantic Evasion) at a lower level — where semantic evasion operates on meaning, tokenization exploits operate on encoding. Successful homoglyph and zero-width attacks chain to T1 (Prompt Subversion) by making injected instructions invisible to safety filters.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0043

Open in the technique browser →