T5-AT-009MEDIUM
Tokenization Exploits
T5 · Model & API Exploitation →Risk score180
RatingMedium
Procedures10
Severity
Mechanism
LLMs process text through a tokenizer that maps character sequences to vocabulary tokens. The design assumption is that the tokenizer is a transparent encoding layer — text in, tokens out, semantics preserved. The gap: tokenizers introduce a semantic layer between user input and model processing that can be exploited.
Detection
- Unicode normalization before safety classification (NFKC normalization collapses homoglyphs)
- Detect zero-width characters, bidirectional overrides, and Private Use Area codepoints in input
- Tokenizer-consistency checking: verify safety classifier and model agree on tokenization
- Flag inputs with high ratio of non-ASCII to ASCII characters
Mitigation
Input normalization (NFKC + zero-width stripping) before tokenizationHIGH
Shared tokenizer between safety classifier and generation modelHIGH
Special token sanitization in user inputHIGH
Under-trained token input filtering (block Private Use Area)MEDIUM
Chaining
Tokenization exploits enable T2 (Semantic Evasion) at a lower level — where semantic evasion operates on meaning, tokenization exploits operate on encoding. Successful homoglyph and zero-width attacks chain to T1 (Prompt Subversion) by making injected instructions invisible to safety filters.
Framework mapping
Open in the technique browser →OWASP LLMLLM01
MITRE ATLASAML.T0043