T5-AT-009MEDIUM

Tokenization Exploits

T5 · Model & API Exploitation →
Risk score180
RatingMedium
Procedures10
Severity
Mechanism

LLMs process text through a tokenizer that maps character sequences to vocabulary tokens. The design assumption is that the tokenizer is a transparent encoding layer — text in, tokens out, semantics preserved. The gap: tokenizers introduce a semantic layer between user input and model processing that can be exploited.

Detection
  • Unicode normalization before safety classification (NFKC normalization collapses homoglyphs)
  • Detect zero-width characters, bidirectional overrides, and Private Use Area codepoints in input
  • Tokenizer-consistency checking: verify safety classifier and model agree on tokenization
  • Flag inputs with high ratio of non-ASCII to ASCII characters
Mitigation
Input normalization (NFKC + zero-width stripping) before tokenizationHIGH
Shared tokenizer between safety classifier and generation modelHIGH
Special token sanitization in user inputHIGH
Under-trained token input filtering (block Private Use Area)MEDIUM
Chaining

Tokenization exploits enable T2 (Semantic Evasion) at a lower level — where semantic evasion operates on meaning, tokenization exploits operate on encoding. Successful homoglyph and zero-width attacks chain to T1 (Prompt Subversion) by making injected instructions invisible to safety filters.

Framework mapping
OWASP LLMLLM01
MITRE ATLASAML.T0043
Open in the technique browser →