T2-AT-003MEDIUM

Encoding and Obfuscation

Risk score190

RatingMedium

Procedures10

Severity

Mechanism

Exploits the gap between what the safety classifier sees (encoded tokens) and what the model understands (decoded meaning). Models trained on internet data can decode Base64, ROT13, leetspeak, hexadecimal, Morse code, and other encoding schemes because these appear in their training corpora. Safety classifiers typically operate on the literal input tokens, not the decoded content.

Detection

Decode all known encodings (Base64, ROT13, hex, URL-encoding, leetspeak) before safety classification
Detect encoding signatures: Base64 padding (==), hex patterns (0x, consecutive hex pairs), NATO alphabet word sequences
YARA rule: yara/t02-encoding-evasion.yar

Mitigation

Input normalization (decode all known encodings before classification)HIGH

Multi-layer decoding (detect and decode nested encodings)HIGH

Constitutional ClassifiersHIGH

Chaining

Core building block for compound attacks. Chains with T1-AT-006 (Template Injection) when encoded payloads are embedded in template structures.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0051.001

Open in the technique browser →