T9-AT-006 · HIGH

Visual Adversarial Examples

T9 · Multimodal & Cross-Channel Attacks
Risk score: 225
Rating: High
Procedures: 10
Mechanism

The vision encoder maps images to embedding vectors. Adversarial perturbations, pixel-level modifications small enough to be imperceptible, shift the embedding into an attacker-chosen region of the embedding space. Unlike typographic injection (T9-AT-001), which embeds human-readable text in the image, adversarial examples operate in the model's internal representation space: the perturbed image looks unchanged to a human viewer but produces a substantially different embedding, which the language decoder can interpret as containing specific instructions.
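To make the embedding shift concrete, here is a minimal sketch on a toy linear "encoder". Everything in it is an assumption for illustration: real vision encoders are deep nonlinear networks, and the names `encode`, `perturb_toward`, `eps`, and `step` are hypothetical. It only shows the general idea of a small, bounded pixel change pulling an embedding toward a chosen target, using projected signed-gradient steps.

```python
import random

def encode(weights, pixels):
    """Toy linear stand-in for a vision encoder: embedding = weights @ pixels."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def perturb_toward(weights, pixels, target, eps=0.03, step=0.005, iters=50):
    """Pull encode(x) toward `target` with signed-gradient steps, keeping
    every pixel within `eps` of the original and inside [0, 1]."""
    adv = list(pixels)
    for _ in range(iters):
        residual = [e - t for e, t in zip(encode(weights, adv), target)]
        # Gradient of ||W x - t||^2 with respect to x is 2 * W^T (W x - t).
        grad = [2 * sum(weights[k][j] * residual[k] for k in range(len(weights)))
                for j in range(len(adv))]
        for j, g in enumerate(grad):
            moved = adv[j] - step * ((g > 0) - (g < 0))
            # Project back into the L-infinity ball around the original pixel.
            moved = min(max(moved, pixels[j] - eps), pixels[j] + eps)
            # Keep the value a valid pixel intensity.
            adv[j] = min(max(moved, 0.0), 1.0)
    return adv

rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(8)]   # assumed toy weights
clean = [rng.random() for _ in range(64)]                         # 64 "pixels" in [0, 1)
target = encode(W, [rng.random() for _ in range(64)])             # attacker-chosen embedding
adv = perturb_toward(W, clean, target)

max_change = max(abs(a - c) for a, c in zip(adv, clean))
before = sq_dist(encode(W, clean), target)
after = sq_dist(encode(W, adv), target)
```

In this toy setting no pixel moves by more than `eps`, yet `after` is smaller than `before`: the embedding has moved toward the target even though each individual pixel change is small.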

Detection
  • Adversarial perturbation detection: Statistical analysis of pixel distributions to detect noise patterns that do not occur in natural images
  • Image preprocessing defenses: Apply JPEG compression, spatial smoothing, or random resizing before processing to disrupt adversarial perturbations
  • Ensemble evaluation: Process the image through multiple vision encoders and flag divergent interpretations
  • Input gradient analysis: Unusually high gradient norms with respect to input pixels can indicate an adversarial perturbation
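One way to turn the preprocessing idea into a detection signal is to measure how far the embedding moves when the image is lightly smoothed. The sketch below assumes a hypothetical `encode` callable (image to embedding list) and uses a hand-made `toy_encode` purely so the example runs. The 3x3 box blur and the cosine-distance score are illustrative choices, not a prescribed method.

```python
import math

def box_blur(image):
    """3x3 mean filter with edge replication on a 2D list of floats."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [image[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            row.append(sum(window) / 9.0)
        out.append(row)
    return out

def cosine_distance(a, b):
    """1 - cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 0.0

def smoothing_sensitivity(encode, image):
    """How far the embedding moves when the image is lightly smoothed."""
    return cosine_distance(encode(image), encode(box_blur(image)))

def toy_encode(image):
    """Hypothetical stand-in encoder: [total brightness, checkerboard response]."""
    flat = [(r, c, v) for r, row in enumerate(image) for c, v in enumerate(row)]
    return [sum(v for _, _, v in flat),
            sum(v * (1 if (r + c) % 2 == 0 else -1) for r, c, v in flat)]

# Smooth horizontal ramp as the "natural" image, values in [0.1, 0.9].
clean = [[0.1 + 0.8 * c / 7.0 for c in range(8)] for _ in range(8)]
# Same image plus a low-amplitude, high-frequency pattern.
noisy = [[v + 0.03 * (1 if (r + c) % 2 == 0 else -1) for c, v in enumerate(row)]
         for r, row in enumerate(clean)]

clean_score = smoothing_sensitivity(toy_encode, clean)
noisy_score = smoothing_sensitivity(toy_encode, noisy)
```

Here the perturbed image scores higher than the clean one because smoothing removes the high-frequency component the toy encoder responds to. Any flagging threshold would need calibration on real data, since naturally textured images also shift under smoothing and can produce false positives.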
Mitigation
Input preprocessing (JPEG, smoothing, resize): MEDIUM
Adversarial training: MEDIUM
Certified robustness: HIGH
Multi-encoder consensus: HIGH
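As a rough illustration of multi-encoder consensus, the sketch below accepts an interpretation only when enough independent encoders agree on it. The `encoders` argument is a list of hypothetical callables that map an image to a label, and `min_votes` is an assumed parameter. A real deployment would need a way to compare richer outputs than single labels.

```python
from collections import Counter

def consensus_label(image, encoders, min_votes=2):
    """Return (label, agreed). The label is None when fewer than
    `min_votes` encoders share the most common interpretation."""
    votes = Counter(encode(image) for encode in encoders)
    label, count = votes.most_common(1)[0]
    agreed = count >= min_votes
    return (label if agreed else None), agreed

# Stub encoders for illustration only; each ignores the image and returns a fixed label.
agreeing = [lambda img: "cat", lambda img: "cat", lambda img: "stop sign"]
divergent = [lambda img: "cat", lambda img: "dog", lambda img: "stop sign"]

accepted = consensus_label(None, agreeing)
rejected = consensus_label(None, divergent)
```

With the stubs above, the first call reaches agreement on "cat" while the second is rejected because no label gets two votes. The design relies on a perturbation tuned against one encoder often failing to transfer to the others, which is an assumption and not a guarantee.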
Chaining

Visual adversarial examples chain into T9-AT-001 (Image Injection) when perturbations augment typographic injection to raise the attack success rate (ASR). They also chain into T9-AT-017 (Malicious Image Patches), where they serve as the core technique for physical-world patch attacks.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001