T9-AT-006 · HIGH

Visual Adversarial Examples

T9 · Multimodal & Cross-Channel Attacks

Risk score: 225

Rating: High

Procedures: 10

Mechanism

The vision encoder maps images to embedding vectors. Adversarial perturbations (pixel-level modifications small enough to be imperceptible to humans) shift the embedding to an attacker-chosen region of the embedding space. Unlike typographic injection (T9-AT-001), which embeds human-readable text in the image, adversarial examples operate in the model's internal representation space: the perturbed image looks unchanged to a human viewer but produces a substantially different embedding, which the language decoder may interpret as containing specific instructions.
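The embedding shift can be illustrated on a toy model. The sketch below assumes a random linear map `W` as a stand-in for the vision encoder (real encoders are deep and nonlinear, and practical attacks typically iterate many small steps), and all names (`embed`, `perturb_toward`, `eps`) are hypothetical. It takes one signed-gradient step bounded by `eps` per pixel and checks that the embedding moves toward a chosen target while no pixel changes by more than `eps`.

```python
import random

def embed(W, x):
    """Toy linear 'vision encoder' (an assumption): embedding = W @ x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def sign(v):
    return (v > 0) - (v < 0)

def perturb_toward(W, x, target, eps):
    """One signed-gradient step of size eps (L-infinity bound) that moves
    embed(W, x) toward `target`. For a linear encoder the gradient of
    ||W x - target||^2 with respect to x is 2 * W^T (W x - target)."""
    residual = [e - t for e, t in zip(embed(W, x), target)]
    grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))]
    # Step against the gradient, then clip to the valid pixel range [0, 1].
    return [min(1.0, max(0.0, xj - eps * sign(g))) for xj, g in zip(x, grad)]

random.seed(0)
n_pixels, dim = 256, 8
W = [[random.gauss(0, 1) for _ in range(n_pixels)] for _ in range(dim)]
image = [random.uniform(0.2, 0.8) for _ in range(n_pixels)]
other = [random.uniform(0.2, 0.8) for _ in range(n_pixels)]
target = embed(W, other)  # embedding region the attacker wants to reach

adv = perturb_toward(W, image, target, eps=0.01)
max_change = max(abs(a - b) for a, b in zip(adv, image))
before = sq_dist(embed(W, image), target)
after = sq_dist(embed(W, adv), target)
print(f"max pixel change: {max_change:.3f}")
print(f"distance to target embedding: {before:.1f} -> {after:.1f}")
```

The point of the toy is the asymmetry: every pixel moves by at most 1% of its range, yet the many small changes add up coherently in embedding space.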

Detection

Adversarial perturbation detection: Statistical analysis of pixel distributions to detect noise patterns that do not occur in natural images
Image preprocessing defenses: JPEG compression, spatial smoothing, or random resizing applied before processing to disrupt adversarial perturbations
Ensemble evaluation: Process the image through multiple vision encoders and flag divergent interpretations
Input gradient analysis: Unusually high gradient norms with respect to input pixels can indicate adversarial perturbation
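One way to turn the preprocessing item into a detector is a consistency check, similar in spirit to feature squeezing: embed the image before and after smoothing and flag it if the two embeddings diverge. The sketch below is a minimal illustration under stated assumptions. `toy_encoder` is a hypothetical stand-in that is deliberately sensitive to high-frequency patterns, the 3x3 mean filter stands in for any of the listed preprocessing steps, and the `0.9` threshold is arbitrary and would need calibration on benign images.

```python
import math

def box_smooth(img):
    """3x3 mean filter on a grayscale image (list of rows); edges are clamped."""
    h, w = len(img), len(img[0])
    return [[sum(img[min(h - 1, max(0, y + dy))][min(w - 1, max(0, x + dx))]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
             for x in range(w)] for y in range(h)]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 1.0

def is_suspicious(img, encoder, threshold=0.9):
    """Flag the image if smoothing changes its embedding more than expected."""
    return cosine(encoder(img), encoder(box_smooth(img))) < threshold

def toy_encoder(img):
    """Hypothetical encoder: [mean brightness, amplified checkerboard response]."""
    n = len(img) * len(img[0])
    mean = sum(map(sum, img)) / n
    checker = sum(v * (-1) ** (x + y)
                  for y, row in enumerate(img) for x, v in enumerate(row)) / n
    return [mean, 50 * checker]

# A flat image, and the same image with a faint high-frequency pattern added.
clean = [[0.5] * 8 for _ in range(8)]
noisy = [[0.5 + 0.03 * (-1) ** (x + y) for x in range(8)] for y in range(8)]

clean_flag = is_suspicious(clean, toy_encoder)
noisy_flag = is_suspicious(noisy, toy_encoder)
print(clean_flag, noisy_flag)
```

A benign image should embed almost identically before and after mild smoothing, whereas a perturbation that depends on precise pixel values tends not to survive it. An attacker who knows the preprocessing step can optimize against it, so this check is a signal and not a guarantee.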

Mitigation

Input preprocessing (JPEG, smoothing, resize): MEDIUM

Adversarial training: MEDIUM

Certified robustness: HIGH

Multi-encoder consensus: HIGH
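Multi-encoder consensus can be sketched as a vote over each encoder's interpretation of the image. The example below assumes each encoder's output has already been reduced to a comparable label (for instance a caption class); the function name `consensus` and the `quorum` parameter are hypothetical, and how interpretations are made comparable across encoders is left open.

```python
from collections import Counter

def consensus(labels, quorum=1.0):
    """Return (majority_label, accepted).

    `labels` holds one interpretation per encoder. The image is accepted
    only if the share of encoders agreeing with the majority label is at
    least `quorum` (1.0 means unanimous agreement is required).
    """
    if not labels:
        raise ValueError("need at least one encoder output")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels) >= quorum

unanimous = consensus(["cat", "cat", "cat"])
split_strict = consensus(["cat", "cat", "stop sign"])
split_lenient = consensus(["cat", "cat", "stop sign"], quorum=0.6)
print(unanimous, split_strict, split_lenient)
```

The design rationale is that a perturbation optimized against one encoder often transfers imperfectly to others, so disagreement is treated as grounds to reject or escalate the image. A stricter quorum trades more false rejections for fewer missed attacks.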

Chaining

Visual adversarial examples chain into T9-AT-001 (Image Injection) when perturbations are combined with typographic injection to raise the attack success rate (ASR). They also chain into T9-AT-017 (Malicious Image Patches), where they serve as the core technique for physical-world patch attacks.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051.001