T9-AT-013 · HIGH

Embedding Vector Manipulation

T9 · Multimodal & Cross-Channel Attacks

Risk score: 200

Rating: High

Procedures: 10

Mechanism

Multimodal models such as CLIP and SigLIP align visual and textual representations in a shared embedding space. Embedding vector manipulation targets this alignment directly: the attacker crafts inputs whose embeddings land in attacker-chosen regions of the shared space. Unlike visual adversarial examples (T9-AT-006), which perturb pixels to shift embeddings, embedding manipulation treats the embedding itself as the target. The attacker either operates on vectors directly, when an embedding API endpoint is exposed, or optimizes inputs so that they land at specific embedding coordinates.
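The second variant, optimizing an input until its embedding reaches chosen coordinates, can be illustrated with a toy model. The sketch below is not an attack on CLIP or SigLIP: it assumes a hypothetical linear encoder with random weights, full knowledge of those weights, and a squared-error objective, purely to show the shape of the optimization loop. All names and sizes are illustrative.

```python
import random

def encode(weights, x):
    """Toy linear 'encoder': one dot product per embedding dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def craft_input(weights, x_start, target, steps=2000, lr=0.01):
    """Gradient descent on ||encode(x) - target||^2 with respect to x."""
    x = list(x_start)
    for _ in range(steps):
        residual = [e - t for e, t in zip(encode(weights, x), target)]
        # For a linear map W, the gradient of the squared error is 2 * W^T * residual.
        grad = [
            2 * sum(weights[k][j] * residual[k] for k in range(len(weights)))
            for j in range(len(x))
        ]
        x = [xj - lr * gj for xj, gj in zip(x, grad)]
    return x

rng = random.Random(0)
DIM_IN, DIM_EMB = 8, 4  # hypothetical sizes, chosen only to keep the demo small
W = [[rng.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_EMB)]
benign_input = [rng.gauss(0, 1) for _ in range(DIM_IN)]
target_embedding = [1.0, 0.0, 0.0, 0.0]  # attacker-chosen point in embedding space

crafted_input = craft_input(W, benign_input, target_embedding)
dist_before = squared_distance(encode(W, benign_input), target_embedding)
dist_after = squared_distance(encode(W, crafted_input), target_embedding)
print(f"distance to target: {dist_before:.4f} -> {dist_after:.4f}")
```

A real encoder is nonlinear, and an attacker would typically need gradient access or many queries to an embedding endpoint, so this loop should be read as a conceptual outline rather than a working procedure.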

Detection

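Embedding anomaly detection: Flag embeddings that are statistical outliers relative to a benign baseline or that fall in known adversarial regions.
Embedding consistency verification: Verify that the text and visual embeddings of the same content agree, for example by checking that their similarity stays above a threshold.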
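Both detection ideas can be sketched in a few lines. The example below is a minimal illustration, assuming embeddings are plain lists of floats, a z-score on distance to the baseline centroid as the outlier test, and cosine similarity as the consistency measure. The class name, function names, and thresholds are hypothetical and would need tuning against real traffic.

```python
import math
import random
import statistics

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

class OutlierDetector:
    """Flags embeddings that sit unusually far from a benign baseline's centroid."""

    def __init__(self, baseline, z_threshold=3.0):
        self.center = [statistics.fmean(col) for col in zip(*baseline)]
        distances = [math.dist(v, self.center) for v in baseline]
        self.mean = statistics.fmean(distances)
        # Guard against a zero spread in degenerate baselines.
        self.std = statistics.pstdev(distances) or 1e-12
        self.z_threshold = z_threshold

    def is_outlier(self, embedding):
        z = (math.dist(embedding, self.center) - self.mean) / self.std
        return z > self.z_threshold

def is_consistent(text_embedding, image_embedding, min_similarity=0.25):
    """Checks that paired text and image embeddings point in a similar direction."""
    return cosine_similarity(text_embedding, image_embedding) >= min_similarity

# Synthetic baseline clustered around (1, 1, 1, 1); stands in for benign traffic.
rng = random.Random(0)
baseline = [[1.0 + rng.gauss(0, 0.1) for _ in range(4)] for _ in range(200)]
detector = OutlierDetector(baseline)

typical_flagged = detector.is_outlier([1.0, 1.0, 1.0, 1.0])
distant_flagged = detector.is_outlier([5.0, -3.0, 0.0, 2.0])
aligned_pair_ok = is_consistent([1.0, 0.0], [1.0, 0.0])
opposed_pair_ok = is_consistent([1.0, 0.0], [-1.0, 0.0])
```

Distance to a centroid is a deliberately simple outlier test. It will miss adversarial embeddings placed inside the benign cluster, which is one reason the consistency check is listed as a separate control.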

Mitigation

Embedding space monitoring: MEDIUM

Adversarial embedding detection: MEDIUM

Embedding space regularization: HIGH

Chaining

Embedding manipulation chains into T9-AT-006 (Visual Adversarial) as its deeper, representation-level counterpart. It also chains into T12 (RAG) when it is used to poison similarity-based retrieval (T12-AT-010 Embedding Manipulation).
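The retrieval-poisoning chain can be pictured with a toy index. The sketch below assumes a hypothetical in-memory store ranked by cosine similarity, with made-up document IDs and three-dimensional vectors. It shows only why an embedding placed close to an expected query outranks legitimate content.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_embedding, index, k=1):
    """Returns the IDs of the k entries most similar to the query."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical index: document ID -> embedding.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-faq": [0.1, 0.9, 0.0],
}
query = [1.0, 0.2, 0.0]

top_before = retrieve(query, index)

# An entry whose embedding sits at the anticipated query location wins the ranking.
index["poisoned-doc"] = [1.0, 0.2, 0.0]
top_after = retrieve(query, index)
```

Because ranking depends only on vector proximity, the content attached to the poisoned entry is never examined by the retriever, which is why the detection controls above operate on the embeddings themselves.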

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0051.001
