T9-AT-013 (HIGH)

Embedding Vector Manipulation

T9 · Multimodal & Cross-Channel Attacks
Risk score: 200
Rating: High
Procedures: 10
Mechanism

Multimodal models such as CLIP and SigLIP align visual and textual representations in a shared embedding space. Embedding vector manipulation targets this alignment directly: the attacker crafts inputs whose embeddings land in attacker-chosen regions of the shared space. Unlike visual adversarial examples (T9-AT-006), which perturb pixels to shift embeddings, this technique either operates on the embedding space itself, when the attacker has access to embedding API endpoints, or crafts inputs optimized to land at specific embedding coordinates.
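To make "optimized to land at specific embedding coordinates" concrete, the toy sketch below runs projected gradient descent against a random linear map. The map `W`, the function names, and the perturbation budget `eps` are all hypothetical stand-ins chosen for illustration; a real encoder such as CLIP is nonlinear and is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an encoder: a fixed random linear map
# from a 64-dim "input" to a 16-dim "embedding". Not a real model.
W = rng.normal(size=(16, 64))

def embed(x):
    return W @ x

def craft_toward_target(x0, target, eps=0.5, steps=200):
    """Minimize 0.5 * ||embed(x0 + delta) - target||^2 over delta,
    keeping every component of delta inside [-eps, eps]."""
    # Step size 1/L, where L is the largest eigenvalue of W^T W.
    step = 1.0 / np.linalg.norm(W, ord=2) ** 2
    delta = np.zeros_like(x0)
    for _ in range(steps):
        grad = W.T @ (embed(x0 + delta) - target)
        # Gradient step followed by projection onto the eps-box.
        delta = np.clip(delta - step * grad, -eps, eps)
    return x0 + delta

x0 = rng.normal(size=64)                 # benign starting input
target = embed(rng.normal(size=64))      # attacker-chosen embedding
x_adv = craft_toward_target(x0, target)

dist_before = np.linalg.norm(embed(x0) - target)
dist_after = np.linalg.norm(embed(x_adv) - target)
print(f"distance to target: {dist_before:.3f} -> {dist_after:.3f}")
```

The input changes by at most `eps` per component, yet its embedding moves toward the chosen target. That gap between input-space and embedding-space distance is what the detection checks in this entry try to expose.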

Detection
  • Embedding anomaly detection: Flag embeddings that are statistical outliers or that fall in known adversarial regions
  • Embedding consistency verification: Verify that the text and visual embeddings of the same content agree with each other
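Both checks can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production detector: the Mahalanobis-distance outlier test, the 99th-percentile threshold, the `min_cosine` cutoff, and all function names are hypothetical choices, and the reference set is synthetic.

```python
import numpy as np

def fit_reference(embeddings):
    """Mean and regularized inverse covariance of known-good embeddings."""
    mean = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False)
    cov += 1e-6 * np.eye(embeddings.shape[1])  # keep the matrix invertible
    return mean, np.linalg.inv(cov)

def mahalanobis(e, mean, inv_cov):
    d = e - mean
    return float(np.sqrt(d @ inv_cov @ d))

def is_outlier(e, mean, inv_cov, threshold):
    """Anomaly check: the embedding sits far from the reference distribution."""
    return mahalanobis(e, mean, inv_cov) > threshold

def is_consistent(image_emb, text_emb, min_cosine):
    """Consistency check: paired image and text embeddings should be close."""
    a = image_emb / np.linalg.norm(image_emb)
    b = text_emb / np.linalg.norm(text_emb)
    return float(a @ b) >= min_cosine

# Synthetic reference set standing in for embeddings of trusted content.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
mean, inv_cov = fit_reference(reference)

# Assumed threshold: 99th percentile of distances within the reference set.
ref_dists = [mahalanobis(e, mean, inv_cov) for e in reference]
threshold = float(np.percentile(ref_dists, 99))

far_point = mean + 10.0                  # deliberately far from the data
flag_far = is_outlier(far_point, mean, inv_cov, threshold)
flag_center = is_outlier(mean, mean, inv_cov, threshold)

v = rng.normal(size=8)
pair_ok = is_consistent(v, v, min_cosine=0.9)
pair_bad = is_consistent(v, -v, min_cosine=0.9)
print(flag_far, flag_center, pair_ok, pair_bad)
```

In practice the threshold and cosine cutoff would need calibration on real, trusted traffic. A crafted embedding that stays inside the reference distribution would pass the outlier test, which is why the two checks are listed together.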
Mitigation
  • Embedding space monitoring: MEDIUM
  • Adversarial embedding detection: MEDIUM
  • Embedding space regularization: HIGH
Chaining

Embedding manipulation chains into T9-AT-006 (Visual Adversarial) as its deeper, representation-level counterpart. It also chains into T12 (RAG) when it is used to poison similarity-based retrieval (T12-AT-010 Embedding Manipulation).
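The retrieval link can be illustrated with a toy cosine-similarity index. The vectors, the `top_k` helper, and the four-dimensional space are hypothetical and chosen only so the ranking is easy to follow; no real vector store is modeled.

```python
import numpy as np

def top_k(query, index, k=2):
    """Return the ids and cosine scores of the k most similar index rows."""
    q = query / np.linalg.norm(query)
    docs = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = docs @ q
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()

# Anticipated user query, as an embedding.
query = np.array([1.0, 0.0, 0.0, 0.0])

# Legitimate document embeddings (ids 0, 1, 2).
clean_index = np.array([
    [0.8, 0.6, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

# An entry whose embedding was placed very close to the anticipated query.
poisoned_entry = np.array([[0.99, 0.0, 0.14, 0.0]])
poisoned_index = np.vstack([clean_index, poisoned_entry])

clean_ids, _ = top_k(query, clean_index)
poisoned_ids, _ = top_k(query, poisoned_index)
print("clean ranking:   ", clean_ids)
print("poisoned ranking:", poisoned_ids)
```

Because ranking depends only on proximity in the embedding space, a single entry positioned near the query displaces legitimate results regardless of what its content actually says.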

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001