T10-AT-007 · HIGH

Model Inversion Attacks

T10 · Integrity & Confidentiality Breach
Risk score: 230
Rating: High
Procedures: 10
Severity
Mechanism

Model inversion reconstructs training data by optimizing an input to maximize the model's confidence for a target class or output. Unlike memorization-based extraction (T10-AT-001), which relies on the model spontaneously emitting training sequences, inversion works backward from model outputs to inputs. Against a classification model, the attacker searches for the input x that maximizes P(target_class | x). The result is a "representative" sample of the class that tends to converge toward features of actual training examples.
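The optimization described above can be sketched as gradient ascent on log P(target_class | x). This is a minimal illustration, not a reproduction of any catalogued procedure: it assumes white-box access to a toy linear softmax classifier, and the names `TOY_WEIGHTS`, `TOY_BIASES`, `predict`, and `invert` are hypothetical. Inputs are clamped to [0, 1] as a stand-in for a valid input range such as pixel intensities.

```python
import math

# Hypothetical toy classifier: 2 classes, 3 input features (assumption for illustration).
TOY_WEIGHTS = [[2.0, -1.0, 0.0],
               [-1.0, 2.0, 0.0]]
TOY_BIASES = [0.0, 0.0]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, biases, x):
    """Return class probabilities for input x under a linear softmax model."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return softmax(logits)

def invert(weights, biases, target, dim, steps=200, lr=0.1):
    """Gradient ascent on log P(target | x), starting from an all-zero input."""
    x = [0.0] * dim
    n_classes = len(weights)
    for _ in range(steps):
        p = predict(weights, biases, x)
        # For a linear softmax model:
        # d/dx log p[target] = W[target] - sum_k p[k] * W[k]
        grad = [weights[target][j]
                - sum(p[k] * weights[k][j] for k in range(n_classes))
                for j in range(dim)]
        # Take an ascent step, then clamp to the assumed valid range [0, 1].
        x = [min(1.0, max(0.0, x_j + lr * g_j)) for x_j, g_j in zip(x, grad)]
    return x

recovered = invert(TOY_WEIGHTS, TOY_BIASES, target=0, dim=3)
confidence = predict(TOY_WEIGHTS, TOY_BIASES, recovered)[0]
```

In this toy setting the recovered input saturates the feature that most favors the target class. Against a real model, the attacker would typically estimate the gradient from API responses instead of computing it directly, which is why the confidence values the model returns matter.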

Detection
  • Monitor for prompts requesting "reconstruction," "inversion," or "recovery" of inputs from outputs
  • Track optimization-style query patterns: sequences of queries that systematically narrow toward a specific output class
  • API-level: detect gradient-like query patterns (small input perturbations with output observation)
  • No existing YARA/Sigma rules for model inversion in signatures/
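As a rough illustration of the API-level signal above, the following sketch flags a client whose consecutive queries differ by only a small perturbation for a sustained run. It is an assumption-laden heuristic, not an existing rule from signatures/: the class name `PerturbationMonitor`, the L2 distance metric, and the `epsilon` and `min_run` thresholds are all hypothetical and would need tuning against real traffic.

```python
import math
from collections import defaultdict

class PerturbationMonitor:
    """Flags clients that send long runs of near-identical numeric inputs."""

    def __init__(self, epsilon=0.05, min_run=10):
        self.epsilon = epsilon      # max L2 distance counted as a "small perturbation"
        self.min_run = min_run      # consecutive small steps required before flagging
        self._last = {}             # client_id -> previous query vector
        self._run = defaultdict(int)  # client_id -> current run length

    def observe(self, client_id, x):
        """Record one query vector; return True if the client looks suspicious."""
        prev = self._last.get(client_id)
        self._last[client_id] = list(x)
        is_small_step = (
            prev is not None
            and len(prev) == len(x)
            and math.dist(prev, x) <= self.epsilon
        )
        if is_small_step:
            self._run[client_id] += 1
        else:
            # A large jump (or a first query) resets the run.
            self._run[client_id] = 0
        return self._run[client_id] >= self.min_run
```

A run-length check like this only catches attackers who query sequentially from one identity. Interleaved or distributed queries would need a windowed or cross-client comparison instead.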
Mitigation
Confidence score suppression: HIGH
Output quantization/rounding: MEDIUM
Differential privacy during training: HIGH
Gradient perturbation in federated settings: HIGH
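The first two mitigations can be sketched as post-processing on the probability vector before it leaves the API. This is a minimal sketch under assumptions: the function names `suppress_confidence` and `quantize_confidence` are hypothetical, and the rounding precision is an arbitrary choice for illustration.

```python
def suppress_confidence(probs):
    """Confidence score suppression: return only the index of the top class."""
    return max(range(len(probs)), key=probs.__getitem__)

def quantize_confidence(probs, decimals=1):
    """Output quantization: round each probability to a coarse grid."""
    return [round(p, decimals) for p in probs]

# Two slightly different inputs might produce these raw outputs (made-up values).
raw_a = [0.8766, 0.1234]
raw_b = [0.8712, 0.1288]

label_a = suppress_confidence(raw_a)
coarse_a = quantize_confidence(raw_a)
coarse_b = quantize_confidence(raw_b)
```

Both functions reduce the signal an attacker can read from small input changes: suppression removes the scores entirely, while quantization maps nearby raw outputs such as `raw_a` and `raw_b` to the same coarse response. Quantized scores may no longer sum to exactly 1.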
Chaining

Reconstructed training data feeds T10-AT-003 (Membership Inference) for verification, and recovered PII enables T10-AT-006 (Inference Attack Chains) for cross-referencing.

Framework mapping
OWASP LLM: LLM02
MITRE ATLAS: AML.T0024.001