T10-AT-007 · HIGH
Model Inversion Attacks
T10 · Integrity & Confidentiality Breach
Risk score: 230
Rating: High
Procedures: 10
Mechanism
Model inversion reconstructs training data by optimizing an input to maximize the model's confidence for a target class or output. Unlike memorization-based extraction (T10-AT-001), which relies on the model spontaneously emitting training sequences, inversion works backward from model outputs to inputs. For a classification model, the attacker searches for the input x that maximizes P(target_class | x). The result is a "representative" sample of that class that can converge toward actual training examples, most closely when the class corresponds to a single individual or a small set of similar records.
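The optimization described above can be sketched on a toy model. The following is a minimal white-box illustration, not a procedure from this catalog: the two-class linear softmax classifier, its weights `W` and `b`, and the helper names `predict` and `invert` are all hypothetical. It runs gradient ascent on log P(target_class | x) while clipping x to the valid feature range [0, 1]. An attacker with only API access would have to estimate the same gradient from repeated queries instead.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    # Linear softmax classifier: P(class k | x) for every class k.
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def invert(W, b, target, dim, steps=200, lr=0.5):
    # Gradient ascent on log P(target | x), starting from a blank input.
    x = [0.0] * dim
    for _ in range(steps):
        p = predict(W, b, x)
        # d/dx log p[target] = W[target] - sum_k p[k] * W[k]
        grad = [
            W[target][j] - sum(p[k] * W[k][j] for k in range(len(W)))
            for j in range(dim)
        ]
        # Step uphill, then clip to the valid feature range [0, 1].
        x = [min(1.0, max(0.0, xj + lr * gj)) for xj, gj in zip(x, grad)]
    return x

# Hypothetical model: class 1 is associated with a high second feature.
W = [[2.0, -2.0], [-2.0, 2.0]]
b = [0.0, 0.0]

x_rec = invert(W, b, target=1, dim=2)
conf = predict(W, b, x_rec)[1]
print("reconstructed input:", x_rec, "confidence:", round(conf, 3))
```

On this toy model the recovered input is simply the feature pattern the classifier associates with class 1. With a real model and a class built from few records, the same loop tends to drift toward features of those records, which is what makes the output sensitive.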
Detection
- Monitor for prompts requesting "reconstruction," "inversion," or "recovery" of inputs from outputs
- Track optimization-style query patterns: sequences of queries that systematically narrow toward a specific output class
- At the API level, detect gradient-estimation query patterns: runs of near-identical inputs that differ by small perturbations, submitted to observe how the output changes
- No existing YARA/Sigma rules for model inversion in signatures/
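As a starting point for the API-level check in the list above, the sketch below scores one client's recent queries by how many consecutive pairs differ by a small, nonzero amount. It is an assumption-laden illustration rather than an existing rule: it assumes queries arrive as numeric feature vectors, and the function names and the thresholds `eps`, `min_queries`, and `ratio` are hypothetical values that would need tuning against real traffic. Text inputs would need an embedding or edit-distance measure in place of the L2 distance.

```python
import math

def l2(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def perturbation_ratio(queries, eps=0.05):
    # Fraction of consecutive query pairs that differ by a small, nonzero step.
    if len(queries) < 2:
        return 0.0
    close = sum(
        1 for prev, cur in zip(queries, queries[1:]) if 0.0 < l2(prev, cur) <= eps
    )
    return close / (len(queries) - 1)

def looks_like_probing(queries, eps=0.05, min_queries=10, ratio=0.8):
    # Flag a client window dominated by tiny input perturbations.
    return len(queries) >= min_queries and perturbation_ratio(queries, eps) >= ratio

# Hypothetical traffic: one client nudges a single feature in 0.01 steps,
# another sends unrelated inputs.
probing = [[0.5 + 0.01 * i, 0.5] for i in range(12)]
benign = [[(i * 0.37) % 1.0, (i * 0.61) % 1.0] for i in range(12)]

print(looks_like_probing(probing), looks_like_probing(benign))
```

Exact duplicates are excluded from the count so that ordinary retries do not raise the score. An attacker who interleaves unrelated queries would lower the ratio, so a production check would likely compare each query against a window of earlier ones rather than only its predecessor.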
Mitigation
- Confidence score suppression: HIGH
- Output quantization/rounding: MEDIUM
- Differential privacy during training: HIGH
- Gradient perturbation in federated settings: HIGH
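The two output-side mitigations can be illustrated with a short sketch. This is a simplified example under stated assumptions, not a reference implementation: `label_only` and `quantize` are hypothetical helper names, and the one-decimal rounding is an arbitrary choice. The point is that both reduce the fine-grained confidence signal an inversion loop climbs: suppression returns only the top label, and quantization makes nearby inputs produce identical scores.

```python
def label_only(probs, labels):
    # Confidence score suppression: return the top label and no scores.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]

def quantize(probs, decimals=1):
    # Output quantization: round each score to a coarse grid.
    return [round(p, decimals) for p in probs]

LABELS = ["class_a", "class_b"]  # hypothetical class names

# Two raw outputs an attacker might see after a small input perturbation.
raw_1 = [0.52, 0.48]
raw_2 = [0.54, 0.46]

# After quantization both responses are identical, so the perturbation
# reveals nothing about which direction raised the confidence.
print(quantize(raw_1), quantize(raw_2))
print(label_only(raw_1, LABELS), label_only(raw_2, LABELS))
```

Rounding trades utility for protection: coarser grids leak less but also give legitimate clients less information, and label-only responses still allow slower, decision-boundary-based attacks.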
Chaining
Reconstructed training data feeds T10-AT-003 (Membership Inference) for verification, and recovered PII enables T10-AT-006 (Inference Attack Chains) for cross-referencing.
Framework mapping
OWASP LLM: LLM02
MITRE ATLAS: AML.T0024.001