T7-AT-010 · MEDIUM
Differential Response Analysis
T7 · Output Manipulation & Exfiltration
Risk score: 190
Rating: Medium
Procedures: 10
Mechanism
The model's refusal behavior is itself an information-leaking oracle. Query variants that share the same structure but differ in a single parameter can produce refusals that differ in length, wording, latency, and expressed confidence. Recorded across a systematically varied query space, the pattern of which variants are refused, and how, lets an attacker reconstruct an approximation of the safety classifier's decision boundary.
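From the defender's side, one rough audit is to replay a set of structurally similar prompts against your own deployment, collect the refusal texts, and compute their empirical Shannon entropy H(R). Because the information a response carries about the query satisfies I(Q; R) ≤ H(R), an entropy near zero means the refusal wording can carry little per-query signal. The following is a minimal sketch under assumptions: the `normalize` step (case-folding and whitespace collapsing) and both function names are hypothetical, and it measures wording only, not latency or other channels.

```python
import math
from collections import Counter


def normalize(text: str) -> str:
    # Hypothetical normalization: case-fold and collapse whitespace so that
    # trivially different refusals are grouped as one response.
    return " ".join(text.lower().split())


def refusal_entropy_bits(responses: list[str]) -> float:
    """Empirical Shannon entropy H(R), in bits, of the normalized refusal texts.

    H(R) upper-bounds I(Q; R), the information the wording can carry about
    which query variant was sent. Latency and other channels are not covered.
    """
    if not responses:
        return 0.0
    counts = Counter(normalize(r) for r in responses)
    total = len(responses)
    # Sum p * log2(1/p) over distinct responses (0.0 when all are identical).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    uniform = ["I can't help with that."] * 4
    varied = [
        "I can't help with that.",
        "I won't provide that.",
        "That request conflicts with policy.",
        "Sorry, no.",
    ]
    print(refusal_entropy_bits(uniform))  # 0.0
    print(refusal_entropy_bits(varied))   # 2.0
```

A score near zero is what uniform refusals aim for. Randomized refusals raise H(R) while keeping I(Q; R) near zero when the choice ignores the query, so for that mitigation the bound is loose and a high score is not by itself evidence of leakage.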
Detection
- Detect automated query patterns: high-frequency requests with systematic parameter variation
- Monitor for paired queries (structurally identical, one variable changed)
- Flag users accumulating many refusals across query variants on the same topic
- Observable signal: anomalously high refusal rate from a single user
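The paired-query and refusal-accumulation signals above can be approximated offline from request logs. This is a minimal sketch, assuming a hypothetical `Event` log record (user, prompt, refused flag), whitespace tokenization, and placeholder thresholds (`min_pairs`, `min_refusal_rate`) that would need tuning on real traffic. Two prompts count as a pair when they have the same token count and differ in exactly one position; pairs are found by masking each position in turn, so prompts that share a masked key are one-token variants of each other. Request frequency is not covered, since that would need timestamps.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import comb


@dataclass
class Event:
    # Hypothetical log record: who sent the prompt and whether it was refused.
    user: str
    prompt: str
    refused: bool


def one_token_variant_pairs(prompts: list[str]) -> int:
    """Count distinct prompt pairs with equal token count differing in exactly one token."""
    groups: dict[tuple, set[tuple[str, ...]]] = defaultdict(set)
    for tokens in {tuple(p.split()) for p in prompts}:
        for i in range(len(tokens)):
            # Mask position i with None, which cannot collide with a string token.
            # Two distinct prompts share a masked key only if they differ at i alone.
            masked = tokens[:i] + (None,) + tokens[i + 1:]
            groups[masked].add(tokens)
    return sum(comb(len(members), 2) for members in groups.values())


def flag_users(
    events: list[Event], min_pairs: int = 5, min_refusal_rate: float = 0.5
) -> dict[str, tuple[int, float]]:
    """Return {user: (variant_pairs, refusal_rate)} for users over both thresholds."""
    by_user: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        by_user[event.user].append(event)

    flagged = {}
    for user, user_events in by_user.items():
        pairs = one_token_variant_pairs([e.prompt for e in user_events])
        rate = sum(e.refused for e in user_events) / len(user_events)
        if pairs >= min_pairs and rate >= min_refusal_rate:
            flagged[user] = (pairs, rate)
    return flagged


if __name__ == "__main__":
    log = [Event("u1", f"describe item {x}", True) for x in "abcdef"]
    log += [Event("u2", "hello there", False), Event("u2", "what is the weather", False)]
    print(flag_users(log))  # {'u1': (15, 1.0)}
```

Masking keeps the pair search roughly linear in total token count instead of comparing every prompt against every other prompt.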
Mitigation
- Uniform refusal responses · HIGH
- Constant-time safety evaluation · HIGH
- Rate limiting with pattern detection · MEDIUM
- Refusal response randomization · MEDIUM
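Uniform refusals, padded safety-evaluation latency, and refusal randomization can be combined in a single response wrapper. The sketch below is a minimal illustration under assumptions, not a production design: `is_unsafe` and `generate` stand in for a real safety classifier and model call, and `UNIFORM_REFUSAL` and the 0.5-second `budget_s` default are hypothetical values. Passing a `refusal_pool` switches from one fixed refusal to a random choice that does not depend on the prompt.

```python
import random
import time
from typing import Callable, Optional, Sequence

# Hypothetical fixed refusal text: identical for every refused prompt.
UNIFORM_REFUSAL = "I can't help with that request."


def guarded_respond(
    prompt: str,
    is_unsafe: Callable[[str], bool],
    generate: Callable[[str], str],
    budget_s: float = 0.5,
    refusal_pool: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    start = time.monotonic()
    unsafe = is_unsafe(prompt)

    # Pad the safety decision to a fixed wall-clock budget so that its
    # latency does not reveal how long the classifier took on this prompt.
    remaining = budget_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)

    if not unsafe:
        return generate(prompt)
    if refusal_pool:
        # Randomized refusal: the choice ignores the prompt entirely.
        return (rng or random.Random()).choice(list(refusal_pool))
    # Uniform refusal: same text regardless of which variant was refused.
    return UNIFORM_REFUSAL


if __name__ == "__main__":
    blocked = lambda p: "forbidden" in p
    echo = lambda p: "ok: " + p
    print(guarded_respond("forbidden variant 1", blocked, echo, budget_s=0.05))
```

Sleeping until a fixed deadline pads the decision step rather than making the classifier itself constant-time. Any evaluation that overruns the budget still leaks timing, so the budget has to sit above the classifier's worst-case latency.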
Chaining
Differential analysis is a prerequisite for precision-targeted T1 (Prompt Subversion), T2 (Semantic Evasion), and T7-AT-013 (Capability Probing) attacks. Timing differentials link it to T7-AT-004 (Side Channel).
Framework mapping
OWASP LLM: LLM02
MITRE ATLAS: AML.T0024