T7-AT-010 · MEDIUM

Differential Response Analysis

T7 · Output Manipulation & Exfiltration
Risk score: 190
Rating: Medium
Procedures: 10
Mechanism

The model's refusal behavior is itself an information-leaking oracle. Query variants that share a structure but differ in one parameter produce refusals that differ in length, wording, latency, and expressed confidence. By mapping those differences across a systematic query space, an attacker can approximate the safety classifier's decision boundary.

Detection
  • Detect automated query patterns: high-frequency requests with systematic parameter variation
  • Monitor for paired queries (structurally identical, one variable changed)
  • Flag users accumulating many refusals across query variants on the same topic
  • Observable signal: anomalously high refusal rate from a single user
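
The paired-query and refusal-rate signals above can be sketched as a small log scan. This is only an illustration, assuming the request log is available as a list of events: the `QueryEvent` type, the token-level pairing heuristic, and every threshold are hypothetical and would need tuning against real traffic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryEvent:
    """One logged request (hypothetical schema, not defined by this catalog)."""
    user_id: str
    prompt: str
    refused: bool


def is_paired(a: str, b: str, max_changed_tokens: int = 2) -> bool:
    """True if two prompts have the same token count and differ in only a few tokens."""
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb) or ta == tb:
        return False
    changed = sum(1 for x, y in zip(ta, tb) if x != y)
    return changed <= max_changed_tokens


def flag_users(events, min_queries=5, refusal_rate_threshold=0.5, min_pairs=3):
    """Flag users who combine a high refusal rate with many near-identical prompts.

    All thresholds are illustrative assumptions.
    """
    by_user = defaultdict(list)
    for event in events:
        by_user[event.user_id].append(event)

    flagged = {}
    for user, evs in by_user.items():
        if len(evs) < min_queries:
            continue
        refusal_rate = sum(e.refused for e in evs) / len(evs)
        # Count prompt pairs that are structurally identical with one or two tokens changed.
        pairs = sum(
            1
            for i in range(len(evs))
            for j in range(i + 1, len(evs))
            if is_paired(evs[i].prompt, evs[j].prompt)
        )
        if refusal_rate >= refusal_rate_threshold and pairs >= min_pairs:
            flagged[user] = {"refusal_rate": refusal_rate, "paired_queries": pairs}
    return flagged
```

The pairwise comparison is quadratic in the number of requests per user, so a production detector would likely bucket prompts by a normalized template first. Whitespace tokenization is also a simplification; it misses variants that insert or drop a word.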
Mitigation
Uniform refusal responses · HIGH
Constant-time safety evaluation · HIGH
Rate limiting with pattern detection · MEDIUM
Refusal response randomization · MEDIUM
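
As a rough sketch of the first two mitigations, the wrapper below returns one fixed refusal string and pads the safety check to a fixed time budget, so that neither wording nor classifier latency varies with the input. The names `REFUSAL_TEXT`, `evaluate_with_fixed_budget`, and `respond`, the `classify` and `generate` callables, and the 50 ms budget are all assumptions made for illustration.

```python
import time

# Single refusal string for every blocked request (hypothetical wording).
REFUSAL_TEXT = "I can't help with that request."


def evaluate_with_fixed_budget(prompt, classify, budget_s=0.05,
                               clock=time.monotonic, sleep=time.sleep):
    """Run the safety check, then wait out the rest of a fixed time budget.

    `classify` is a caller-supplied function returning True when the prompt
    should be refused. Padding only hides timing differences if `budget_s`
    exceeds the classifier's worst-case runtime (an assumption here).
    """
    start = clock()
    blocked = bool(classify(prompt))
    remaining = budget_s - (clock() - start)
    if remaining > 0:
        sleep(remaining)
    return blocked


def respond(prompt, classify, generate, budget_s=0.05):
    """Return the fixed refusal for blocked prompts, else the generated reply."""
    if evaluate_with_fixed_budget(prompt, classify, budget_s):
        return REFUSAL_TEXT
    return generate(prompt)
```

With this shape, two different blocked prompts yield byte-identical responses after the same evaluation delay, which removes the length, wording, and classifier-timing signals described under Mechanism. It does not address the rate-limiting or randomization mitigations.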
Chaining

Differential analysis is a prerequisite for precision-targeted T1 (Prompt Subversion), T2 (Semantic Evasion), and T7-AT-013 (Capability Probing). Timing differentials connect to T7-AT-004 (Side Channel).

Framework mapping
OWASP LLM: LLM02
MITRE ATLAS: AML.T0024