T7-AT-007 · MEDIUM

Iterative Refinement Extraction

T7 · Output Manipulation & Exfiltration
Risk score: 175
Rating: Medium
Procedures: 10
Mechanism

Models evaluate the delta of each refinement request ("add one more detail," "be slightly more specific") rather than the cumulative information state. A response that is 10% more detailed than the previous one passes per-turn safety checks even if total disclosure has crossed the restriction threshold. The violated assumption is that each increment is judged against cumulative disclosure; in practice, each refinement is judged against the immediately preceding response, not the original safe baseline.
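
The compounding effect can be made concrete with a small simulation. This is a minimal sketch, not a real safety check: the numeric "detail" score, the 15% per-turn limit, and the 2x ceiling are all hypothetical values chosen for illustration, and the function name is invented here.

```python
def simulate_refinement_chain(baseline, growth, per_turn_limit, ceiling, turns):
    """Compare a per-turn delta check with a baseline-anchored check.

    All parameters are illustrative assumptions:
      baseline       -- detail score of the original safe response
      growth         -- relative detail added by each refinement (0.10 = 10%)
      per_turn_limit -- largest relative increase the per-turn check tolerates
      ceiling        -- largest allowed ratio of current detail to the baseline
    """
    detail = baseline
    results = []
    for turn in range(1, turns + 1):
        previous = detail
        detail = previous * (1 + growth)
        # Per-turn check: compares only against the preceding response.
        per_turn_ok = (detail - previous) / previous <= per_turn_limit
        # Cumulative check: compares against the original safe baseline.
        cumulative_ok = detail / baseline <= ceiling
        results.append((turn, round(detail, 3), per_turn_ok, cumulative_ok))
    return results


chain = simulate_refinement_chain(
    baseline=1.0, growth=0.10, per_turn_limit=0.15, ceiling=2.0, turns=10
)
for turn, detail, per_turn_ok, cumulative_ok in chain:
    print(turn, detail, per_turn_ok, cumulative_ok)
```

With these assumed numbers every turn passes the per-turn check, while the baseline-anchored check first fails at turn 8, because 1.1^7 is about 1.95 and 1.1^8 is about 2.14.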

Detection
  • Track detail level across turns using information density metrics; flag monotonic increase on sensitive topics
  • Detect refinement instruction patterns: consecutive turns with "more," "detail," "specific," "expand," "clarify"
  • Observable signal: response length increasing monotonically across 3+ consecutive turns on the same topic
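
The two simpler signals above (refinement wording and monotonic length growth) can be sketched as follows. This is an illustrative heuristic only: it uses raw character length as a stand-in for an information density metric, it does not track whether turns stay on the same topic, and the function names and the two-turn refinement threshold are assumptions made for this example.

```python
import re

# Terms taken from the detection list; suffixes allow "details", "clarify", etc.
REFINEMENT_TERMS = re.compile(
    r"\b(more|detail\w*|specific\w*|expand\w*|clarif\w*)\b", re.IGNORECASE
)


def is_refinement_request(user_message):
    """True if the message contains any refinement term."""
    return bool(REFINEMENT_TERMS.search(user_message))


def longest_refinement_run(user_messages):
    """Length of the longest run of consecutive refinement requests."""
    best = run = 0
    for message in user_messages:
        run = run + 1 if is_refinement_request(message) else 0
        best = max(best, run)
    return best


def has_monotonic_growth(response_lengths, min_turns=3):
    """True if min_turns consecutive responses strictly increase in length."""
    run = 1
    for previous, current in zip(response_lengths, response_lengths[1:]):
        run = run + 1 if current > previous else 1
        if run >= min_turns:
            return True
    return False


def flag_conversation(turns, min_turns=3):
    """turns is a list of (user_message, model_response) pairs.

    Flags when at least two consecutive refinement requests coincide with
    strictly growing responses over min_turns turns (assumed thresholds).
    """
    user_messages = [user for user, _ in turns]
    lengths = [len(response) for _, response in turns]
    return (
        longest_refinement_run(user_messages) >= 2
        and has_monotonic_growth(lengths, min_turns)
    )
```

A production detector would likely restrict both checks to turns on the same sensitive topic and replace length with a real density measure; keyword matching alone will also flag ordinary requests for clarification.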
Mitigation
  • Cumulative disclosure evaluation: HIGH
  • Refinement request counting: MEDIUM
  • Baseline anchoring: HIGH
  • Topic-aware detail ceiling: MEDIUM
Chaining

Iterative refinement typically follows T7-AT-002 (Information Fragmentation) and precedes T7-AT-012 (Aggregation). The ratchet effect also primes the model for T3 (Reasoning Exploitation) by establishing compliance momentum.

Framework mapping
  • OWASP LLM: LLM02
  • MITRE ATLAS: AML.T0024