T7-AT-007 · MEDIUM
Iterative Refinement Extraction
T7 · Output Manipulation & Exfiltration
Risk score: 175
Rating: Medium
Procedures: 10
Severity
Mechanism
Models evaluate the delta of each refinement request ("add one more detail," "be slightly more specific") rather than the cumulative information state. A response that is 10% more detailed than the previous one passes the per-turn safety check even if total disclosure has crossed the restriction threshold. The violated assumption is that incremental deltas are evaluated against cumulative disclosure. In practice, each refinement is evaluated against the immediately preceding response, not the original safe baseline, so a sequence of individually acceptable steps can ratchet the conversation past the limit.
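The gap between the two evaluation modes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the integer "detail scores" stand in for whatever disclosure measure a deployment uses, and the 10% per-turn step and 1.5x baseline ratio are arbitrary thresholds, not values from this technique entry.

```python
from typing import List


def per_turn_ok(scores: List[int], max_relative_step: float = 0.10) -> List[bool]:
    """Flawed check: compare each response only with the one before it."""
    return [
        (curr - prev) / prev <= max_relative_step
        for prev, curr in zip(scores, scores[1:])
    ]


def cumulative_ok(scores: List[int], max_ratio_to_baseline: float = 1.5) -> List[bool]:
    """Anchored check: compare each response with the original baseline."""
    baseline = scores[0]
    return [score / baseline <= max_ratio_to_baseline for score in scores[1:]]


# Hypothetical detail scores for a baseline answer and six refinements.
# Every step adds less than 10% over the previous response.
detail_scores = [100, 109, 118, 128, 139, 151, 164]

per_turn = per_turn_ok(detail_scores)      # every refinement passes
cumulative = cumulative_ok(detail_scores)  # the last two refinements fail

print(per_turn)
print(cumulative)
```

Under these assumed numbers, the per-turn check accepts all six refinements, while the baseline-anchored check rejects the fifth and sixth, which is the point at which total detail exceeds 1.5x the original response.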
Detection
- Track detail level across turns using information density metrics; flag monotonic increase on sensitive topics
- Detect refinement instruction patterns: consecutive turns with "more," "detail," "specific," "expand," "clarify"
- Observable signal: response length increasing monotonically across 3+ consecutive turns on the same topic
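The signals above could be combined roughly as follows. This is a sketch under stated assumptions, not a production detector: response length is used as a crude proxy for information density, the keyword regex is a hypothetical expansion of the listed terms, topic tracking is omitted (all turns are assumed to be on one topic), and requiring both signals together is a design choice rather than something this entry prescribes.

```python
import re
from typing import List

# Hypothetical keyword pattern built from the refinement terms listed above.
REFINEMENT_PATTERN = re.compile(
    r"\b(more|detail\w*|specific\w*|expand\w*|clarif\w*)\b", re.IGNORECASE
)


def longest_refinement_run(user_turns: List[str]) -> int:
    """Longest run of consecutive user turns containing a refinement keyword."""
    longest = current = 0
    for text in user_turns:
        if REFINEMENT_PATTERN.search(text):
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def longest_increasing_run(lengths: List[int]) -> int:
    """Longest run of consecutive, strictly increasing response lengths."""
    if not lengths:
        return 0
    longest = current = 1
    for prev, curr in zip(lengths, lengths[1:]):
        current = current + 1 if curr > prev else 1
        longest = max(longest, current)
    return longest


def flag_conversation(user_turns: List[str], responses: List[str], min_run: int = 3) -> bool:
    """Flag when both signals persist for at least min_run consecutive turns."""
    lengths = [len(response) for response in responses]
    return (
        longest_refinement_run(user_turns) >= min_run
        and longest_increasing_run(lengths) >= min_run
    )


user_turns = [
    "How do locks work?",
    "Add one more detail.",
    "Be slightly more specific.",
    "Can you expand on that?",
]
responses = ["a" * 100, "a" * 150, "a" * 210, "a" * 300]
flagged = flag_conversation(user_turns, responses)
```

Keyword matching alone will produce false positives on benign follow-up questions, so a flag from this kind of check is better treated as a trigger for closer review on sensitive topics than as a block decision.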
Mitigation
Cumulative disclosure evaluation · HIGH
Refinement request counting · MEDIUM
Baseline anchoring · HIGH
Topic-aware detail ceiling · MEDIUM
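One way the four mitigations might fit together is sketched below. The `RefinementGuard` class, its field names, and all thresholds are hypothetical. The sketch also assumes the caller already has a numeric detail score for the candidate response, a topic label, and a flag saying whether the user turn was a refinement request.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RefinementGuard:
    # Hypothetical per-topic ceilings, in the same units as the detail scores.
    topic_ceilings: Dict[str, float]
    default_ceiling: float = 100.0
    max_refinements: int = 3
    max_growth_over_baseline: float = 1.5
    _baseline: Dict[str, float] = field(init=False, default_factory=dict)
    _refinements: Dict[str, int] = field(init=False, default_factory=dict)

    def allow(self, topic: str, detail_score: float, is_refinement: bool) -> bool:
        # Baseline anchoring: the first response on a topic fixes the reference.
        baseline = self._baseline.setdefault(topic, detail_score)

        # Refinement request counting: cap refinements per topic.
        if is_refinement:
            self._refinements[topic] = self._refinements.get(topic, 0) + 1
        if self._refinements.get(topic, 0) > self.max_refinements:
            return False

        # Cumulative disclosure evaluation: compare with the baseline,
        # not with the immediately preceding response.
        if detail_score > baseline * self.max_growth_over_baseline:
            return False

        # Topic-aware detail ceiling: absolute cap for this topic.
        ceiling = self.topic_ceilings.get(topic, self.default_ceiling)
        return detail_score <= ceiling


guard = RefinementGuard(topic_ceilings={"sensitive-topic": 140.0}, max_refinements=10)
first = guard.allow("sensitive-topic", 100, is_refinement=False)  # sets the baseline
later = guard.allow("sensitive-topic", 139, is_refinement=True)   # within all limits
over = guard.allow("sensitive-topic", 145, is_refinement=True)    # exceeds topic ceiling
```

Because every check is anchored to the first response or to an absolute per-topic cap, a long series of small increments cannot move the reference point, which is what the per-turn evaluation described under Mechanism allows.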
Chaining
Iterative refinement typically follows T7-AT-002 (Information Fragmentation) and precedes T7-AT-012 (Aggregation). The ratchet effect also primes the model for T3 (Reasoning Exploitation) by establishing compliance momentum.
Framework mapping
OWASP LLM: LLM02
MITRE ATLAS: AML.T0024