T7-AT-007 · MEDIUM

Iterative Refinement Extraction

T7 · Output Manipulation & Exfiltration

Risk score: 175

Rating: Medium

Procedures: 10

Mechanism

Models evaluate the delta of each refinement request ("add one more detail," "be slightly more specific") rather than the cumulative information state. A response that is 10% more detailed than the previous one passes the per-turn safety check even if total disclosure has crossed the restriction threshold. The violated assumption is that incremental deltas are evaluated against cumulative disclosure. In practice, each refinement is evaluated against the immediately preceding response, not the original safe baseline.
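The gap between per-turn and cumulative evaluation can be illustrated with a toy model. This is a minimal sketch, not a description of how any real safety layer works: the numeric "detail scores", the function names, and the thresholds are all hypothetical.

```python
from typing import List


def per_turn_check(detail_levels: List[float], max_relative_delta: float) -> bool:
    """Pass if each turn grows by at most max_relative_delta over the previous turn."""
    return all(
        (curr - prev) / prev <= max_relative_delta
        for prev, curr in zip(detail_levels, detail_levels[1:])
    )


def cumulative_check(detail_levels: List[float], baseline: float, ceiling: float) -> bool:
    """Pass only if no turn exceeds the original baseline by more than ceiling."""
    return all(level - baseline <= ceiling for level in detail_levels)


# Hypothetical detail scores: every response is 10% more detailed than the last.
levels = [1.0]
for _ in range(8):
    levels.append(levels[-1] * 1.10)

# Each step looks small when compared with the preceding response...
per_turn_ok = per_turn_check(levels, max_relative_delta=0.15)
# ...but the sequence has drifted far from the first (safe) response.
cumulative_ok = cumulative_check(levels, baseline=levels[0], ceiling=0.5)
```

Under these assumed numbers the per-turn check passes on every step, while the comparison against the first response fails once the compounded increase exceeds the ceiling.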

Detection

Track detail level across turns using information density metrics; flag monotonic increase on sensitive topics
Detect refinement instruction patterns: consecutive turns with "more," "detail," "specific," "expand," "clarify"
Observable signal: response length increasing monotonically across 3+ consecutive turns on the same topic
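The keyword and length signals above could be combined roughly as follows. This is a sketch under stated assumptions: the regular expression, the function names, and the choice to count a run in responses are illustrative, and the turns are assumed to have already been grouped by topic, which the code does not attempt.

```python
import re
from typing import List, Tuple

# Hypothetical pattern covering the refinement terms listed above.
REFINEMENT_TERMS = re.compile(
    r"\b(more|detail\w*|specific\w*|expand\w*|clarif\w*)\b", re.IGNORECASE
)


def is_refinement_request(user_message: str) -> bool:
    """True if the message contains any of the refinement terms."""
    return REFINEMENT_TERMS.search(user_message) is not None


def flag_conversation(turns: List[Tuple[str, str]], min_run: int = 3) -> bool:
    """Flag a same-topic conversation given as (user_message, response) pairs.

    A run starts at any response and is extended by each following turn whose
    user message is a refinement request and whose response is strictly longer
    than the previous one. The conversation is flagged once a run spans
    min_run consecutive responses.
    """
    run = 1
    for (_, prev_resp), (user_msg, resp) in zip(turns, turns[1:]):
        if is_refinement_request(user_msg) and len(resp) > len(prev_resp):
            run += 1
        else:
            run = 1
        if run >= min_run:
            return True
    return False


escalating = [
    ("How do locks work?", "a" * 100),
    ("Can you add more detail?", "a" * 150),
    ("Be slightly more specific.", "a" * 220),
]
benign = [
    ("What is TCP?", "a" * 100),
    ("Thanks, and UDP?", "a" * 90),
    ("Great.", "a" * 20),
]
```

Response length is only a crude proxy for information density; a real deployment would likely substitute a topic-specific metric for `len(resp)`.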

Mitigation

Cumulative disclosure evaluation: HIGH

Refinement request counting: MEDIUM

Baseline anchoring: HIGH

Topic-aware detail ceiling: MEDIUM
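One way the four mitigations might fit together is a small per-session guard. This is a hypothetical sketch: `DisclosureGuard`, its fields, and the numeric detail score are assumptions made for illustration, and how a detail score would actually be computed is left open.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DisclosureGuard:
    ceilings: Dict[str, float]  # topic-aware detail ceilings (assumed values)
    max_refinements: int = 3    # refinement request counting
    baselines: Dict[str, float] = field(default_factory=dict)
    refinements: Dict[str, int] = field(default_factory=dict)

    def allow(self, topic: str, detail_score: float, is_refinement: bool) -> bool:
        # Baseline anchoring: the first response on a topic fixes the reference point.
        baseline = self.baselines.setdefault(topic, detail_score)
        if is_refinement:
            self.refinements[topic] = self.refinements.get(topic, 0) + 1
        if self.refinements.get(topic, 0) > self.max_refinements:
            return False
        # Cumulative disclosure evaluation: compare against the anchored baseline,
        # never against the immediately preceding response.
        ceiling = self.ceilings.get(topic, float("inf"))
        return detail_score - baseline <= ceiling


guard = DisclosureGuard(ceilings={"sensitive": 0.5})
decisions = [
    guard.allow("sensitive", 1.0, is_refinement=False),  # sets the baseline
    guard.allow("sensitive", 1.1, is_refinement=True),
    guard.allow("sensitive", 1.21, is_refinement=True),
    guard.allow("sensitive", 1.6, is_refinement=True),   # 0.6 above baseline
]
```

In this sketch, topics without a configured ceiling are limited only by the refinement count; a stricter design could apply a default ceiling instead.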

Chaining

Iterative refinement typically follows T7-AT-002 (Information Fragmentation) and precedes T7-AT-012 (Aggregation). The ratchet effect also primes the model for T3 (Reasoning Exploitation) by establishing compliance momentum.

Framework mapping

OWASP LLM: LLM02

MITRE ATLAS: AML.T0024
