T1-AT-008 · HIGH

Boundary Testing

T1 · Prompt & Context Subversion
Risk score: 200
Rating: High
Procedures: 5
Mechanism

Probes the boundary between permitted and restricted content through incremental escalation. Unlike direct injection, boundary testing starts with clearly permitted requests and moves toward the restriction boundary step by step, mapping where the model's refusal triggers. The information gained is the *shape* of the safety boundary: which specific aspects of a topic trigger refusal and which do not.

Detection
  • Detect incremental escalation patterns: sequences of related queries with increasing specificity toward restricted topics
  • Flag explicit requests for refusal explanation ("why can't you," "what triggered the refusal," "try again without those words")
  • Monitor behavior across the session: flag sequences of queries that together map a coherent topic boundary
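
A minimal sketch of the first two detection signals, under stated assumptions: each query has already been given a proximity score in [0, 1] by some upstream classifier (hypothetical, not part of this technique card), and the phrase list is seeded only from the examples quoted above. The names `REFUSAL_PROBE_PATTERNS`, `is_refusal_probe`, and `is_escalating`, along with the thresholds, are illustrative and would need tuning against real traffic.

```python
import re
from typing import List

# Hypothetical phrase list, seeded from the examples in the detection notes.
REFUSAL_PROBE_PATTERNS = [
    re.compile(r"\bwhy can'?t you\b", re.IGNORECASE),
    re.compile(r"\bwhat triggered (the|that|your) refusal\b", re.IGNORECASE),
    re.compile(r"\btry again without (those|these|the) words?\b", re.IGNORECASE),
]


def is_refusal_probe(query: str) -> bool:
    """Return True if the query explicitly asks for refusal reasoning."""
    return any(p.search(query) for p in REFUSAL_PROBE_PATTERNS)


def is_escalating(scores: List[float], min_len: int = 3, min_rise: float = 0.2) -> bool:
    """Return True if the most recent `min_len` scores never decrease
    and rise by at least `min_rise` from first to last.

    `scores` are assumed per-query proximity-to-restricted-topic values
    from an upstream classifier; both thresholds are assumptions.
    """
    if len(scores) < min_len:
        return False
    window = scores[-min_len:]
    non_decreasing = all(b >= a for a, b in zip(window, window[1:]))
    return non_decreasing and (window[-1] - window[0]) >= min_rise
```

Either signal alone is weak: ordinary users also ask why a request was refused, and benign conversations can drift toward sensitive topics. The sketch is meant to feed a session-level decision, not to block on a single match.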
Mitigation
  • Do not explain refusal reasoning in detail (HIGH)
  • Rate limiting on topic-adjacent queries (MEDIUM)
  • Cumulative intent tracking: classify the sequence, not individual queries (HIGH)
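
The rate-limiting and cumulative-tracking mitigations can be sketched together as a per-session tracker. This is an illustration under assumptions, not a reference implementation: the per-query score is assumed to come from an upstream classifier, and `SessionTracker`, its field names, and all threshold values are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class SessionTracker:
    window_seconds: float = 600.0    # assumed look-back window
    adjacent_threshold: float = 0.5  # assumed score that counts as topic-adjacent
    max_adjacent: int = 3            # assumed cap on topic-adjacent queries per window
    cumulative_limit: float = 2.0    # assumed cap on summed score per window
    history: Deque[Tuple[float, float]] = field(default_factory=deque)

    def record(self, timestamp: float, score: float) -> str:
        """Record one query and return 'allow', 'throttle', or 'escalate'."""
        self.history.append((timestamp, score))
        # Drop entries that have fallen out of the look-back window.
        while self.history and timestamp - self.history[0][0] > self.window_seconds:
            self.history.popleft()
        total = sum(s for _, s in self.history)
        adjacent = sum(1 for _, s in self.history if s >= self.adjacent_threshold)
        # Cumulative intent: judge the whole window, not the latest query.
        if total >= self.cumulative_limit:
            return "escalate"
        # Rate limit: too many topic-adjacent queries in the window.
        if adjacent > self.max_adjacent:
            return "throttle"
        return "allow"
```

With the default values, a session scoring 0.1, 0.4, 0.6, 0.6, 0.6 within ten minutes is escalated on the fifth query, because the window total reaches the cumulative limit even though every individual score is moderate.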
Chaining

Boundary testing is reconnaissance for all other T1 techniques. The information gained enables: T1-AT-015 (Obfuscation) by revealing which terms trigger refusal, T2 (Semantic Evasion) by revealing which encodings bypass detection, and multi-turn attacks (T4) by establishing a permissible baseline in early turns.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001