T7-AT-013MEDIUM
Capability Probing
T7 · Output Manipulation & Exfiltration →Risk score175
RatingMedium
Procedures10
Severity
Mechanism
Models are trained to be helpful when discussing their capabilities, creating tension between transparency (telling users what the model can do) and security (not revealing information enabling targeted attacks). Capability probing exploits this by asking the model to describe its restriction boundaries — treated as a helpfulness/transparency task. The model often provides detailed answers about what it won't do, inadvertently mapping the exact contour of its restrictions.
Detection
- Flag queries referencing the model's safety system: "your limits," "your boundaries," "what you can't"
- Detect systematic boundary-mapping: sequences progressively narrowing around a restriction
- Observable signal: user asks about capabilities without pursuing any substantive task
Mitigation
Refuse boundary disclosureMEDIUM
Generic boundary descriptionsMEDIUM
Consistent capability claimsHIGH
Boundary obfuscationMEDIUM
Chaining
Capability probing is reconnaissance preceding T7-AT-010 (Differential Analysis), T7-AT-002 (Fragmentation), T1 (Prompt Subversion), and T2 (Semantic Evasion). Mapped boundaries inform payload design.
Framework mapping
Open in the technique browser →OWASP LLMLLM07
MITRE ATLASAML.T0046