T3-AT-007 · MEDIUM

Socratic Method Exploitation

T3 · Reasoning & Constraint Exploitation
Risk score: 195
Rating: Medium
Procedures: 8
Mechanism

Models are trained to answer factual questions helpfully, and individual factual questions about dangerous topics are often legitimately answerable (mechanism of toxicity, chemical properties, reaction parameters). Socratic method exploitation builds a *gradient* of increasing specificity through questions: each question is a legitimate knowledge query on its own, but the sequence converges on a complete harmful procedure. The architectural vulnerability differs from Step-by-Step Extraction (T3-AT-004). Step-by-Step Extraction requests explicit steps of a named procedure, whereas Socratic exploitation asks *independent* questions whose answers form a procedure only when the attacker aggregates them, so the model never explicitly assembles it.

Detection
  • Multi-turn convergence analysis: detect when a sequence of factual questions is converging on a harmful synthesis/procedure topic
  • Optimization-intent detection: flag requests that ask for *optimal* parameters in dangerous contexts (highest temperature, maximum potency, most efficient)
  • Safety-question inversion: detect when safety/prohibition questions ("why is X dangerous") are being used to extract dangerous properties
  • Prior-answer reference tracking: monitor for follow-up questions that leverage the model's own previous technical answers toward escalation
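
As a rough illustration of the first two detection signals, the sketch below pairs a keyword-based optimization-intent check with a sliding-window convergence check. It assumes an upstream scorer (not shown) that rates each question's specificity on a sensitive topic in the range 0 to 1. The names `OPTIMIZATION_CUES`, `has_optimization_intent`, and `ConvergenceTracker`, along with the window size and threshold, are hypothetical and not part of any published detector.

```python
import re
from dataclasses import dataclass, field

# Hypothetical cue list for illustration; a real deployment would more likely
# use a trained classifier than a fixed regular expression.
OPTIMIZATION_CUES = re.compile(
    r"\b(optimal|optimum|highest|maximum|maximi[sz]e|"
    r"most (efficient|potent|effective))\b",
    re.IGNORECASE,
)


def has_optimization_intent(question: str) -> bool:
    """Return True if the question asks for 'best' or 'maximum' parameters."""
    return bool(OPTIMIZATION_CUES.search(question))


@dataclass
class ConvergenceTracker:
    """Flags a conversation whose per-turn specificity keeps rising.

    Specificity scores are assumed to come from an upstream scorer and to
    lie in [0, 1]; window and min_rise are illustrative values.
    """

    window: int = 4
    min_rise: float = 0.3
    scores: list[float] = field(default_factory=list)

    def observe(self, specificity: float) -> bool:
        """Record one turn's score; return True if the window looks convergent."""
        self.scores.append(specificity)
        recent = self.scores[-self.window:]
        if len(recent) < self.window:
            return False  # not enough turns yet to judge a trend
        non_decreasing = all(b >= a for a, b in zip(recent, recent[1:]))
        return non_decreasing and (recent[-1] - recent[0]) >= self.min_rise
```

In this sketch a flag is a signal for closer review, not a verdict: a steadily rising specificity score also describes an ordinary student working through a topic, so the two checks are more useful combined with topic sensitivity than on their own.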
Mitigation
Convergence tracking: MEDIUM
Optimization-intent classifier: HIGH
Safety-answer content limits: MEDIUM
Cross-turn topic accumulation: MEDIUM
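
One way to picture cross-turn topic accumulation is as a per-topic budget over the kinds of detail already disclosed in a conversation. The sketch below assumes an upstream classifier (not shown) that labels each answer with a sensitive topic and a facet. The facet names, the `TopicAccumulator` class, and the default budget of two facets are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical facet labels describing what kind of detail an answer adds.
PROCEDURE_FACETS = frozenset({"materials", "quantities", "conditions", "sequence"})


@dataclass
class TopicAccumulator:
    """Tracks which facets of each sensitive topic a conversation has covered."""

    max_facets: int = 2  # illustrative budget, not a recommended value
    covered: dict[str, set[str]] = field(default_factory=dict)

    def would_exceed(self, topic: str, facet: str) -> bool:
        """Check whether answering this facet would push the topic over budget."""
        seen = self.covered.get(topic, set())
        return len(seen | {facet}) > self.max_facets

    def record(self, topic: str, facet: str) -> None:
        """Record that an answer covering this facet was actually given."""
        if facet not in PROCEDURE_FACETS:
            raise ValueError(f"unknown facet: {facet}")
        self.covered.setdefault(topic, set()).add(facet)
```

The intended use is to call `would_exceed` before answering and `record` after. Because the budget is counted across the whole conversation, it targets the aggregation step described under Mechanism: no single answer is blocked for its own content, only for what it completes.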
Chaining

Socratic method exploitation is a natural precursor to T3-AT-011 (Information Completion): answers to Socratic questions provide partial information that seeds completion requests. It also chains into T3-AT-004 (Step-by-Step Extraction), where Socratic answers establish the technical vocabulary and context for subsequent procedural requests.

Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0054