T3-AT-007 · MEDIUM

Socratic Method Exploitation

T3 · Reasoning & Constraint Exploitation

Risk score: 195

Rating: Medium

Procedures: 8

Mechanism

Models are trained to answer factual questions helpfully, and individual factual questions about dangerous topics are often legitimately answerable (mechanism of toxicity, chemical properties, reaction parameters). Socratic method exploitation constructs a *gradient* of increasing specificity through questions: each question is a legitimate knowledge query, but the sequence converges on a complete harmful procedure. The architectural vulnerability differs from Step-by-Step Extraction (SSE, T3-AT-004): SSE requests explicit steps of a named procedure, while Socratic exploitation asks *independent* questions whose answers, when aggregated by the attacker, form a procedure the model never explicitly assembled.

Detection

Multi-turn convergence analysis: detect when a sequence of factual questions is converging on a harmful synthesis/procedure topic
Optimization-intent detection: flag requests that ask for *optimal* parameters in dangerous contexts (highest temperature, maximum potency, most efficient)
Safety-question inversion: detect when safety/prohibition questions ("why is X dangerous") are being used to extract dangerous properties
Prior-answer reference tracking: monitor for follow-up questions that leverage the model's own previous technical answers toward escalation
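
The first two detection ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the detector this taxonomy uses. The facet vocabulary `FACETS`, the phrase list in `OPTIMIZATION_PATTERN`, and the `ConvergenceTracker` class are all hypothetical. It assumes the conversation has already been tagged as touching a sensitive topic by some upstream classifier. The sketch tracks how many distinct kinds of procedural detail the questions have covered so far, and flags optimization phrasing in the current question.

```python
import re
from dataclasses import dataclass, field

# Hypothetical facet vocabulary (an assumption for illustration): each facet
# is one kind of detail a complete procedure would need. A production system
# would more plausibly use a trained classifier than keyword sets.
FACETS = {
    "materials": {"ingredient", "precursor", "reagent", "material"},
    "quantities": {"amount", "ratio", "concentration", "dose"},
    "conditions": {"temperature", "pressure", "duration", "time"},
    "sequence": {"order", "first", "then", "after", "before"},
}

# Superlative / optimization phrasing, following the examples in the
# detection list (highest, maximum, most efficient). The list is illustrative.
OPTIMIZATION_PATTERN = re.compile(
    r"\b(optimal|highest|maximum|most efficient|most potent)\b",
    re.IGNORECASE,
)


@dataclass
class ConvergenceTracker:
    """Accumulates which procedural facets a conversation has asked about."""

    covered: set = field(default_factory=set)

    def observe(self, question: str) -> dict:
        # Tokenize crudely into lowercase words and match against each facet.
        words = set(re.findall(r"[a-z]+", question.lower()))
        for facet, keywords in FACETS.items():
            if words & keywords:
                self.covered.add(facet)
        return {
            # Fraction of facets covered across all turns seen so far.
            "coverage": len(self.covered) / len(FACETS),
            # True when this turn asks for an optimum or extreme value.
            "optimization_intent": bool(OPTIMIZATION_PATTERN.search(question)),
        }


tracker = ConvergenceTracker()
first = tracker.observe("What temperature does the reaction need?")
second = tracker.observe("What is the optimal ratio of each reagent?")
print(first, second)
```

Each question looks unremarkable on its own; the signal is the rising `coverage` value across turns. That is why the state lives on the tracker and not in a per-message check.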

Mitigation

Convergence tracking: MEDIUM

Optimization-intent classifier: HIGH

Safety-answer content limits: MEDIUM

Cross-turn topic accumulation: MEDIUM
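
One way these mitigations might combine into a response policy is sketched below. Everything in it is an assumption made for illustration: the `Action` names, the `choose_action` function, the thresholds, and the 0.25 penalty for optimization phrasing are hypothetical and would need tuning against real traffic. The input `coverage` stands for any per-conversation accumulation score in the range 0 to 1.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"                          # answer normally
    LIMIT_SPECIFICITY = "limit_specificity"  # answer without precise parameters
    ESCALATE = "escalate"                    # decline or route to review


def choose_action(
    coverage: float,
    optimization_intent: bool,
    limit_at: float = 0.5,
    escalate_at: float = 0.75,
) -> Action:
    """Map accumulated conversation signals to a response action.

    coverage: hypothetical cross-turn accumulation score in [0, 1].
    optimization_intent: whether the current turn asks for an optimum.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    # Treat optimization phrasing as extra evidence (illustrative weight).
    effective = min(1.0, coverage + (0.25 if optimization_intent else 0.0))
    if effective >= escalate_at:
        return Action.ESCALATE
    if effective >= limit_at:
        return Action.LIMIT_SPECIFICITY
    return Action.ALLOW


print(choose_action(0.25, False))
print(choose_action(0.25, True))
```

A graded response, as opposed to a binary block, keeps isolated factual questions answerable while reducing specificity as the conversation accumulates procedural detail.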

Chaining

Socratic method exploitation is a natural precursor to T3-AT-011 (Information Completion): answers to Socratic questions provide partial information that seeds completion requests. It also chains into T3-AT-004 (Step-by-Step Extraction), where Socratic answers establish the technical vocabulary and context for subsequent procedural requests.

Framework mapping

OWASP LLM: LLM01

MITRE ATLAS: AML.T0054
