Socratic Method Exploitation
T3 · Reasoning & Constraint Exploitation
Models are trained to answer factual questions helpfully, and individual factual questions about dangerous topics are often legitimately answerable (mechanism of toxicity, chemical properties, reaction parameters). Socratic method exploitation constructs a *gradient* of increasing specificity through questions: each question is a legitimate knowledge query, but the sequence converges on a complete harmful procedure. The vulnerability differs from that of Step-by-Step Extraction (T3-AT-004). Step-by-Step Extraction requests the explicit steps of a named procedure, while Socratic exploitation asks *independent* questions whose answers, when aggregated by the attacker, form a procedure the model never explicitly assembled.
- Multi-turn convergence analysis: detect when a sequence of factual questions is converging on a harmful synthesis/procedure topic
- Optimization-intent detection: flag requests that ask for *optimal* parameters in dangerous contexts (highest temperature, maximum potency, most efficient)
- Safety-question inversion: detect when safety/prohibition questions ("why is X dangerous") are being used to extract dangerous properties
- Prior-answer reference tracking: monitor for follow-up questions that leverage the model's own previous technical answers toward escalation
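The four signals above can be combined into a single per-conversation score. What follows is a minimal sketch under stated assumptions, not a production detector: the class name `ConversationMonitor`, the regex lists, the placeholder topic term, and the threshold of 4 are all hypothetical choices made for illustration. A real deployment would likely replace the keyword patterns with trained classifiers. The key design point it illustrates is that the score accumulates across turns, so a sequence of individually acceptable questions can still cross the threshold.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Illustrative patterns only (assumptions, not a vetted rule set).
OPTIMIZATION_PATTERNS = [
    r"\b(optimal|optimum|ideal)\b",
    r"\b(highest|maximum|most (efficient|potent|effective))\b",
]
SAFETY_INVERSION_PATTERNS = [
    r"\bwhy (is|are) .+ (dangerous|banned|prohibited|restricted)\b",
    r"\bwhat makes .+ (dangerous|toxic|lethal)\b",
]
PRIOR_REFERENCE_PATTERNS = [
    r"\byou (said|mentioned|explained)\b",
    r"\bearlier you\b",
]


def matches_any(patterns: List[str], text: str) -> bool:
    """Return True if any regex in `patterns` matches `text`."""
    return any(re.search(p, text) for p in patterns)


@dataclass
class ConversationMonitor:
    """Accumulates detection signals over the turns of one conversation."""

    sensitive_terms: Set[str]   # hypothetical topic list supplied by the operator
    threshold: int = 4          # assumed value; would need tuning in practice
    score: int = 0              # running total of signals seen so far
    topic_turns: int = 0        # number of turns that touched a sensitive term

    def observe(self, question: str) -> List[str]:
        """Score one user turn and return the names of the signals it raised."""
        text = question.lower()
        signals: List[str] = []

        on_topic = any(term in text for term in self.sensitive_terms)
        if on_topic:
            signals.append("sensitive_topic")

        # The other signals only count once the conversation has touched a
        # sensitive topic, so benign "optimal oven temperature" questions
        # do not accumulate score.
        if on_topic or self.topic_turns > 0:
            if matches_any(OPTIMIZATION_PATTERNS, text):
                signals.append("optimization_intent")
            if matches_any(SAFETY_INVERSION_PATTERNS, text):
                signals.append("safety_inversion")
            if matches_any(PRIOR_REFERENCE_PATTERNS, text):
                signals.append("prior_answer_reference")

        if on_topic:
            self.topic_turns += 1
        self.score += len(signals)
        return signals

    @property
    def flagged(self) -> bool:
        """True once the cumulative score reaches the threshold."""
        return self.score >= self.threshold


if __name__ == "__main__":
    monitor = ConversationMonitor(sensitive_terms={"compound x"})
    for turn in [
        "What is compound x used for?",
        "Why is compound x dangerous?",
        "You mentioned it reacts with heat. What is the optimal temperature?",
    ]:
        print(monitor.observe(turn), monitor.score, monitor.flagged)
```

In this sketch no single turn is flagged on its own; the conversation is flagged only on the third turn, when the optimization and prior-answer signals are added to the score built up by the earlier on-topic questions.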
Socratic method exploitation is a natural precursor to T3-AT-011 (Information Completion): answers to Socratic questions provide partial information that seeds completion requests. It also chains into T3-AT-004 (Step-by-Step Extraction), where Socratic answers establish the technical vocabulary and context for subsequent procedural requests.