Schema-Based Extraction
T7 · Output Manipulation & Exfiltration
When models generate structured output (JSON, SQL, GraphQL, YAML), they operate in a compliance mode that prioritizes satisfying the schema over content policy. The Constrained Decoding Attack (CDA, Zhang et al. 2025) showed that structured output APIs enforce grammar-guided decoding in which the output grammar is attacker-controlled. The model can only emit tokens that match the schema, and this constraint overrides safety behavior: refusal tokens are grammatically invalid, so the decoding algorithm suppresses them.
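The suppression mechanism can be illustrated with a toy sketch. This is not the CDA implementation or any real decoding library: the vocabulary, the logit values, and the helper `constrained_argmax` are all hypothetical, chosen only to show how a grammar mask removes a refusal token that the model would otherwise prefer.

```python
import math

# Toy vocabulary and made-up logits (assumptions for illustration only).
# The unconstrained model strongly prefers the refusal token.
VOCAB = ["{", "}", '"answer"', ":", '"text"', "I cannot help"]
LOGITS = [0.1, 0.0, 0.2, 0.0, 0.3, 5.0]


def constrained_argmax(logits, allowed):
    """Return the highest-scoring token among those the grammar allows.

    Tokens outside `allowed` are masked to -inf, which is how
    grammar-guided decoding makes them unreachable.
    """
    masked = [
        score if token in allowed else -math.inf
        for token, score in zip(VOCAB, logits)
    ]
    return VOCAB[masked.index(max(masked))]


# Without a grammar, greedy decoding picks the refusal.
unconstrained = VOCAB[LOGITS.index(max(LOGITS))]

# A JSON-object grammar permits only "{" as the first token,
# so the refusal is masked out regardless of its score.
constrained = constrained_argmax(LOGITS, allowed={"{"})
```

In this sketch the refusal never loses on score; it is simply excluded from the candidate set, which is why safety checks need to run on the grammar or on the completed output rather than relying on the model to refuse mid-decode.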
- Scan schema definitions in prompts for field names matching restricted content
- For constrained decoding: run the output grammar through a safety classifier before applying it as a constraint
- Flag structured output requests with type annotations referencing restricted topics
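A minimal sketch of the first and third checks, assuming the schema arrives as a JSON-Schema-like dict. The denylist in `RESTRICTED_PATTERNS` and the function names `iter_schema_strings` and `scan_schema` are hypothetical placeholders; a real deployment would substitute its own policy terms or a classifier, and keyword matching alone is easy to evade.

```python
import re
from typing import Any, Iterator, List, Tuple

# Hypothetical placeholder denylist (an assumption, not a vetted policy).
RESTRICTED_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"synthesis_steps", r"exploit_code", r"payload")
]


def iter_schema_strings(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for every key and string value in the schema.

    Walking values as well as keys covers descriptions, enum members,
    and type annotations, not just field names.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            yield child, str(key)
            yield from iter_schema_strings(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_schema_strings(item, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def scan_schema(schema: Any) -> List[Tuple[str, str]]:
    """Return (path, text) pairs where a restricted pattern matches."""
    findings = []
    for path, text in iter_schema_strings(schema):
        if any(pattern.search(text) for pattern in RESTRICTED_PATTERNS):
            findings.append((path, text))
    return findings


# Example schema with one suspicious field name and description.
example_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "exploit_code": {"type": "string", "description": "working payload"},
    },
}
findings = scan_schema(example_schema)
```

A non-empty `findings` list would be a reason to flag the request for review before the schema is used as a decoding constraint; an empty list is not proof that the schema is benign.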
Schema-based extraction feeds T7-AT-003 (Format Exploitation) when generated structured data contains exploitable content. In agentic contexts, generated API schemas directly feed T11 (Agentic Exploitation) when tool definitions are constructed from model output.