T7-AT-011 · MEDIUM

Schema-Based Extraction

T7 · Output Manipulation & Exfiltration
Risk score: 185
Rating: Medium
Procedures: 10
Severity
Mechanism

When models generate structured output (JSON, SQL, GraphQL, YAML), they tend to prioritize satisfying the schema over applying content policy. The Constrained Decoding Attack (CDA, Zhang et al. 2025) showed that structured output APIs enforce grammar-guided decoding in which the output grammar is attacker-controlled. The model can only emit tokens that match the schema, so the constraint overrides safety behavior: refusal tokens are grammatically invalid, and the decoding algorithm suppresses them.
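The suppression effect can be illustrated with a toy decoding step. This is a minimal sketch, not the CDA authors' code: the vocabulary, the logit values, and the helper names `mask_logits` and `greedy_pick` are all made up for illustration. It only shows that once a grammar mask is applied, a token the model prefers (here the start of a refusal) cannot be selected if the grammar disallows it.

```python
import math

def mask_logits(logits, allowed):
    """Set the score of every token the grammar disallows to -inf."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

def greedy_pick(logits):
    """Return the highest-scoring token (greedy decoding)."""
    return max(logits, key=logits.get)

# Hypothetical scores for the first output token. The model "prefers"
# to begin a refusal with "I"; these numbers are invented.
step_logits = {"{": 1.0, "}": -2.0, '"answer"': -1.0, "I": 4.0, "cannot": 0.5}

# Assumption: a JSON-object grammar permits only "{" as the first token.
allowed_first = {"{"}

unconstrained = greedy_pick(step_logits)
constrained = greedy_pick(mask_logits(step_logits, allowed_first))

print(unconstrained)  # the refusal start wins without a grammar
print(constrained)    # the grammar forces the schema-opening token
```

Because the mask is applied after the model scores the tokens, the model's preference for refusing never reaches the output unless the grammar itself leaves room for it.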

Detection
  • Scan schema definitions in prompts for field names matching restricted content
  • For constrained decoding: evaluate the output grammar with a safety classifier before applying it as a constraint
  • Flag structured output requests with type annotations referencing restricted topics
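The first and third checks above might be sketched as a walk over a JSON Schema that collects property names, titles, descriptions, and enum values, then matches them against restricted patterns. Everything here is an assumption for illustration: the function names, the placeholder regexes in `RESTRICTED_PATTERNS`, and the choice of JSON Schema as the input format. A regex list behaves like a field name blocklist, so in practice `flag_schema` would more likely pass the collected strings to a safety classifier.

```python
import re
from typing import Any, Iterator, List, Tuple

# Placeholder patterns (assumption); a deployment would derive these
# from its own content policy or replace them with a classifier call.
RESTRICTED_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"synthesis[_\s-]?steps", r"exploit[_\s-]?payload")
]

def iter_schema_strings(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for property names, titles, descriptions, enum strings."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "properties" and isinstance(value, dict):
                for name, sub in value.items():
                    yield f"{child}.{name}", name
                    yield from iter_schema_strings(sub, f"{child}.{name}")
            elif key in ("description", "title") and isinstance(value, str):
                yield child, value
            elif key == "enum" and isinstance(value, list):
                for item in value:
                    if isinstance(item, str):
                        yield child, item
            else:
                yield from iter_schema_strings(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_schema_strings(item, f"{path}[{i}]")

def flag_schema(schema: dict) -> List[Tuple[str, str]]:
    """Return every (path, text) pair that matches a restricted pattern."""
    hits = []
    for path, text in iter_schema_strings(schema):
        if any(pat.search(text) for pat in RESTRICTED_PATTERNS):
            hits.append((path, text))
    return hits

example_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "synthesis_steps": {
            "type": "array",
            "items": {"type": "string", "description": "ordered steps"},
        },
    },
}

hits = flag_schema(example_schema)
print(hits)
```

Reporting the path alongside the matched text lets a reviewer see where in a nested schema the flagged field sits.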
Mitigation
Schema-level safety classification: HIGH
Constrained decoding with safety escape: HIGH
Field name blocklist: LOW
Unified code/prose safety evaluation: MEDIUM
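One way to read "constrained decoding with safety escape" is to wrap every caller-supplied schema so that a refusal object is always a grammatically valid output. The sketch below is an assumption about how that could look for JSON Schema: `REFUSAL_SCHEMA`, `with_safety_escape`, and `is_refusal` are hypothetical names, and it presumes the decoding backend accepts `anyOf` at the schema root, which not every structured output API does.

```python
import copy

# Hypothetical refusal shape (assumption): a single string field.
REFUSAL_SCHEMA = {
    "type": "object",
    "properties": {"refusal": {"type": "string"}},
    "required": ["refusal"],
    "additionalProperties": False,
}

def with_safety_escape(schema: dict) -> dict:
    """Wrap a caller-supplied schema so a refusal object is always valid."""
    return {"anyOf": [copy.deepcopy(schema), copy.deepcopy(REFUSAL_SCHEMA)]}

def is_refusal(output: dict) -> bool:
    """True when the decoded object took the refusal branch."""
    return set(output.keys()) == {"refusal"} and isinstance(output["refusal"], str)

caller_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

wrapped = with_safety_escape(caller_schema)
print(len(wrapped["anyOf"]))
print(is_refusal({"refusal": "I can't help with that."}))
print(is_refusal({"answer": "42"}))
```

The wrapper copies its inputs so later changes to the caller's schema cannot remove the refusal branch. Downstream code then has to check `is_refusal` before treating the output as data.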
Chaining

Schema-based extraction feeds T7-AT-003 (Format Exploitation) when generated structured data contains exploitable content. In agentic contexts, generated API schemas directly feed T11 (Agentic Exploitation) when tool definitions are constructed from model output.

Framework mapping
OWASP LLM: LLM05
MITRE ATLAS: AML.T0048.004