T7-AT-011 · MEDIUM

Schema-Based Extraction

T7 · Output Manipulation & Exfiltration

Risk score: 185

Rating: Medium

Procedures: 10

Severity

Mechanism

When models generate structured output (JSON, SQL, GraphQL, YAML), they tend to operate in a compliance mode that prioritizes schema satisfaction over content policy. The Constrained Decoding Attack (CDA, Zhang et al. 2025) showed that structured output APIs enforce grammar-guided decoding in which the output grammar is attacker-controlled. The model is constrained to produce only tokens that match the schema, and this constraint overrides safety behavior: refusal tokens are grammatically invalid under the schema, so the decoding algorithm suppresses them.
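The suppression step can be illustrated with a toy sketch. It assumes a hypothetical four-token vocabulary with made-up scores and a grammar that admits only an opening brace as the first token; real decoders work on full vocabularies and compiled grammars, so this shows only the masking idea, not any particular API.

```python
import math

def constrained_argmax(logits, allowed):
    """Pick the highest-scoring token among those the grammar allows."""
    # Tokens outside the grammar are masked to -inf, so they can never win.
    masked = {tok: (score if tok in allowed else -math.inf)
              for tok, score in logits.items()}
    return max(masked, key=masked.get)

# Hypothetical next-token scores: the model's preferred tokens start a refusal.
logits = {"I": 4.1, "Sorry": 3.7, "{": 1.2, "[": 0.3}

# Assumption: a JSON-object grammar admits only "{" as the first token.
allowed_by_grammar = {"{"}

unconstrained = max(logits, key=logits.get)                   # model's own choice
constrained = constrained_argmax(logits, allowed_by_grammar)  # grammar-forced choice
```

In this toy case the model's own top choice begins a refusal, but the masked selection can only return the brace, because every refusal token has been removed from consideration before selection.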

Detection

Scan schema definitions in prompts for field names matching restricted content
For constrained decoding: run the output grammar through a safety classifier before applying it as a constraint
Flag structured output requests with type annotations referencing restricted topics
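The checks above could be prototyped roughly as follows. This is a minimal sketch, assuming the schema arrives as a parsed JSON Schema dictionary; `RESTRICTED_PATTERNS` is a hypothetical placeholder list standing in for a real policy list or safety classifier, and the function names are illustrative. Besides property names, it also inspects `title`, `description`, and `enum` strings, since those keywords can carry free text as well.

```python
import re

# Hypothetical placeholder patterns; substitute a real policy list or classifier.
RESTRICTED_PATTERNS = [re.compile(p, re.IGNORECASE)
                       for p in (r"restricted_topic", r"forbidden")]

TEXT_KEYS = ("title", "description")

def iter_schema_strings(schema, path="$"):
    """Yield (path, text) for property names, titles, descriptions, enum strings."""
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}.{key}"
            if key == "properties" and isinstance(value, dict):
                for name, sub in value.items():
                    yield f"{child}.{name}", name           # the field name itself
                    yield from iter_schema_strings(sub, f"{child}.{name}")
            elif key in TEXT_KEYS and isinstance(value, str):
                yield child, value
            elif key == "enum" and isinstance(value, list):
                for i, item in enumerate(value):
                    if isinstance(item, str):
                        yield f"{child}[{i}]", item
            else:
                yield from iter_schema_strings(value, child)
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            yield from iter_schema_strings(item, f"{path}[{i}]")

def flag_schema(schema):
    """Return every (path, text) pair that matches a restricted pattern."""
    return [(p, t) for p, t in iter_schema_strings(schema)
            if any(rx.search(t) for rx in RESTRICTED_PATTERNS)]

example_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "forbidden_steps": {
            "type": "array",
            "items": {"type": "string", "enum": ["ok", "restricted_topic details"]},
        },
    },
}
hits = flag_schema(example_schema)
```

Returning the path along with the matched text lets a reviewer see where in the schema the flagged string sits. Pattern matching of this kind is easy to evade with synonyms or encodings, so it is better treated as a pre-filter than as a complete detector.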

Mitigation

Schema-level safety classification: HIGH

Constrained decoding with safety escape: HIGH

Field name blocklist: LOW

Unified code/prose safety evaluation: MEDIUM
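One possible reading of "constrained decoding with safety escape" is to make sure the grammar always admits a refusal. The sketch below assumes JSON Schema input and wraps the caller's schema in an `anyOf` with a refusal branch; the `refusal` field name and both helper functions are hypothetical, and whether a given structured-output API accepts a top-level `anyOf` would need to be checked against that API.

```python
import copy

# Hypothetical refusal shape; the field name "refusal" is an assumption.
REFUSAL_BRANCH = {
    "type": "object",
    "properties": {"refusal": {"type": "string"}},
    "required": ["refusal"],
    "additionalProperties": False,
}

def with_safety_escape(user_schema):
    """Wrap a caller-supplied schema so a refusal object is always grammatical."""
    # Deep copies keep later mutation of the inputs from altering the wrapper.
    return {"anyOf": [copy.deepcopy(user_schema), copy.deepcopy(REFUSAL_BRANCH)]}

def is_refusal(output):
    """True when a parsed output took the refusal branch."""
    return (isinstance(output, dict)
            and set(output) == {"refusal"}
            and isinstance(output["refusal"], str))

wrapped = with_safety_escape(
    {"type": "object", "properties": {"answer": {"type": "string"}}}
)
```

With the wrapper in place, refusal tokens are no longer grammatically invalid, so the decoder has no reason to suppress them. Downstream code can then branch on `is_refusal` before using the output.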

Chaining

Schema-based extraction feeds T7-AT-003 (Format Exploitation) when generated structured data contains exploitable content. In agentic contexts, generated API schemas directly feed T11 (Agentic Exploitation) when tool definitions are constructed from model output.

Framework mapping

OWASP LLM: LLM05

MITRE ATLAS: AML.T0048.004
