T9-AT-012MEDIUM

Document Structure Exploitation

T9 · Multimodal & Cross-Channel Attacks →
Risk score190
RatingMedium
Procedures10
Severity
Mechanism

Document markup languages (HTML, LaTeX, Markdown, XML, YAML, JSON) have processing features that extend beyond content rendering — includes, eval, entity expansion, template interpolation. When models process documents in these formats, the parser may execute markup-level operations that introduce injection content not visible in the rendered document. The gap: the model processes the rendered content, but the markup may modify what content is rendered through parser-specific features (XML entity expansion, LaTeX `\input`, YAML deserialization).

Detection
  • Markup sanitization: Sanitize all document markup before processing, stripping active features
  • Entity expansion limits: Limit XML/YAML entity expansion depth to prevent bombs
  • Template injection detection: Detect template syntax in user-supplied document content
Mitigation
Markup sanitization before processingHIGH
Content-only extractionHIGH
Parser hardeningHIGH
Chaining

Document structure exploitation chains into T9-AT-008 (File Format) as the markup-level complement to format-level attacks. Chains into T12 (RAG) when poisoned documents enter retrieval pipelines.

Framework mapping
OWASP LLMLLM01
MITRE ATLASAML.T0051.001
Open in the technique browser →