T9-AT-008 · MEDIUM
File Format Exploitation
T9 · Multimodal & Cross-Channel Attacks
Risk score: 195
Rating: Medium
Procedures: 10
Severity
Mechanism
Document formats such as PDF, DOCX, SVG, and HTML can carry embedded executable content: JavaScript in PDFs, macros in DOCX, scripts in SVG, and active content in HTML. When a model pipeline ingests these documents, the format-specific parsers that run ahead of the model may execute or interpret that embedded content. The gap is one of layers: the model reads the document's semantic content, while the file format can carry active code that operates on a separate layer from the text the model sees.
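To make the layer separation concrete, here is a minimal sketch that looks for active-content markers in raw file bytes, independent of any text a model would read. The function name `find_active_content` and the marker lists are assumptions chosen for illustration and are not exhaustive. A byte scan like this misses markers hidden in compressed PDF object streams or written with hex-escaped names.

```python
import io
import re
import zipfile

# Hypothetical marker lists for illustration only; real documents can
# obfuscate these names or hide them inside compressed streams.
PDF_MARKERS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch")
MARKUP_PATTERN = re.compile(rb"<script\b|\bon\w+\s*=|javascript:", re.IGNORECASE)


def find_active_content(data: bytes) -> list[str]:
    """Return a list of active-content indicators found in raw bytes."""
    findings: list[str] = []
    if data.startswith(b"%PDF-"):
        # PDF: look for name objects that commonly trigger script execution.
        findings += [f"pdf:{m.decode()}" for m in PDF_MARKERS if m in data]
    elif data.startswith(b"PK\x03\x04"):
        # OOXML files are ZIP containers; VBA macros live in vbaProject.bin.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            findings += [
                f"ooxml:{name}"
                for name in zf.namelist()
                if name.lower().endswith("vbaproject.bin")
            ]
    else:
        # Treat everything else as markup (SVG/HTML) and scan for script hooks.
        findings += [
            "markup:" + m.group(0).decode("ascii", errors="replace").lower()
            for m in MARKUP_PATTERN.finditer(data)
        ]
    return findings
```

An empty result does not prove that a file is safe. It only means that none of the listed markers appeared in plain form.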
Detection
- Active content stripping: Remove JavaScript, macros, and scripts from documents before model processing
- File format validation: Verify that the file format matches the declared type, and reject polyglot or type-confused files
- Sandboxed document parsing: Parse documents in a sandboxed environment before model processing
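The file format validation item above can be sketched as a magic-byte check plus a simple polyglot heuristic. The names `MAGIC`, `matches_declared_type`, and `looks_polyglot` are hypothetical, and the signature table covers only two formats for brevity. The polyglot test is a heuristic assumption that flags files presenting both a PDF header and a ZIP end-of-central-directory record. It is not a complete detector.

```python
from pathlib import PurePath

# Hypothetical mapping from declared extension to expected leading bytes.
MAGIC = {
    ".pdf": b"%PDF-",
    ".docx": b"PK\x03\x04",  # OOXML documents are ZIP containers
}


def matches_declared_type(filename: str, data: bytes) -> bool:
    """Check that the leading bytes agree with the declared extension."""
    ext = PurePath(filename).suffix.lower()
    expected = MAGIC.get(ext)
    if expected is None:
        return False  # unknown types are rejected by default
    return data.startswith(expected)


def looks_polyglot(data: bytes) -> bool:
    """Heuristic: flag data that carries both PDF and ZIP signatures."""
    has_pdf_header = b"%PDF-" in data[:1024]
    # A ZIP end-of-central-directory record is 22 bytes plus a comment of
    # at most 65535 bytes, so it must start within the last 65557 bytes.
    has_zip_eocd = b"PK\x05\x06" in data[-65557:]
    return has_pdf_header and has_zip_eocd
```

A pipeline might call `matches_declared_type` first and then reject any file for which `looks_polyglot` returns `True`. Legitimate PDFs that embed ZIP data near the end of the file would be false positives.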
Mitigation
Active content stripping: HIGH
File format validation: HIGH
Sandboxed parsing: HIGH
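As one illustration of active content stripping, the sketch below removes `script` elements, `on*` event-handler attributes, and `javascript:` URLs from an SVG using only the Python standard library. The function name `strip_svg_active_content` is hypothetical. The sketch assumes trusted-size input and does not handle `foreignObject`, external references, or CSS-based vectors. The Python documentation advises a hardened XML parser for untrusted data, so this code belongs inside the sandboxed parsing step rather than in place of it.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def strip_svg_active_content(svg_text: str) -> str:
    """Return the SVG with script elements and script-bearing attributes removed."""
    # Serialize SVG elements without an "ns0:" prefix.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)

    for element in list(root.iter()):
        # Drop <script> children, with or without the SVG namespace.
        for child in list(element):
            if child.tag in (f"{{{SVG_NS}}}script", "script"):
                element.remove(child)
        # Drop event handlers (onclick, onload, ...) and javascript: URLs.
        for name in list(element.attrib):
            value = element.attrib[name].strip().lower()
            if name.lower().startswith("on") or value.startswith("javascript:"):
                del element.attrib[name]

    return ET.tostring(root, encoding="unicode")
```

Removing content by a deny-list like this is fragile. An allow-list of permitted elements and attributes is usually the safer design, at the cost of dropping some legitimate features.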
Chaining
File format exploitation chains into T13 (Supply Chain) when poisoned documents enter model input pipelines. It also chains into T9-AT-012 (Document Structure) when format-specific features are used for injection.
Framework mapping
OWASP LLM: LLM01
MITRE ATLAS: AML.T0051.001