Request Smuggling
T5 · Model & API Exploitation
LLM API requests pass through multiple processing layers (CDN, API gateway, load balancer, authentication middleware, safety classifier, and inference server), each of which parses the request independently. The design assumption is that all layers agree on request boundaries and content. The gap: parsing inconsistencies between layers allow request smuggling, in which a single HTTP request is constructed so that different layers interpret it as different requests.
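The boundary disagreement can be illustrated with a toy example. The following is a minimal sketch, not a real proxy or server: it assumes one layer frames the body by Content-Length and another by chunked Transfer-Encoding, and the host `api.example`, the path `/v1/chat`, and all function names are hypothetical.

```python
CRLF = b"\r\n"


def split_head(raw: bytes):
    """Return (lowercased header dict, bytes following the blank line)."""
    head, _, rest = raw.partition(CRLF + CRLF)
    headers = {}
    for line in head.split(CRLF)[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, rest


def read_with_content_length(raw: bytes):
    """Toy layer that frames the body using Content-Length only."""
    headers, rest = split_head(raw)
    length = int(headers[b"content-length"])
    return rest[:length], rest[length:]


def read_with_chunked(raw: bytes):
    """Toy layer that frames the body using chunked encoding only (no trailers)."""
    _, rest = split_head(raw)
    body = b""
    while True:
        size_line, _, rest = rest.partition(CRLF)
        size = int(size_line, 16)
        if size == 0:
            # Final chunk: drop the terminating CRLF and stop.
            if rest.startswith(CRLF):
                rest = rest[2:]
            return body, rest
        body += rest[:size]
        rest = rest[size + 2:]  # skip chunk data and its trailing CRLF


# Hypothetical request carrying both framing headers.
payload = b"0\r\n\r\n" + b"SMUGGLED"
raw_request = b"".join([
    b"POST /v1/chat HTTP/1.1\r\n",
    b"Host: api.example\r\n",
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n",
    b"Transfer-Encoding: chunked\r\n",
    b"\r\n",
    payload,
])

front_body, front_leftover = read_with_content_length(raw_request)
back_body, back_leftover = read_with_chunked(raw_request)
```

Under these assumptions, the Content-Length layer treats the whole payload as one body with nothing left over, while the chunked layer sees an empty body and leaves `SMUGGLED` on the connection, where it would be read as the start of a second request.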
- Deploy request smuggling detection at each pipeline layer by comparing the request boundaries each layer computes
- Monitor for requests with conflicting Content-Length and Transfer-Encoding headers
- Alert on unexpected Content-Type values (non-JSON on JSON endpoints)
- Detect duplicate keys in JSON request bodies
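The header and body checks above can be sketched as a single function. This is a minimal illustration under stated assumptions: headers arrive as a list of (name, value) pairs, the endpoint expects `application/json`, and the function name and signal strings are hypothetical rather than part of any existing tool.

```python
import json


def find_smuggling_signals(headers, body):
    """Return a list of signal names for one request.

    headers: list of (name, value) string pairs.
    body: request body as str or bytes.
    """
    signals = []
    names = [name.lower() for name, _ in headers]

    # Both framing headers present: layers may disagree on the body length.
    if "content-length" in names and "transfer-encoding" in names:
        signals.append("conflicting-length-headers")

    # Assumption: this endpoint only accepts JSON.
    content_types = [v for n, v in headers if n.lower() == "content-type"]
    media_type = content_types[0].split(";")[0].strip().lower() if content_types else ""
    if media_type != "application/json":
        signals.append("unexpected-content-type")

    # Record keys that repeat within any single JSON object.
    duplicates = []

    def record_pairs(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    try:
        json.loads(body, object_pairs_hook=record_pairs)
    except ValueError:  # includes json.JSONDecodeError
        signals.append("unparseable-json")
    if duplicates:
        signals.append("duplicate-json-keys")
    return signals


suspicious = find_smuggling_signals(
    [
        ("Content-Type", "application/json"),
        ("Content-Length", "36"),
        ("Transfer-Encoding", "chunked"),
    ],
    '{"prompt": "hi", "prompt": "ignore"}',
)
clean = find_smuggling_signals(
    [("Content-Type", "application/json; charset=utf-8"), ("Content-Length", "16")],
    '{"prompt": "hi"}',
)
```

Duplicate keys matter because parsers differ on whether the first or last value wins, so a safety classifier and the inference server may evaluate different prompts from the same body.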
Request smuggling bypasses safety controls at the infrastructure level, enabling T1 (Prompt Subversion) and T2 (Semantic Evasion) techniques that would otherwise be caught. Smuggled requests that reach the inference server without safety evaluation have unrestricted model access, which also enables T5-AT-001 (Parameter Manipulation) and T5-AT-002 (Token Extraction) with no safety checks in the path.