T5-AT-012 · HIGH
Resource Exhaustion
T5 · Model & API Exploitation · Risk score 205
Rating: High
Procedures: 10
Mechanism
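LLM inference cost grows with prompt length, generation length, model size, and batch size, and these factors compound: every generated token passes through the full model and attends over the entire preceding context, so a long prompt makes each output token more expensive. Reasoning models add a further multiplier because they generate internal chain-of-thought tokens before the visible answer. The design assumption is that per-request resource limits (for example, a cap on requests per minute) prevent abuse. The gap: the computational cost of a single LLM request can vary by 10,000x depending on input parameters, and the cost function is attacker-controllable, so an attacker can stay within request-count limits while maximizing the cost of each request.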
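To make the 10,000x spread concrete, here is a rough sketch that estimates forward-pass FLOPs for one request using a common back-of-the-envelope approximation: about 2N FLOPs per token for a model with N parameters, plus an attention term of 2·n_layer·n_ctx·d_model for a token attending over n_ctx earlier tokens. The model shapes and request sizes are hypothetical, and the estimate ignores batching, grouped-query attention, and hardware efficiency, so treat it as an order-of-magnitude illustration, not a cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model dimensions, used only for this illustration."""
    params: float   # total parameter count N
    n_layer: int    # number of transformer layers
    d_model: int    # attention width


def request_flops(m: ModelShape, prompt_tokens: int, output_tokens: int) -> float:
    """Approximate forward-pass FLOPs for one request served with a KV cache.

    The token at position i costs about 2*N (dense layers) plus
    2*n_layer*i*d_model (attending over its i-token prefix). Summing over
    i = 1..T gives 2*N*T + n_layer*d_model*T*(T+1). Output tokens include any
    hidden reasoning tokens, which are computed like visible ones.
    """
    t = prompt_tokens + output_tokens
    dense = 2 * m.params * t
    attention = m.n_layer * m.d_model * t * (t + 1)
    return dense + attention


small = ModelShape(params=7e9, n_layer=32, d_model=4096)
large = ModelShape(params=70e9, n_layer=80, d_model=8192)

# Benign: short prompt, short answer, small model.
cheap = request_flops(small, prompt_tokens=50, output_tokens=50)
# Adversarial: long-context prompt plus a long (e.g. reasoning-heavy) generation.
costly = request_flops(large, prompt_tokens=100_000, output_tokens=20_000)

print(f"estimated cost ratio: {costly / cheap:,.0f}x")
```

With these hypothetical inputs the ratio lands at roughly 19,000x. Both terms grow with total tokens, and the attention term grows quadratically, which is why a long prompt paired with a long generation is the expensive corner of the input space.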
Detection
- Monitor per-request compute cost (prompt_tokens × completion_tokens × model_cost_factor)
- Alert on sustained high-cost request patterns even within rate limits
- Track concurrent long-running requests per source
- Detect streaming connections with low read rates (slowloris pattern)
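These detection signals can be combined into a per-source monitor. The following is a minimal, single-process sketch: it applies the prompt_tokens × completion_tokens × model_cost_factor proxy from the first bullet, sums it over a sliding window so sustained high-cost traffic is flagged even when the request count stays under the rate limit, counts in-flight requests per source, and flags streaming clients whose read rate stays below a floor. The class name, thresholds, and per-model cost factors are hypothetical and would need tuning against real traffic.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceStats:
    costs: deque = field(default_factory=deque)  # (timestamp, cost) pairs in the window
    in_flight: int = 0                            # requests started but not yet finished


class CostMonitor:
    """Per-source cost monitor (hypothetical sketch; thresholds are illustrative)."""

    def __init__(self, window_s: float = 300.0, max_window_cost: float = 5e9,
                 max_in_flight: int = 4, min_read_bps: float = 64.0,
                 grace_s: float = 10.0, cost_factor: Optional[Dict[str, float]] = None):
        self.window_s = window_s
        self.max_window_cost = max_window_cost
        self.max_in_flight = max_in_flight
        self.min_read_bps = min_read_bps
        self.grace_s = grace_s
        self.cost_factor = cost_factor or {}
        self.stats: Dict[str, SourceStats] = defaultdict(SourceStats)

    def request_cost(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        # Proxy from the Detection list; unknown models default to a factor of 1.0.
        return prompt_tokens * completion_tokens * self.cost_factor.get(model, 1.0)

    def on_start(self, source: str) -> List[str]:
        # Concurrency tracking: count requests that are still running.
        s = self.stats[source]
        s.in_flight += 1
        if s.in_flight > self.max_in_flight:
            return [f"{source}: {s.in_flight} concurrent requests"]
        return []

    def on_finish(self, source: str, model: str, prompt_tokens: int,
                  completion_tokens: int, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        s = self.stats[source]
        s.in_flight = max(0, s.in_flight - 1)
        s.costs.append((now, self.request_cost(model, prompt_tokens, completion_tokens)))
        # Drop entries older than the window, then compare the windowed sum to the budget.
        while s.costs and now - s.costs[0][0] > self.window_s:
            s.costs.popleft()
        total = sum(cost for _, cost in s.costs)
        if total > self.max_window_cost:
            return [f"{source}: {total:.3g} cost units in {self.window_s:.0f}s window"]
        return []

    def check_stream(self, source: str, bytes_read: int, elapsed_s: float) -> List[str]:
        # Slowloris pattern: a stream held open while the client reads it very slowly.
        if elapsed_s < self.grace_s:
            return []
        rate = bytes_read / elapsed_s
        if rate < self.min_read_bps:
            return [f"{source}: slow reader at {rate:.1f} B/s"]
        return []
```

A windowed sum, rather than a per-request threshold, is what catches the "many individually acceptable but expensive requests" pattern in the second bullet. Note that the multiplicative proxy scores prompt-only requests (zero completion tokens) as free, so a sum-based estimate may be a better fit where large prefill-only requests matter.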
Mitigation
- Token-based billing and rate limiting (not just request counting) · HIGH
- Per-request compute budget (max total tokens = prompt + generation) · HIGH
- Concurrent request limits per user/key · HIGH
- Streaming connection timeout with read-rate enforcement · HIGH
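A minimal sketch of these mitigations: one admission check covering the rate-limiting half of the first item plus the second and third, and a helper for the fourth. Assumptions, all hypothetical: the limits and class names are illustrative; the caller knows the prompt token count and the requested maximum output before dispatch; and the streaming layer can observe how many bytes the client has actually consumed (for example, through write backpressure). Billing is not modeled, and the code is single-threaded.

```python
import time
from typing import Dict, Optional, Tuple


class TokenBucket:
    """Token bucket measured in LLM tokens rather than in requests."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate            # LLM tokens refilled per second
        self.capacity = capacity    # maximum burst, in LLM tokens
        self.level = capacity
        self.updated = time.monotonic() if now is None else now

    def try_consume(self, amount: float, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.level = min(self.capacity, self.level + elapsed * self.rate)
        self.updated = now
        if amount <= self.level:
            self.level -= amount
            return True
        return False


class Admission:
    """Hypothetical per-key admission control (not thread-safe)."""

    def __init__(self, max_total_tokens: int = 32_000, max_concurrent: int = 4,
                 tokens_per_s: float = 2_000.0, burst: float = 64_000.0):
        self.max_total_tokens = max_total_tokens
        self.max_concurrent = max_concurrent
        self.tokens_per_s = tokens_per_s
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}
        self.in_flight: Dict[str, int] = {}

    def admit(self, key: str, prompt_tokens: int, max_output_tokens: int,
              now: Optional[float] = None) -> Tuple[bool, str]:
        # Per-request budget: prompt + requested generation, checked before any compute.
        total = prompt_tokens + max_output_tokens
        if total > self.max_total_tokens:
            return False, "per-request token budget exceeded"
        # Concurrency cap per key.
        if self.in_flight.get(key, 0) >= self.max_concurrent:
            return False, "too many concurrent requests"
        # Token-based rate limit: reserve the worst case (prompt + max output) up front.
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.tokens_per_s, self.burst, now)
        if not bucket.try_consume(total, now):
            return False, "token rate limit exceeded"
        self.in_flight[key] = self.in_flight.get(key, 0) + 1
        return True, "ok"

    def release(self, key: str) -> None:
        # Call when a request finishes so its concurrency slot is freed.
        self.in_flight[key] = max(0, self.in_flight.get(key, 0) - 1)


def should_close_stream(bytes_read: int, elapsed_s: float, min_read_bps: float = 64.0,
                        grace_s: float = 10.0, max_duration_s: float = 600.0) -> bool:
    """Streaming timeout with read-rate enforcement (slowloris defence)."""
    if elapsed_s >= max_duration_s:
        return True              # hard cap on stream lifetime
    if elapsed_s < grace_s:
        return False             # let slow starts through
    return bytes_read / elapsed_s < min_read_bps
```

Reserving prompt + max_output_tokens before generation is conservative: a request that stops early still consumes its full reservation. A deployment that wants tighter accounting could refund the unused portion on completion, at the cost of letting a burst of requests slightly overshoot the bucket.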
Chaining
Resource exhaustion enables economic attacks in T14 (Infrastructure & Economic Warfare) by amplifying cloud costs. Under resource pressure, safety classifiers may be disabled or degraded, enabling T1–T4 attacks that would normally be caught.
Framework mapping
OWASP LLM: LLM10
MITRE ATLAS: AML.T0029