T14-AT-002 · HIGH
Denial of Service Attacks
T14 · Infrastructure & Economic Warfare
Risk score: 240
Rating: High
Procedures: 10
Severity
Mechanism
LLM inference has a fundamental asymmetry: a short input can trigger enormous computational cost. A single max-token request can cost 100–1000x the compute of a typical query, and adversarial inputs can be crafted specifically to maximize this ratio. A 2021 study demonstrated that inputs can be optimized to maximize inference energy consumption.
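The asymmetry can be illustrated with a rough calculation. The sketch below assumes a toy cost model of one unit of compute per token processed or generated, which ignores batching, caching, and attention scaling. The function name `relative_cost` and both workloads are hypothetical and chosen only for illustration.

```python
def relative_cost(prompt_tokens: int, output_tokens: int) -> int:
    """Toy cost model (assumption): one compute unit per prompt or output token."""
    return prompt_tokens + output_tokens


# Hypothetical workloads, not measurements from any real service.
typical = relative_cost(prompt_tokens=200, output_tokens=150)
adversarial = relative_cost(prompt_tokens=20, output_tokens=100_000)

# The attacker sends a shorter prompt yet consumes far more compute.
amplification = adversarial / typical
print(f"amplification: {amplification:.0f}x")
```

Under these assumed numbers the adversarial request falls inside the 100–1000x range described above, even though its prompt is a tenth the size of the typical one.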
Detection
- Per-request compute cost monitoring — alert on requests that consume >10x the median compute
- Token output length distribution monitoring — flag requests consistently producing max-length outputs
- Autoscaling event correlation — detect patterns of scale-up triggered by short bursts followed by scale-down
- Model loading frequency monitoring — flag rapid model switching on on-demand endpoints
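The first two detection signals can be sketched in a few lines. This is a minimal illustration, assuming request logs are already available as records with a compute cost and token counts. The `RequestRecord` fields, the function names, and the `min_requests` and `min_fraction` thresholds are hypothetical. Only the 10x-median ratio comes from the list above.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestRecord:
    client_id: str
    compute_cost: float  # hypothetical unit, e.g. GPU-seconds
    output_tokens: int
    max_tokens: int      # output cap that applied to this request


def flag_expensive(records, ratio=10.0):
    """Return records whose compute cost exceeds `ratio` times the median cost."""
    if not records:
        return []
    baseline = median(r.compute_cost for r in records)
    return [r for r in records if r.compute_cost > ratio * baseline]


def flag_max_length_clients(records, min_requests=5, min_fraction=0.8):
    """Return client IDs whose requests mostly run up to the output cap."""
    totals = defaultdict(int)
    capped = defaultdict(int)
    for r in records:
        totals[r.client_id] += 1
        if r.output_tokens >= r.max_tokens:
            capped[r.client_id] += 1
    return sorted(
        client
        for client, count in totals.items()
        if count >= min_requests and capped[client] / count >= min_fraction
    )


# Synthetic example: 20 ordinary requests and 5 max-length requests.
records = [RequestRecord("a", 1.0, 100, 4096) for _ in range(20)]
records += [RequestRecord("b", 50.0, 4096, 4096) for _ in range(5)]

expensive = flag_expensive(records)
suspects = flag_max_length_clients(records)
```

A median baseline is used rather than a mean so that a burst of expensive requests does not drag the threshold upward. In practice the window over which the median is computed would need tuning to the service's traffic.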
Mitigation
Per-request compute budgets · HIGH
Rate limiting (per-account AND aggregate) · HIGH
Autoscaling bounds · HIGH
Input complexity analysis · MEDIUM
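The first two mitigations can be combined in a single admission check. The sketch below assumes output tokens are an acceptable proxy for compute and uses a token-bucket limiter, which is one common choice rather than a method prescribed by this entry. The class names `TokenBucket` and `AdmissionController` and all limits are hypothetical.

```python
import time


class TokenBucket:
    """Token bucket that starts full and refills at `rate` units per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = None  # time of the last refill

    def can_take(self, amount: float, now: float) -> bool:
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        return self.tokens >= amount

    def take(self, amount: float) -> None:
        self.tokens -= amount


class AdmissionController:
    """Applies a per-request budget, then per-account and aggregate rate limits."""

    def __init__(self, account_rate, account_burst,
                 aggregate_rate, aggregate_burst, max_output_tokens):
        self.account_rate = account_rate
        self.account_burst = account_burst
        self.max_output_tokens = max_output_tokens
        self.accounts = {}
        self.aggregate = TokenBucket(aggregate_rate, aggregate_burst)

    def admit(self, account_id, requested_output_tokens, now=None):
        """Return the granted output-token budget, or None if rate limited."""
        now = time.monotonic() if now is None else now
        # Per-request compute budget: clamp the output length.
        granted = min(requested_output_tokens, self.max_output_tokens)
        if account_id not in self.accounts:
            self.accounts[account_id] = TokenBucket(self.account_rate, self.account_burst)
        bucket = self.accounts[account_id]
        # Check both limits before charging either, so a rejection costs nothing.
        account_ok = bucket.can_take(granted, now)
        aggregate_ok = self.aggregate.can_take(granted, now)
        if not (account_ok and aggregate_ok):
            return None
        bucket.take(granted)
        self.aggregate.take(granted)
        return granted


ctrl = AdmissionController(account_rate=10, account_burst=1000,
                           aggregate_rate=100, aggregate_burst=1500,
                           max_output_tokens=512)
first = ctrl.admit("a", 100_000, now=0.0)   # clamped to the per-request budget
second = ctrl.admit("a", 100_000, now=0.0)  # account bucket exhausted
other = ctrl.admit("b", 512, now=0.0)       # a different account still has budget
third = ctrl.admit("c", 512, now=0.0)       # aggregate bucket exhausted
```

The aggregate bucket matters because an attacker who spreads load across many accounts stays under every per-account limit. The third account in the example is rejected even though its own bucket is full.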
Chaining
DoS attacks enable T14-AT-003 (Cost Inflation) directly through compute cost generation. In competitive contexts, DoS chains into T14-AT-006 (Competitive Sabotage) by degrading a competitor's AI service availability during critical periods.
Framework mapping
OWASP LLM: LLM04
MITRE ATLAS: AML.T0029