T14-AT-002 · HIGH

Denial of Service Attacks

T14 · Infrastructure & Economic Warfare
Risk score: 240
Rating: High
Procedures: 10
Mechanism

LLM inference has a fundamental asymmetry: a short input can trigger enormous computational cost. A single max-token request can cost 100–1000x the compute of a typical query, and adversarial inputs can be crafted specifically to maximize this ratio. Published work (2021) demonstrated that inputs can be optimized to maximize inference energy consumption.
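To make the asymmetry concrete, the sketch below estimates an amplification ratio under a deliberately simple cost model in which compute scales linearly with input and output tokens. The names `Request`, `estimated_cost`, and `amplification`, the per-token weights, and the token counts are all illustrative assumptions rather than measurements. Real decoding cost also grows with context length, so a linear model probably understates very long generations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    input_tokens: int
    output_tokens: int


def estimated_cost(req: Request, in_weight: float = 1.0, out_weight: float = 4.0) -> float:
    """Relative compute cost under an assumed linear per-token model."""
    return req.input_tokens * in_weight + req.output_tokens * out_weight


def amplification(attack: Request, typical: Request) -> float:
    """How many times more compute the attack request costs than a typical one."""
    return estimated_cost(attack) / estimated_cost(typical)


# Hypothetical numbers: the same short prompt, but the attack forces a
# max-length output instead of a normal-length answer.
typical = Request(input_tokens=50, output_tokens=200)
attack = Request(input_tokens=50, output_tokens=128_000)
ratio = amplification(attack, typical)
```

With these assumed numbers the ratio comes out near 600x, inside the 100–1000x range cited above. The input is identical in both cases, which is the point: the cost difference comes entirely from what the request makes the model do.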

Detection
  • Per-request compute cost monitoring — alert on requests that consume >10x the median compute
  • Token output length distribution monitoring — flag requests consistently producing max-length outputs
  • Autoscaling event correlation — detect patterns of scale-up triggered by short bursts followed by scale-down
  • Model loading frequency monitoring — flag rapid model switching on on-demand endpoints
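The first two detection signals above can be sketched as a small in-memory monitor. This is a minimal illustration, not a production design: the class name `ComputeMonitor`, the alert labels, the window sizes, and the 80% max-length share are hypothetical choices, and `cost` stands for whatever per-request compute measure is available (GPU-seconds, billed tokens, and so on).

```python
from collections import defaultdict, deque
from statistics import median


class ComputeMonitor:
    """Flags expensive requests and accounts that keep hitting the output cap."""

    def __init__(self, max_output_tokens, window=1000, cost_factor=10.0,
                 min_samples=20, max_len_share=0.8):
        self.max_output_tokens = max_output_tokens
        self.cost_factor = cost_factor        # alert above this multiple of the median
        self.min_samples = min_samples        # samples needed before alerting
        self.max_len_share = max_len_share    # share of capped outputs that is suspicious
        self.costs = deque(maxlen=window)     # recent per-request costs, all accounts
        self.capped = defaultdict(lambda: deque(maxlen=50))  # per-account cap hits

    def observe(self, account, cost, output_tokens):
        """Record one finished request and return a list of alert labels."""
        alerts = []

        # Compare against the median of earlier requests only, so a single
        # expensive request cannot raise its own baseline.
        if len(self.costs) >= self.min_samples:
            if cost > self.cost_factor * median(self.costs):
                alerts.append("cost_outlier")
        self.costs.append(cost)

        # Track how often this account's outputs reach the maximum length.
        history = self.capped[account]
        history.append(output_tokens >= self.max_output_tokens)
        if (len(history) >= self.min_samples
                and sum(history) / len(history) >= self.max_len_share):
            alerts.append("max_length_pattern")

        return alerts
```

The median is used as the baseline because it stays stable while an attacker injects a minority of very expensive requests, whereas a mean would drift upward and mask the attack. Calling `observe(account, cost, output_tokens)` once per completed request is enough to drive both checks.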
Mitigation
  • Per-request compute budgets (HIGH)
  • Rate limiting, per-account AND aggregate (HIGH)
  • Autoscaling bounds (HIGH)
  • Input complexity analysis (MEDIUM)
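As one possible shape for the first two mitigations, the sketch below clamps each request to an output-token cap and then charges the granted budget against two token buckets, one per account and one shared across all accounts. The classes `TokenBucket` and `Admission`, their parameters, and the choice of output tokens as the budget unit are assumptions made for illustration. Autoscaling bounds and input complexity analysis are not covered here.

```python
import time


class TokenBucket:
    """Token bucket that refills at `rate` units per second up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = None  # timestamp of the last refill, set on first use

    def can_spend(self, amount, now):
        # Refill for the time elapsed since the last check, then test the balance.
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        return self.tokens >= amount

    def spend(self, amount):
        self.tokens -= amount


class Admission:
    """Applies a per-request output cap, then per-account and aggregate limits."""

    def __init__(self, max_output_tokens, account_rate, account_burst,
                 global_rate, global_burst):
        self.max_output_tokens = max_output_tokens
        self.account_rate = account_rate
        self.account_burst = account_burst
        self.global_bucket = TokenBucket(global_rate, global_burst)
        self.accounts = {}

    def admit(self, account, requested_output_tokens, now=None):
        """Return the granted output-token budget, or None if rejected."""
        now = time.monotonic() if now is None else now

        # Per-request compute budget: never grant more than the configured cap.
        budget = min(requested_output_tokens, self.max_output_tokens)

        if account not in self.accounts:
            self.accounts[account] = TokenBucket(self.account_rate, self.account_burst)
        bucket = self.accounts[account]

        # Check both limits before spending so a rejection charges neither bucket.
        if bucket.can_spend(budget, now) and self.global_bucket.can_spend(budget, now):
            bucket.spend(budget)
            self.global_bucket.spend(budget)
            return budget
        return None
```

The aggregate bucket matters because per-account limits alone do not stop an attacker who spreads load across many accounts. Charging the buckets in output tokens rather than request counts ties the limit to the resource that the attack actually inflates.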
Chaining

DoS attacks enable T14-AT-003 (Cost Inflation) directly through compute cost generation. In competitive contexts, DoS chains into T14-AT-006 (Competitive Sabotage) by degrading a competitor's AI service availability during critical periods.

Framework mapping
OWASP LLM: LLM04
MITRE ATLAS: AML.T0029