T14-AT-002 · HIGH

Denial of Service Attacks

T14 · Infrastructure & Economic Warfare

Risk score: 240

Rating: High

Procedures: 10

Mechanism

LLM inference has a fundamental asymmetry: a short input can trigger enormous computational cost. A single max-token request can cost 100–1000x the compute of a typical query, and adversarial inputs can be crafted specifically to maximize this ratio. Published research (2021) demonstrated that inputs can be optimized to maximize inference energy consumption.

Detection

Per-request compute cost monitoring — alert on requests that consume >10x the median compute
Token output length distribution monitoring — flag requests consistently producing max-length outputs
Autoscaling event correlation — detect patterns of scale-up triggered by short bursts followed by scale-down
Model loading frequency monitoring — flag rapid model switching on on-demand endpoints
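
The first two detection signals above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `ComputeCostMonitor`, the function `max_length_fraction`, the rolling-window size, and the warm-up sample count are all assumptions made for the example. Only the 10x-median threshold comes from the text. The cost unit (GPU-seconds, FLOPs, billed tokens) is whatever the serving stack already records.

```python
from collections import deque
from statistics import median


class ComputeCostMonitor:
    """Flags requests whose compute cost exceeds ratio x the rolling median.

    Hypothetical helper: window, ratio, and min_samples are illustrative defaults.
    """

    def __init__(self, window=1000, ratio=10.0, min_samples=20):
        self.costs = deque(maxlen=window)  # most recent per-request costs
        self.ratio = ratio
        self.min_samples = min_samples

    def observe(self, cost):
        """Record one request's cost; return True if it should raise an alert."""
        flagged = False
        # Compare against the median of earlier requests only, and only
        # once enough samples exist for the median to be meaningful.
        if len(self.costs) >= self.min_samples:
            flagged = cost > self.ratio * median(self.costs)
        self.costs.append(cost)
        return flagged


def max_length_fraction(output_lengths, max_tokens):
    """Fraction of a client's responses that hit the output-token cap."""
    if not output_lengths:
        return 0.0
    hits = sum(1 for n in output_lengths if n >= max_tokens)
    return hits / len(output_lengths)
```

A client whose `max_length_fraction` stays near 1.0 over many requests matches the "consistently producing max-length outputs" pattern. What fraction justifies a flag depends on the workload and is left open here.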

Mitigation

Per-request compute budgets (HIGH)

Rate limiting (per-account AND aggregate) (HIGH)

Autoscaling bounds (HIGH)

Input complexity analysis (MEDIUM)
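
One way to combine the first two mitigations is an admission check that enforces a per-request compute budget and draws from both a per-account and an aggregate token bucket. The sketch below is an assumption-laden illustration: `TokenBucket`, `ComputeLimiter`, and every parameter name are hypothetical, and `estimated_cost` is assumed to come from some upstream estimator (for example, prompt tokens plus the requested output-token cap).

```python
import time


class TokenBucket:
    """Simple token bucket measured in abstract compute units."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.updated = None  # set on first use

    def available(self, now):
        """Refill for elapsed time, then report the current balance."""
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        return self.tokens

    def take(self, amount):
        self.tokens -= amount


class ComputeLimiter:
    """Admits a request only if it fits the per-request budget AND both buckets."""

    def __init__(self, per_account_capacity, per_account_refill,
                 aggregate_capacity, aggregate_refill, max_request_cost):
        self.per_account_capacity = per_account_capacity
        self.per_account_refill = per_account_refill
        self.aggregate = TokenBucket(aggregate_capacity, aggregate_refill)
        self.max_request_cost = max_request_cost
        self.accounts = {}

    def admit(self, account, estimated_cost, now=None):
        now = time.monotonic() if now is None else now
        # Per-request compute budget: reject oversized requests outright.
        if estimated_cost > self.max_request_cost:
            return False
        bucket = self.accounts.setdefault(
            account,
            TokenBucket(self.per_account_capacity, self.per_account_refill))
        # Check both limits before charging either, so a rejected request
        # does not consume budget.
        if bucket.available(now) < estimated_cost:
            return False
        if self.aggregate.available(now) < estimated_cost:
            return False
        bucket.take(estimated_cost)
        self.aggregate.take(estimated_cost)
        return True
```

The aggregate bucket matters because an attacker spreading load across many accounts can stay under every per-account limit while still exhausting shared capacity.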

Chaining

DoS attacks enable T14-AT-003 (Cost Inflation) directly through compute cost generation. In competitive contexts, DoS chains into T14-AT-006 (Competitive Sabotage) by degrading a competitor's AI service availability during critical periods.

Framework mapping

OWASP LLM: LLM04

MITRE ATLAS: AML.T0029