T5-AT-012 · HIGH

Resource Exhaustion

Risk score: 205

Rating: High

Procedures: 10


Mechanism

LLM inference cost scales with prompt length, generation length, model parameter count, and batch size, with additional multipliers for reasoning models that generate internal chain-of-thought tokens before answering. The attention component also grows with context length, so a long prompt makes every generated token more expensive. The design assumption is that per-request resource limits prevent abuse. The gap: the computational cost of a single LLM request can vary by 10,000× depending on input parameters, and those parameters are attacker-controllable.
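A back-of-the-envelope sketch makes the spread concrete. It assumes the common rule of thumb that a dense transformer spends roughly 2 × parameters FLOPs per token processed, and it treats hidden reasoning tokens like any other generated token. It ignores the attention term, so long-context requests are, if anything, underestimated. The function name, the 70B model size, and the token counts are hypothetical.

```python
def estimate_request_flops(params: float, prompt_tokens: int,
                           completion_tokens: int,
                           reasoning_tokens: int = 0) -> float:
    """Rough FLOPs for one request: ~2 * params per token processed.

    Assumption: rule-of-thumb dense-transformer cost; the attention term,
    which grows with context length, is ignored here.
    """
    total_tokens = prompt_tokens + completion_tokens + reasoning_tokens
    return 2.0 * params * total_tokens


PARAMS = 70e9  # hypothetical 70B-parameter model

# A minimal request: short prompt, one-token answer.
cheap = estimate_request_flops(PARAMS, prompt_tokens=10, completion_tokens=1)

# An attacker-shaped request against the same endpoint:
# long prompt, long hidden reasoning, long visible answer.
expensive = estimate_request_flops(PARAMS, prompt_tokens=100_000,
                                   completion_tokens=4_000,
                                   reasoning_tokens=32_000)

ratio = expensive / cheap
print(f"cost ratio: {ratio:,.0f}x")  # prints "cost ratio: 12,364x"
```

Both requests could count as "one request" against a request-count limit, which is why the limits below are expressed in tokens rather than calls.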

Detection

Monitor per-request compute cost, e.g. (prompt_tokens + completion_tokens) × model_cost_factor
Alert on sustained high-cost request patterns even within rate limits
Track concurrent long-running requests per source
Detect streaming connections with low read rates (slowloris pattern)
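The first and last detection signals can be sketched as follows, assuming the gateway sees token counts for each completed request and byte counts for each open stream. The class and function names, the 60-second window, the cost threshold, and the grace period are all hypothetical. Cost here sums prompt and completion tokens and scales by a single model_cost_factor; a product of prompt and completion tokens would score a 100k-token prompt with a one-token reply as nearly free, which is exactly the request shape worth catching.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class CostWindowMonitor:
    """Alerts when a source's summed compute cost within a sliding time
    window exceeds a threshold, independent of its request count."""

    def __init__(self, window_s: float, cost_threshold: float) -> None:
        self.window_s = window_s
        self.cost_threshold = cost_threshold
        # source -> deque of (timestamp, cost) pairs, oldest first
        self._events: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)

    def record(self, source: str, prompt_tokens: int, completion_tokens: int,
               model_cost_factor: float, now: Optional[float] = None) -> bool:
        """Record one completed request; return True if the source should alert."""
        now = time.monotonic() if now is None else now
        cost = (prompt_tokens + completion_tokens) * model_cost_factor
        events = self._events[source]
        events.append((now, cost))
        # Drop events that have aged out of the window.
        while events and events[0][0] <= now - self.window_s:
            events.popleft()
        return sum(c for _, c in events) > self.cost_threshold


def is_slow_reader(bytes_delivered: int, elapsed_s: float,
                   min_bytes_per_s: float, grace_s: float = 5.0) -> bool:
    """Slowloris-style check for a streaming response: after a grace period,
    flag clients that drain output slower than a floor rate."""
    if elapsed_s <= grace_s:
        return False
    return bytes_delivered / elapsed_s < min_bytes_per_s


# Ten requests in ten seconds may pass a request-count limit, but each
# carries a 100k-token prompt, so the cost window trips on the fifth.
monitor = CostWindowMonitor(window_s=60.0, cost_threshold=500_000)
alerts = [monitor.record("key-123", 100_000, 1_000, 1.0, now=float(t))
          for t in range(10)]
print(alerts.index(True))  # prints 4 (the fifth request)
```

Recomputing the window sum on each call keeps the sketch simple and free of floating-point drift; a high-volume gateway would more likely keep a running total or a bucketed counter.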

Mitigation

Token-based billing and rate limiting (not just request counting) · HIGH

Per-request compute budget (max total tokens = prompt + generation) · HIGH

Concurrent request limits per user/key · HIGH

Streaming connection timeout with read-rate enforcement · HIGH
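One way the first three mitigations might combine at admission time is sketched below, assuming a single-process gateway that knows the prompt's token count and the caller's requested max_new_tokens before dispatch. TokenAwareLimiter, acquire, release, RequestRejected, and all limit values are hypothetical names for illustration, not any gateway's API. Read-rate enforcement for streams is left out.

```python
import threading
import time
from typing import Dict, Optional


class RequestRejected(Exception):
    """Raised when a request violates a budget or limit."""


class TokenAwareLimiter:
    """Hypothetical admission controller: a token bucket charged in LLM
    tokens, a per-request total-token cap, and a per-key concurrency cap."""

    def __init__(self, tokens_per_s: float, burst_tokens: float,
                 max_request_tokens: int, max_concurrent: int) -> None:
        self.rate = tokens_per_s
        self.burst = burst_tokens
        self.max_request_tokens = max_request_tokens
        self.max_concurrent = max_concurrent
        self._level: Dict[str, float] = {}   # tokens left in each key's bucket
        self._stamp: Dict[str, float] = {}   # last refill time per key
        self._active: Dict[str, int] = {}    # in-flight requests per key
        self._lock = threading.Lock()

    def acquire(self, key: str, prompt_tokens: int, max_new_tokens: int,
                now: Optional[float] = None) -> None:
        """Admit a request or raise RequestRejected. Charges the worst case
        (prompt + max generation) up front."""
        now = time.monotonic() if now is None else now
        requested = prompt_tokens + max_new_tokens
        if requested > self.max_request_tokens:
            raise RequestRejected("per-request token budget exceeded")
        with self._lock:
            if self._active.get(key, 0) >= self.max_concurrent:
                raise RequestRejected("too many concurrent requests")
            # Refill the bucket for the time elapsed since the last call.
            level = self._level.get(key, self.burst)
            last = self._stamp.get(key, now)
            level = min(self.burst, level + (now - last) * self.rate)
            self._stamp[key] = now
            if requested > level:
                self._level[key] = level
                raise RequestRejected("token rate limit exceeded")
            self._level[key] = level - requested
            self._active[key] = self._active.get(key, 0) + 1

    def release(self, key: str, unused_tokens: int = 0) -> None:
        """Mark a request finished and refund tokens it never generated."""
        with self._lock:
            self._active[key] = max(0, self._active.get(key, 0) - 1)
            self._level[key] = min(self.burst,
                                   self._level.get(key, self.burst) + unused_tokens)


limiter = TokenAwareLimiter(tokens_per_s=1_000, burst_tokens=50_000,
                            max_request_tokens=40_000, max_concurrent=2)
limiter.acquire("key-123", prompt_tokens=30_000, max_new_tokens=8_000, now=0.0)
try:
    # Same instant: only 12,000 tokens remain in this key's bucket.
    limiter.acquire("key-123", prompt_tokens=10_000, max_new_tokens=4_000, now=0.0)
except RequestRejected as exc:
    print(exc)  # prints "token rate limit exceeded"
```

Charging max_new_tokens up front is deliberately conservative: a burst of long-generation requests is blocked before any compute is spent, at the cost of briefly under-using quota for callers who set high limits but stop early, which the refund in release partly offsets.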

Chaining

Resource exhaustion enables economic attacks in T14 (Infrastructure & Economic Warfare) by amplifying cloud costs. Under resource pressure, safety classifiers may be disabled or degraded, enabling T1–T4 attacks that would normally be caught.

Framework mapping

OWASP LLM: LLM10

MITRE ATLAS: AML.T0029
