T5-AT-004 (MEDIUM)

Rate Limit Evasion

Risk score: 170

Rating: Medium

Procedures: 10

Severity

Mechanism

LLM API rate limiting typically counts requests per key, per IP, or per organization within a time window. The design assumes that one identity maps to one rate-limiting bucket, and that capping request frequency caps attack throughput. The gap: limits are usually enforced at the API gateway layer with token-bucket or sliding-window algorithms, which can be gamed through identity fragmentation (spreading traffic across many keys, accounts, or IPs), temporal distribution (pacing requests to stay just under each window's threshold), or architectural bypass (reaching the model through a path the gateway does not meter the same way).
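To make the identity-fragmentation gap concrete, the following is a minimal simulation sketch, not a description of any particular gateway. The names `TokenBucket`, `PerKeyLimiter`, and `accepted_requests`, and the parameters (capacity 5, refill 1 request per second, 10 attempts per second per key) are all assumptions chosen for illustration. It shows that when each key gets its own bucket, the accepted volume grows linearly with the number of keys.

```python
class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled over time."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # bucket starts full
        self.last = 0.0         # simulated timestamp of the last call

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill according to elapsed simulated time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerKeyLimiter:
    """One independent bucket per API key (the assumption being examined)."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets = {}

    def allow(self, api_key: str, now: float) -> bool:
        if api_key not in self.buckets:
            self.buckets[api_key] = TokenBucket(self.capacity, self.refill_per_sec)
        return self.buckets[api_key].allow(now)


def accepted_requests(num_keys: int, seconds: int = 10, attempts_per_sec: int = 10) -> int:
    """Count accepted requests when each key sends a steady stream of attempts."""
    limiter = PerKeyLimiter(capacity=5, refill_per_sec=1)
    keys = [f"key-{i}" for i in range(num_keys)]  # hypothetical key names
    accepted = 0
    for second in range(seconds):
        for key in keys:
            for _ in range(attempts_per_sec):
                if limiter.allow(key, now=float(second)):
                    accepted += 1
    return accepted


if __name__ == "__main__":
    print("1 key :", accepted_requests(1))
    print("4 keys:", accepted_requests(4))
```

Under these assumed parameters the per-key limit behaves exactly as configured, yet the total accepted volume with four keys is four times that of one key, because nothing ties the buckets together.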

Detection

Monitor for multi-account correlation: requests from different API keys targeting the same model with similar prompts
Track per-request compute cost and alert on sustained high-cost requests even within rate limits
Detect IP rotation patterns: requests with identical payloads from rapidly changing source IPs
Alert on batch API submissions containing adversarial-pattern prompts
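The multi-account and IP-rotation checks above could be sketched roughly as follows. This is an illustration under stated assumptions, not a production detector: the request record fields (`api_key`, `source_ip`, `prompt`), the function names, and the thresholds are hypothetical. Exact hashing only catches payloads that are identical after whitespace and case normalization; catching merely "similar" prompts would need fuzzy matching, which is not shown here.

```python
import hashlib
from collections import defaultdict


def fingerprint(prompt: str) -> str:
    """Hash a prompt after lowercasing and collapsing whitespace."""
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def correlated_payloads(requests, min_keys=3, min_ips=3):
    """Flag payload fingerprints seen from many distinct keys or source IPs.

    `requests` is an iterable of dicts with the assumed fields
    "api_key", "source_ip", and "prompt", already restricted to one
    model and one time window by the caller.
    """
    keys_by_fp = defaultdict(set)
    ips_by_fp = defaultdict(set)
    for req in requests:
        fp = fingerprint(req["prompt"])
        keys_by_fp[fp].add(req["api_key"])
        ips_by_fp[fp].add(req["source_ip"])

    flagged = {}
    for fp in keys_by_fp:
        n_keys = len(keys_by_fp[fp])
        n_ips = len(ips_by_fp[fp])
        if n_keys >= min_keys or n_ips >= min_ips:
            flagged[fp] = {"distinct_keys": n_keys, "distinct_ips": n_ips}
    return flagged


if __name__ == "__main__":
    sample = [
        {"api_key": "k1", "source_ip": "203.0.113.1", "prompt": "Probe the model"},
        {"api_key": "k2", "source_ip": "203.0.113.2", "prompt": "probe  the model"},
        {"api_key": "k3", "source_ip": "203.0.113.3", "prompt": "PROBE THE MODEL "},
        {"api_key": "k4", "source_ip": "203.0.113.4", "prompt": "Summarize this memo"},
    ]
    for fp, stats in correlated_payloads(sample).items():
        print(fp[:12], stats)
```

Grouping by payload rather than by identity is the design choice that matters here: it inverts the per-key view that the rate limiter itself uses.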

Mitigation

Token-based rate limiting (count tokens, not requests): HIGH

Cross-account behavioral correlation: MEDIUM

Phone/identity verification for API access: MEDIUM

Anomaly-based rate limiting (behavioral, not threshold): HIGH
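As an illustration of the first mitigation, here is a minimal sketch of a sliding-window limiter that budgets tokens rather than requests. The class name `TokenWindowLimiter`, its parameters, and the example budget of 1000 tokens per 60 seconds are assumptions for this sketch. The caller is assumed to supply the token count for each request, and `identity` can be any grouping the operator chooses (a key, an organization, or a cluster of correlated accounts).

```python
from collections import defaultdict, deque


class TokenWindowLimiter:
    """Sliding-window limiter that caps tokens consumed, not request count."""

    def __init__(self, max_tokens: int, window_seconds: float):
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # identity -> deque of (timestamp, tokens)
        self._used = defaultdict(int)      # identity -> tokens inside the window

    def allow(self, identity: str, tokens: int, now: float) -> bool:
        events = self._events[identity]
        # Expire entries that have fallen out of the window.
        while events and events[0][0] <= now - self.window_seconds:
            _, expired = events.popleft()
            self._used[identity] -= expired
        # Reject if this request would exceed the token budget.
        if self._used[identity] + tokens > self.max_tokens:
            return False
        events.append((now, tokens))
        self._used[identity] += tokens
        return True


if __name__ == "__main__":
    limiter = TokenWindowLimiter(max_tokens=1000, window_seconds=60)
    print(limiter.allow("org-a", tokens=900, now=0))   # large request fits the budget
    print(limiter.allow("org-a", tokens=200, now=1))   # would exceed the budget
    print(limiter.allow("org-a", tokens=100, now=2))   # exactly fills the budget
```

A single expensive request consumes most of the budget here, whereas a request-count limiter would treat it the same as a trivial one. Keying the limiter on a single API key still leaves the fragmentation gap open, which is why the choice of `identity` matters.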

Chaining

Rate limit evasion is a meta-technique that enables all other T5 attacks at scale. In particular, it enables T5-AT-002 (Token Probability Extraction, which requires thousands of queries), T5-AT-012 (Resource Exhaustion via cost-asymmetric requests), and T5-AT-005 (Model Fingerprinting, which requires systematic probing).

Framework mapping

OWASP LLM: LLM10

MITRE ATLAS: AML.T0040
