T5-AT-004 · MEDIUM

Rate Limit Evasion

T5 · Model & API Exploitation
Risk score: 170
Rating: Medium
Procedures: 10
Mechanism

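LLM API rate limiting typically counts requests per API key, per source IP, or per organization within a time window. The design assumes that a single identity maps to a single rate-limiting bucket, and that limiting request frequency limits attack throughput. The gap: rate limits are usually enforced at the API gateway layer with token-bucket or sliding-window algorithms, and these can be gamed through identity fragmentation (spreading traffic across many keys, accounts, or IPs), temporal distribution (pacing requests to stay under each window's threshold), or architectural bypass (for example, routing work through endpoints such as batch APIs that may be metered separately). Because these limits count requests rather than compute, they also say nothing about how expensive each admitted request is.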
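To make the identity-fragmentation gap concrete, here is a minimal sketch of a per-key token-bucket limiter of the kind described above. `TokenBucket`, `PerKeyLimiter`, and `admitted` are hypothetical names, and the capacity and refill rate are illustrative assumptions, not any provider's real limits. Because every key gets an independent bucket, spreading the same traffic over more keys multiplies the throughput the gateway admits.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # bucket starts full
        self.last = 0.0

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerKeyLimiter:
    """Hypothetical gateway limiter: one independent bucket per API key."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str, now: float) -> bool:
        bucket = self.buckets.setdefault(key, TokenBucket(self.capacity, self.rate))
        return bucket.allow(now)


def admitted(limiter: PerKeyLimiter, keys: list[str], requests: int, duration: float) -> int:
    """Send `requests` evenly spaced over `duration` seconds, round-robin across `keys`."""
    count = 0
    for i in range(requests):
        now = i * duration / requests
        if limiter.allow(keys[i % len(keys)], now):
            count += 1
    return count


single = admitted(PerKeyLimiter(10, 1.0), ["key-0"], 600, 60.0)
fanned = admitted(PerKeyLimiter(10, 1.0), [f"key-{k}" for k in range(10)], 600, 60.0)
print(single, fanned)
```

With a capacity of 10 and a refill rate of 1 token/s, 600 requests over 60 s through one key are mostly rejected (fewer than 70 get through), while round-robining the same 600 requests across ten keys admits all of them, because each key sees only one request per second.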

Detection
  • Monitor for multi-account correlation: requests from different API keys targeting the same model with similar prompts
  • Track per-request compute cost and alert on sustained high-cost requests even within rate limits
  • Detect IP rotation patterns: requests with identical payloads from rapidly changing source IPs
  • Alert on batch API submissions containing adversarial-pattern prompts
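The multi-account and IP-rotation signals above can be approximated by grouping requests on a normalized-prompt fingerprint and counting how many distinct API keys and source IPs share it. This is a minimal sketch, assuming the requests from one time window have already been collected; the `Request` record, the normalization rules, and the `min_keys`/`min_ips` thresholds are illustrative assumptions. Exact hashing only catches trivially varied prompts; catching paraphrased "similar prompts" would need a near-duplicate technique such as MinHash or embedding similarity instead.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical log record for one API call."""
    api_key: str
    source_ip: str
    model: str
    prompt: str


def fingerprint(model: str, prompt: str) -> str:
    """Hash a normalized prompt (lowercased, whitespace collapsed, digits masked) per model."""
    normalized = re.sub(r"\s+", " ", prompt.lower()).strip()
    normalized = re.sub(r"\d+", "<n>", normalized)
    return hashlib.sha256(f"{model}\x00{normalized}".encode()).hexdigest()


def correlate(requests: list[Request], min_keys: int = 3, min_ips: int = 5) -> dict[str, list[str]]:
    """Return fingerprints shared by suspiciously many API keys or source IPs (thresholds are assumptions)."""
    keys: dict[str, set[str]] = defaultdict(set)
    ips: dict[str, set[str]] = defaultdict(set)
    for r in requests:
        fp = fingerprint(r.model, r.prompt)
        keys[fp].add(r.api_key)
        ips[fp].add(r.source_ip)

    alerts: dict[str, list[str]] = {}
    for fp in keys:
        reasons = []
        if len(keys[fp]) >= min_keys:
            reasons.append(f"multi-account: {len(keys[fp])} keys")
        if len(ips[fp]) >= min_ips:
            reasons.append(f"ip-rotation: {len(ips[fp])} IPs")
        if reasons:
            alerts[fp] = reasons
    return alerts


reqs = [Request(f"k{i}", f"10.0.0.{i}", "m", f"Summarize  document {i}") for i in range(6)]
reqs.append(Request("k-benign", "192.168.1.1", "m", "hello"))
alerts = correlate(reqs)
```

Here the six templated prompts collapse to one fingerprint seen from six keys and six IPs, so it raises both alerts, while the single benign request does not.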
Mitigation
  • Token-based rate limiting (count tokens, not requests) · HIGH
  • Cross-account behavioral correlation · MEDIUM
  • Phone/identity verification for API access · MEDIUM
  • Anomaly-based rate limiting (behavioral, not threshold) · HIGH
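Token-based rate limiting can be sketched as a sliding-window limiter whose budget is denominated in tokens rather than requests, so a few cost-asymmetric requests exhaust it as quickly as many cheap ones. This is a hypothetical sketch: `SlidingWindowTokenLimiter` and its parameters are assumptions, token counts are passed in rather than computed with a real tokenizer, and it charges the prompt plus the requested completion ceiling up front.

```python
from collections import deque


class SlidingWindowTokenLimiter:
    """Hypothetical limiter: at most `max_tokens` LLM tokens per identity in any `window_s` seconds."""

    def __init__(self, max_tokens: int, window_s: float) -> None:
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.events: dict[str, deque] = {}  # identity -> deque of (timestamp, token cost)
        self.used: dict[str, int] = {}      # identity -> tokens charged inside the window

    def allow(self, identity: str, now: float, prompt_tokens: int, max_completion_tokens: int) -> bool:
        events = self.events.setdefault(identity, deque())
        used = self.used.get(identity, 0)
        # Drop charges that have slid out of the window.
        while events and events[0][0] <= now - self.window_s:
            _, old_cost = events.popleft()
            used -= old_cost
        # Charge the prompt plus the completion ceiling the caller asked for.
        cost = prompt_tokens + max_completion_tokens
        if used + cost > self.max_tokens:
            self.used[identity] = used
            return False
        events.append((now, cost))
        self.used[identity] = used + cost
        return True


lim = SlidingWindowTokenLimiter(max_tokens=10_000, window_s=60.0)
cheap_ok = all(lim.allow("org-a", t * 0.5, 50, 50) for t in range(100))  # 100 x 100 tokens
```

Keying `identity` by organization, or by a cluster of keys flagged through cross-account correlation, rather than by individual API key narrows the identity-fragmentation gap described under Mechanism. A production version might also refund unused completion tokens once the response length is known.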
Chaining

Rate limit evasion is a meta-technique: it enables all other T5 attacks at scale. In particular, it enables T5-AT-002 (Token Probability Extraction, which requires thousands of queries), T5-AT-012 (Resource Exhaustion via cost-asymmetric requests), and T5-AT-005 (Model Fingerprinting, which requires systematic probing).

Framework mapping
OWASP LLM: LLM10
MITRE ATLAS: AML.T0040