Rate Limit Evasion
T5 · Model & API Exploitation
LLM API rate limiting typically counts requests per key, per IP, or per organization within a time window. The design assumption is that one identity maps to one rate-limiting bucket, so capping request frequency also caps attack throughput. The gap: limits are usually enforced at the API gateway layer with token-bucket or sliding-window algorithms keyed on those identities, so they can be gamed through identity fragmentation (spreading traffic across many keys, accounts, or IPs), temporal distribution (pacing requests to stay under each window), or architectural bypass (using paths the gateway limiter does not cover).
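To make the design assumption concrete, here is a minimal sketch of a per-identity token bucket. It is illustrative only and not any particular gateway's implementation: the class names `TokenBucket` and `KeyedLimiter`, the parameters, and the caller-supplied `now` timestamp are all assumptions introduced for this example.

```python
class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity      # start full, so an initial burst is allowed
        self.last = now             # timestamp of the most recent refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class KeyedLimiter:
    """Hypothetical gateway-side limiter: one independent bucket per identity."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets = {}

    def allow(self, identity: str, now: float) -> bool:
        if identity not in self.buckets:
            self.buckets[identity] = TokenBucket(self.capacity, self.refill_rate, now)
        return self.buckets[identity].allow(now)


if __name__ == "__main__":
    limiter = KeyedLimiter(capacity=5, refill_rate=1.0)
    # Ten attempts at the same instant from a single identity.
    print(sum(limiter.allow("key-a", 0.0) for _ in range(10)))
```

Because each identity gets its own bucket, the limiter bounds throughput per identity, not in aggregate: the demo admits 5 of the 10 attempts from `key-a`, but a caller holding three such keys would be admitted 15 times in the same instant. One possible mitigation, under the same assumptions, is to check a second `KeyedLimiter` keyed on a coarser identity (organization or billing account) before admitting a request.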
- Monitor for multi-account correlation: requests from different API keys targeting the same model with similar prompts
- Track per-request compute cost and alert on sustained high-cost requests even within rate limits
- Detect IP rotation patterns: requests with identical payloads from rapidly changing source IPs
- Alert on batch API submissions containing adversarial-pattern prompts
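The multi-account and IP-rotation checks above can be sketched as a single correlation pass over request logs. This is a rough illustration under stated assumptions: the `RequestLog` record, the `fingerprint` and `find_fragmented_traffic` helpers, and the thresholds are hypothetical, and exact hashing of a normalized prompt only catches near-identical payloads. Detecting merely "similar" prompts would need a fuzzy technique (for example, embeddings or MinHash), which is not shown here.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical log record; field names are assumptions for this sketch."""
    timestamp: float   # seconds
    api_key: str
    source_ip: str
    model: str
    prompt: str


def fingerprint(prompt: str) -> str:
    # Normalize case and whitespace so trivially varied payloads collide.
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_fragmented_traffic(logs, window_s=60.0, max_keys=3, max_ips=5):
    """Flag (model, payload) groups seen from too many keys or IPs within one window."""
    groups = defaultdict(list)
    for entry in logs:
        groups[(entry.model, fingerprint(entry.prompt))].append(entry)

    alerts = []
    for (model, fp), entries in groups.items():
        entries.sort(key=lambda e: e.timestamp)
        start = 0
        for end in range(len(entries)):
            # Slide the window start forward until it spans at most window_s seconds.
            while entries[end].timestamp - entries[start].timestamp > window_s:
                start += 1
            window = entries[start:end + 1]
            keys = {e.api_key for e in window}
            ips = {e.source_ip for e in window}
            if len(keys) > max_keys or len(ips) > max_ips:
                alerts.append({
                    "model": model,
                    "fingerprint": fp,
                    "distinct_keys": len(keys),
                    "distinct_ips": len(ips),
                })
                break  # one alert per group is enough for this sketch
    return alerts
```

The thresholds here are placeholders; in practice they would be tuned against baseline traffic, since popular prompts legitimately arrive from many unrelated keys.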
Rate limit evasion is a meta-technique that lets other T5 attacks run at scale. In particular, it enables T5-AT-002 (Token Probability Extraction, which requires thousands of queries), T5-AT-012 (Resource Exhaustion via cost-asymmetric requests), and T5-AT-005 (Model Fingerprinting, which requires systematic probing).