T5-AT-003 · HIGH
Cache Poisoning
T5 · Model & API Exploitation
Risk score: 200
Rating: High
Procedures: 10
Mechanism
LLM serving infrastructure uses multiple cache layers to reduce latency and cost: the KV-cache (stores attention key-value pairs for prefix reuse), the semantic cache (returns stored responses for semantically similar queries), and the prompt cache (reuses pre-computed context for shared system prompts). The design assumes that cached content is equivalent to freshly computed content and that cache boundaries align with trust boundaries. Neither assumption holds by default: caches create cross-request state, so an attacker who can influence what gets stored, or who can observe whether a lookup hit, can affect or learn about other users' requests.
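The cross-request state described above can be illustrated with a toy semantic cache. This is a minimal sketch, not any real serving stack: the `embed` function is a bag-of-words stand-in for an embedding model, and `SharedSemanticCache`, its `threshold` of 0.8, and the user names are all hypothetical. It shows only that an entry stored during one user's request is returned for a different user's similar query when the cache has no trust boundary.

```python
import math
from typing import Optional


def embed(text: str) -> dict[str, float]:
    """Toy bag-of-words embedding (assumption: stands in for a real model)."""
    counts: dict[str, float] = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SharedSemanticCache:
    """Hypothetical cache with one global namespace shared by all users."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        # Each entry: (embedding, response, user who populated it)
        self.entries: list[tuple[dict[str, float], str, str]] = []

    def put(self, query: str, response: str, user: str) -> None:
        self.entries.append((embed(query), response, user))

    def get(self, query: str) -> Optional[str]:
        """Return the most similar stored response at or above the threshold."""
        q = embed(query)
        best, best_sim = None, self.threshold
        for emb, response, _populated_by in self.entries:
            sim = cosine(q, emb)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best


cache = SharedSemanticCache()
# Request 1: user_a's query populates the cache.
cache.put("how do I reset my password", "response stored by user_a", user="user_a")
# Request 2: a different user sends a similar query and gets user_a's entry,
# because the lookup never considers who populated it.
hit = cache.get("how do I reset my password please")
miss = cache.get("what is the weather today")
```

The lookup ignores the `user` field entirely, which is the point of the sketch: the cache boundary is global while the trust boundary is per user.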
Detection
- Monitor cache hit ratios per user; anomalously high hit rates on novel queries can indicate probing
- Instrument time-to-first-token (TTFT) variance and alert on systematic sequential probing patterns
- Validate cached responses against fresh model output on a sampling basis
- Log cache population events with user attribution for forensic analysis
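Two of the signals above, per-user hit ratio on novel queries and attributed population logging, could be tracked roughly as follows. This is a sketch under stated assumptions: `CacheHitMonitor`, its `min_lookups` and `max_novel_hit_ratio` thresholds, and the definition of "novel" as "first time this user sent this exact query" are all hypothetical choices, not values from the source.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserStats:
    novel_lookups: int = 0
    novel_hits: int = 0
    seen_queries: set = field(default_factory=set)


class CacheHitMonitor:
    """Hypothetical monitor; thresholds are illustrative, not recommendations."""

    def __init__(self, min_lookups: int = 20, max_novel_hit_ratio: float = 0.5) -> None:
        self.min_lookups = min_lookups
        self.max_novel_hit_ratio = max_novel_hit_ratio
        self.stats: dict[str, UserStats] = defaultdict(UserStats)
        self.population_log: list[dict] = []

    def record_population(self, user: str, cache_key: str,
                          timestamp: Optional[float] = None) -> None:
        """Log who populated which cache entry, for later forensic analysis."""
        ts = time.time() if timestamp is None else timestamp
        self.population_log.append({"user": user, "key": cache_key, "ts": ts})

    def record_lookup(self, user: str, query: str, hit: bool) -> bool:
        """Record a lookup; return True if the user now looks like a prober."""
        stats = self.stats[user]
        if query not in stats.seen_queries:
            # Only first-time queries count: a hit on something this user
            # never asked before means another request populated the entry.
            stats.seen_queries.add(query)
            stats.novel_lookups += 1
            stats.novel_hits += int(hit)
        return self.is_suspicious(user)

    def is_suspicious(self, user: str) -> bool:
        stats = self.stats[user]
        if stats.novel_lookups < self.min_lookups:
            return False  # not enough data to judge
        return stats.novel_hits / stats.novel_lookups > self.max_novel_hit_ratio


monitor = CacheHitMonitor(min_lookups=5, max_novel_hit_ratio=0.5)
probe_flags = [monitor.record_lookup("probe", f"q{i}", hit=True) for i in range(5)]
normal_flags = [monitor.record_lookup("normal", f"q{i}", hit=(i == 0)) for i in range(5)]
monitor.record_population("probe", "entry-1", timestamp=0.0)
```

A real deployment would likely hash or bound `seen_queries` rather than keep raw query strings in memory, and would tune both thresholds against baseline traffic.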
Mitigation
- Per-user/per-tenant cache isolation · HIGH
- Cache entry integrity validation (re-verify on hit) · MEDIUM
- Constant-time cache responses (pad TTFT) · HIGH
- Embedding distance threshold tuning · MEDIUM
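Three of the mitigations above can be sketched together for an exact-match cache. Everything named here is hypothetical: `TenantIsolatedCache` scopes keys by tenant, re-checks an HMAC tag on every hit (one possible reading of "re-verify on hit"), and `respond_with_floor` pads responses up to a minimum latency. The HMAC check only detects entries altered after they were written; it does not detect content that was already poisoned when stored.

```python
import hashlib
import hmac
import time
from typing import Callable, Optional


class TenantIsolatedCache:
    """Hypothetical exact-match cache with tenant-scoped keys and integrity tags."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._store: dict[str, tuple[str, str]] = {}  # key -> (response, tag)

    def _key(self, tenant_id: str, prompt: str) -> str:
        # Length-prefix each part so ("ab", "c") and ("a", "bc") never collide.
        digest = hashlib.sha256()
        for part in (tenant_id, prompt):
            data = part.encode("utf-8")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
        return digest.hexdigest()

    def _tag(self, key: str, response: str) -> str:
        message = f"{key}:{response}".encode("utf-8")
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def put(self, tenant_id: str, prompt: str, response: str) -> None:
        key = self._key(tenant_id, prompt)
        self._store[key] = (response, self._tag(key, response))

    def get(self, tenant_id: str, prompt: str) -> Optional[str]:
        key = self._key(tenant_id, prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        response, tag = entry
        # Re-verify on hit: evict and treat as a miss if the tag does not match.
        if not hmac.compare_digest(tag, self._tag(key, response)):
            del self._store[key]
            return None
        return response


def respond_with_floor(handler: Callable[[], str], floor_seconds: float) -> str:
    """Run handler, then sleep so the caller never sees less than the floor."""
    start = time.monotonic()
    result = handler()
    remaining = floor_seconds - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result


cache = TenantIsolatedCache(secret=b"example-secret")  # assumption: real key from a secret store
cache.put("tenant_a", "hello", "cached answer")
same_tenant = cache.get("tenant_a", "hello")
other_tenant = cache.get("tenant_b", "hello")
```

Padding to a floor hides the hit/miss difference only when the floor is at least as long as a typical miss; responses slower than the floor still vary, so this is an approximation of constant-time behavior that trades away some of the cache's latency benefit.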
Chaining
Cache poisoning enables persistent indirect prompt injection (T1-AT-series) that survives session boundaries. Cache timing side channels (T5-AP-003B) enable T5-AT-014 (Side Channel Attacks) and feed T5-AT-005 (Model Fingerprinting).
Framework mapping
OWASP LLM: LLM05
MITRE ATLAS: AML.T0018