T5-AT-003 · HIGH

Cache Poisoning

T5 · Model & API Exploitation
Risk score: 200
Rating: High
Procedures: 10
Mechanism

LLM serving infrastructure uses several cache layers to reduce latency and cost: a KV cache (stores attention key-value pairs for prefix reuse), a semantic cache (returns stored responses for semantically similar queries), and a prompt cache (reuses precomputed context for shared system prompts). The design assumes that cached content is equivalent to freshly computed content and that cache boundaries align with trust boundaries. The gap: caches create cross-request state, so an entry written or influenced by one request can later be served to, or observed by, a request from a different user.
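The cross-request state described above can be illustrated with a toy semantic cache. This is a minimal sketch, not any real serving stack: `SharedSemanticCache`, the bag-of-words `embed` function, and the 0.8 threshold are all hypothetical stand-ins chosen for illustration. It shows only that a lookup key with no user or tenant component lets an entry populated by one user answer a similar query from another.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (assumption: real systems use a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SharedSemanticCache:
    """Hypothetical cache shared by all users: the lookup ignores who is asking."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response, populated_by)

    def put(self, query, response, user):
        self.entries.append((embed(query), response, user))

    def get(self, query):
        """Return (response, populated_by) for the most similar entry, or None on a miss."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1], best[2]
        return None

cache = SharedSemanticCache()
cache.put("how do I reset my password", "response stored by user A", user="user_a")

# A different user sends a similar, not identical, query and receives user A's entry.
hit = cache.get("how do I reset my password please")
miss = cache.get("what is the weather today")
```

Here `hit` carries the response that `user_a` populated, while the unrelated query misses. Whether a real deployment behaves this way depends on how its cache keys and similarity threshold are configured.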

Detection
  • Monitor cache hit ratios per user; anomalously high hit rates on novel queries can indicate probing
  • Instrument time-to-first-token (TTFT) variance and alert on systematic sequential probing patterns
  • Validate a sample of cached responses against fresh model output
  • Log cache population events with user attribution for forensic analysis
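Two of the detection ideas above, per-user hit ratios on novel queries and attributed population logs, can be sketched as follows. This is an illustrative outline under stated assumptions: `CacheMonitor`, its thresholds, and the definition of "novel" (the first time a given user sends a given query) are hypothetical choices, and a production system would need tuning against its own baseline traffic.

```python
import time
from collections import defaultdict

class CacheMonitor:
    """Hypothetical monitor; thresholds are placeholders, not recommended values."""

    def __init__(self, min_lookups=20, max_novel_hit_ratio=0.5):
        self.min_lookups = min_lookups
        self.max_novel_hit_ratio = max_novel_hit_ratio
        self.seen = defaultdict(set)           # queries each user has already sent
        self.novel_lookups = defaultdict(int)  # first-time queries per user
        self.novel_hits = defaultdict(int)     # first-time queries that hit the cache
        self.population_log = []               # attributed cache writes for forensics

    def record_population(self, user, cache_key, now=None):
        """Log which user caused a cache entry to be written."""
        ts = time.time() if now is None else now
        self.population_log.append({"ts": ts, "user": user, "key": cache_key})

    def record_lookup(self, user, query, hit):
        """Record one lookup and return True if the user now looks anomalous."""
        if query not in self.seen[user]:
            self.seen[user].add(query)
            self.novel_lookups[user] += 1
            self.novel_hits[user] += int(hit)
        return self.is_anomalous(user)

    def is_anomalous(self, user):
        """Flag users whose never-before-sent queries hit the cache unusually often."""
        lookups = self.novel_lookups[user]
        if lookups < self.min_lookups:
            return False  # too little data to judge
        return self.novel_hits[user] / lookups > self.max_novel_hit_ratio

monitor = CacheMonitor(min_lookups=5, max_novel_hit_ratio=0.5)
monitor.record_population("user_a", "key-123", now=0.0)

# Six distinct queries from one user, every one a cache hit.
flags = [monitor.record_lookup("prober", f"probe {i}", hit=True) for i in range(6)]
```

The user is flagged only once `min_lookups` novel queries have been observed, which trades detection speed for fewer false positives on low-volume users.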
Mitigation
Per-user/per-tenant cache isolation: HIGH
Cache entry integrity validation (re-verify on hit): MEDIUM
Constant-time cache responses (pad TTFT): HIGH
Embedding distance threshold tuning: MEDIUM
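The first three mitigations above can be combined in a small sketch. Everything named here is hypothetical: `IsolatedCache`, `tenant_key`, `respond_with_floor`, and the hard-coded `SECRET` are illustrative only, and a real deployment would load the key from a secrets manager. Note also that padding to a latency floor only approximates constant-time behavior; responses slower than the floor remain distinguishable.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-managed-secret"  # assumption: placeholder key for the sketch

def tenant_key(tenant_id, prompt):
    """Derive a cache key scoped to one tenant, so identical prompts never collide across tenants."""
    tenant_bytes = tenant_id.encode()
    h = hashlib.sha256()
    h.update(len(tenant_bytes).to_bytes(4, "big"))  # length prefix avoids ambiguous concatenation
    h.update(tenant_bytes)
    h.update(prompt.encode())
    return h.hexdigest()

def _tag(key, response):
    """HMAC over key and response, used to detect entries modified after being written."""
    return hmac.new(SECRET, f"{key}\n{response}".encode(), hashlib.sha256).hexdigest()

class IsolatedCache:
    def __init__(self):
        self._store = {}  # key -> (response, integrity tag)

    def put(self, tenant_id, prompt, response):
        key = tenant_key(tenant_id, prompt)
        self._store[key] = (response, _tag(key, response))

    def get(self, tenant_id, prompt):
        """Return the cached response, or None on a miss or a failed integrity check."""
        key = tenant_key(tenant_id, prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        response, tag = entry
        if not hmac.compare_digest(tag, _tag(key, response)):
            del self._store[key]  # drop the tampered entry and treat it as a miss
            return None
        return response

def respond_with_floor(compute, floor_seconds):
    """Run compute() and sleep so that hits and fast misses take at least floor_seconds."""
    start = time.monotonic()
    result = compute()
    remaining = floor_seconds - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
    return result

store = IsolatedCache()
store.put("tenant_a", "summarize the policy", "cached summary")
same_tenant = store.get("tenant_a", "summarize the policy")
other_tenant = store.get("tenant_b", "summarize the policy")
```

In this sketch `same_tenant` is the cached summary and `other_tenant` is a miss, because the tenant identifier is part of the key. The integrity tag only detects modification of stored entries; it does not help if the poisoned content was produced by the model and cached legitimately, which is where sampled revalidation against fresh output applies.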
Chaining

Cache poisoning enables persistent indirect prompt injection (T1-AT-series) that survives session boundaries. Cache timing side channels (T5-AP-003B) enable T5-AT-014 (Side Channel Attacks) and feed T5-AT-005 (Model Fingerprinting).

Framework mapping
OWASP LLM: LLM05
MITRE ATLAS: AML.T0018