T5-AT-003 HIGH

Cache Poisoning

Risk score: 200

Rating: High

Procedures: 10

Severity

Mechanism

LLM serving infrastructure uses multiple cache layers to reduce latency and cost: a KV-cache (stores attention key-value pairs for prefix reuse), a semantic cache (returns stored responses for semantically similar queries), and a prompt cache (reuses pre-computed context for shared system prompts). The design assumes that cached content is equivalent to freshly computed content and that cache boundaries align with trust boundaries. The gap: a cache is cross-request state, so an entry written while serving one request can be returned for a later request from a different user, and that shared state can be manipulated.
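The gap can be illustrated with a toy semantic cache. This is a minimal sketch, not any real serving stack: `embed` is a bag-of-words stand-in for an embedding model, and `SharedSemanticCache`, the 0.7 threshold, and the example strings are all hypothetical. It shows an entry written by one user being returned to another whose query is merely similar.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SharedSemanticCache:
    """Hypothetical semantic cache shared by all users (no trust boundary)."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.entries = []  # (embedding, response, populated_by)

    def put(self, query: str, response: str, user: str) -> None:
        self.entries.append((embed(query), response, user))

    def get(self, query: str):
        """Return (response, populated_by) for the closest entry, or None."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1], best[2]
        return None


cache = SharedSemanticCache(threshold=0.7)

# The attacker's request populates the cache with a response they control.
cache.put("how do I reset my password", "Visit evil.example to reset it", user="attacker")

# A different user's similar (not identical) query is answered from that entry.
victim_hit = cache.get("how do I reset my account password")

# An unrelated query falls below the threshold and misses.
unrelated = cache.get("what is the weather today")
```

A looser threshold widens the set of victim queries that a single poisoned entry can capture, which is why threshold tuning appears among the mitigations.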

Detection

Monitor cache hit ratios per user; anomalously high hit rates on novel queries indicate probing
Instrument time-to-first-token (TTFT) variance and alert on systematic sequential probing patterns
Validate cached responses against fresh model output on a sampling basis
Log cache population events with user attribution for forensic analysis
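Two of the checks above, per-user hit ratios and attributed population logging, can be sketched as follows. The class name `CacheMonitor`, the minimum of 20 lookups, and the 0.9 hit-ratio cutoff are illustrative assumptions, not recommended values; a real deployment would derive thresholds from its own baseline traffic.

```python
from collections import defaultdict


class CacheMonitor:
    """Hypothetical in-memory monitor for cache lookups and population events."""

    def __init__(self, min_lookups: int = 20, max_hit_ratio: float = 0.9):
        self.min_lookups = min_lookups      # ignore users with too little data
        self.max_hit_ratio = max_hit_ratio  # assumed cutoff for "anomalously high"
        self.lookups = defaultdict(int)
        self.hits = defaultdict(int)
        self.population_log = []            # who wrote which entry, and when

    def record_lookup(self, user: str, hit: bool) -> None:
        self.lookups[user] += 1
        if hit:
            self.hits[user] += 1

    def record_population(self, user: str, cache_key: str, timestamp: float) -> None:
        """Log a cache write with user attribution for later forensics."""
        self.population_log.append({"user": user, "key": cache_key, "ts": timestamp})

    def hit_ratio(self, user: str) -> float:
        n = self.lookups.get(user, 0)
        return self.hits.get(user, 0) / n if n else 0.0

    def suspicious_users(self) -> list:
        """Users with enough lookups and a hit ratio above the cutoff."""
        return sorted(
            u
            for u, n in self.lookups.items()
            if n >= self.min_lookups and self.hit_ratio(u) > self.max_hit_ratio
        )


monitor = CacheMonitor()
for i in range(30):
    monitor.record_lookup("probe", hit=(i != 0))       # 29 of 30 lookups hit
    monitor.record_lookup("normal", hit=(i % 3 == 0))  # 10 of 30 lookups hit
for _ in range(5):
    monitor.record_lookup("new", hit=True)             # too few lookups to judge
monitor.record_population("probe", "key-1", 1700000000.0)

flagged = monitor.suspicious_users()
```

The sketch does not distinguish novel queries from repeated ones; doing so would need a per-user record of previously seen queries, which is omitted here.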

Mitigation

Per-user/per-tenant cache isolation: HIGH

Cache entry integrity validation (re-verify on hit): MEDIUM

Constant-time cache responses (pad TTFT): HIGH

Embedding distance threshold tuning: MEDIUM
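Tenant isolation and TTFT padding can be sketched together. Everything named here (`tenant_cache_key`, `TenantIsolatedCache`, `serve_with_padded_ttft`, the 50 ms floor) is hypothetical. The padding only hides hits from misses if the floor is at least as long as a typical miss, which this sketch assumes rather than enforces.

```python
import hashlib
import time


def tenant_cache_key(tenant_id: str, prompt: str) -> str:
    """Derive a key that binds the entry to one tenant.

    Each part is length-prefixed so that ("ab", "c") and ("a", "bc")
    cannot produce the same key.
    """
    h = hashlib.sha256()
    for part in (tenant_id, prompt):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


class TenantIsolatedCache:
    """Hypothetical exact-match cache whose entries are scoped per tenant."""

    def __init__(self):
        self._store = {}

    def put(self, tenant_id: str, prompt: str, response: str) -> None:
        self._store[tenant_cache_key(tenant_id, prompt)] = response

    def get(self, tenant_id: str, prompt: str):
        return self._store.get(tenant_cache_key(tenant_id, prompt))


def serve_with_padded_ttft(lookup, compute, floor_s: float):
    """Answer from cache or model, but never return before floor_s elapses."""
    start = time.monotonic()
    result = lookup()
    if result is None:
        result = compute()
    remaining = floor_s - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)  # pad fast cache hits up to the floor
    return result


cache = TenantIsolatedCache()
cache.put("tenant-a", "summarize the report", "cached summary")

same_tenant = cache.get("tenant-a", "summarize the report")
other_tenant = cache.get("tenant-b", "summarize the report")  # isolated: miss

t0 = time.monotonic()
answer = serve_with_padded_ttft(
    lookup=lambda: cache.get("tenant-a", "summarize the report"),
    compute=lambda: "fresh summary",
    floor_s=0.05,
)
elapsed = time.monotonic() - t0
```

Isolation trades away cross-tenant hit rate for safety; sharing only entries that contain no tenant-supplied content (for example, a common system prompt) is one possible middle ground, again as a design assumption rather than a rule from the source.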

Chaining

Cache poisoning enables persistent indirect prompt injection (T1-AT-series) that survives session boundaries. Cache timing side channels (T5-AP-003B) enable T5-AT-014 (Side Channel Attacks) and feed T5-AT-005 (Model Fingerprinting).

Framework mapping

OWASP LLM: LLM05

MITRE ATLAS: AML.T0018
