T5-AT-008 MEDIUM

Response Streaming Exploitation

Risk score: 175

Rating: Medium

Procedures: 10

Severity

Mechanism

Streaming APIs deliver tokens incrementally via Server-Sent Events (SSE), exposing the generation process in real time. The design assumption is that streaming changes only delivery timing and leaves security properties unchanged. In practice, that assumption does not hold: streaming creates three distinct exploitation surfaces.

Detection

Monitor per-token packet sizes for padding compliance (post-Cloudflare mitigation)
Detect early stream disconnection patterns: many requests terminated before natural completion
Alert on high-concurrency streaming requests from a single source (classifier exhaustion attempt)
Log finish_reason distributions per user; high content_filter rates indicate active probing
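
The last two log-based checks above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the StreamLog record, the convention that a missing finish_reason means the client disconnected early, and all thresholds are assumptions chosen for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class StreamLog:
    """Hypothetical log record for one streaming request."""
    user_id: str
    # Assumed convention: None means the client closed the stream
    # before the model produced a finish_reason.
    finish_reason: Optional[str]


def flag_users(
    logs: Iterable[StreamLog],
    min_requests: int = 20,        # assumed: ignore users with too little data
    filter_rate: float = 0.2,      # assumed alert threshold
    disconnect_rate: float = 0.5,  # assumed alert threshold
) -> Dict[str, List[str]]:
    """Return {user_id: [reasons]} for users whose distribution looks like probing."""
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for rec in logs:
        key = rec.finish_reason if rec.finish_reason is not None else "client_disconnect"
        per_user[rec.user_id][key] += 1

    flagged: Dict[str, List[str]] = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        if total < min_requests:
            continue
        reasons = []
        if counts["content_filter"] / total >= filter_rate:
            reasons.append("high_content_filter_rate")
        if counts["client_disconnect"] / total >= disconnect_rate:
            reasons.append("frequent_early_disconnect")
        if reasons:
            flagged[user] = reasons
    return flagged
```

In practice the thresholds would need tuning against baseline traffic, since legitimate clients also cancel streams and occasionally trip content filters.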

Mitigation

Token-length padding in streaming responses: HIGH

Pre-generation safety classification (evaluate before streaming starts): HIGH

Streaming rate limiting (per-token throttle): MEDIUM

SSE output sanitization (escape event-boundary strings in model output): HIGH
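
Two of these mitigations, token-length padding and SSE output sanitization, can be combined in the function that frames each event. The sketch below is illustrative only: the "delta" and "p" field names and the 32-byte bucket size are assumptions, not the format of any particular provider. JSON-encoding the token escapes line breaks, so model output cannot terminate the event early or inject its own SSE fields, and the filler field rounds every payload up to a multiple of the bucket size.

```python
import json

BLOCK = 32  # assumed bucket size in bytes; larger buckets hide more, cost more bandwidth


def sse_event(token: str, block: int = BLOCK) -> bytes:
    """Frame one token as an SSE event with an escaped, length-padded payload."""
    # json.dumps (ensure_ascii=True by default) emits pure ASCII and escapes
    # \n and \r, so the payload contains no raw SSE line terminators.
    base = json.dumps({"delta": token, "p": ""})
    pad = (-len(base)) % block  # filler needed to reach the next bucket boundary
    body = json.dumps({"delta": token, "p": "x" * pad})
    return f"data: {body}\n\n".encode("ascii")


if __name__ == "__main__":
    # Tokens of different lengths produce events of identical size.
    print(len(sse_event("a")), len(sse_event("hello")))
    # An attempted event-boundary injection stays inside one data field.
    print(sse_event("x\n\ndata: [DONE]\n\n"))
```

Padding to buckets only coarsens the length signal: a token long enough to spill into the next bucket is still distinguishable from short ones, and this sketch does nothing about timing.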

Chaining

Streaming timing side channels (T5-AP-008A) feed T5-AT-014 (Side Channel Attacks) with per-token timing data. Stream interruption (T5-AP-008B) enables incremental extraction of harmful content that chains to T7 (Output Manipulation).

Framework mapping

OWASP LLM: LLM05

MITRE ATLAS: AML.T0043
