T5-AT-002 HIGH

Token Probability Extraction

Risk score: 210

Rating: High

Procedures: 10

Severity

Mechanism

LLM APIs that expose log-probabilities (logprobs) for generated tokens leak the model's internal confidence distribution over its vocabulary at each generation step. The design assumption is that logprobs are a benign debugging and evaluation feature. The gap is that logprobs carry orders of magnitude more information than the top-1 output token alone, and that surplus is what extraction attacks harvest.
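To make the "orders of magnitude" claim concrete, the following is a rough capacity comparison, not a measurement. It assumes a vocabulary of 50,000 tokens and 32-bit logprob values. Both figures are illustrative assumptions, and the function names are hypothetical. The numbers are crude upper bounds on raw bits per generation step, not the information an attacker actually recovers.

```python
import math

VOCAB_SIZE = 50_000  # assumed vocabulary size, for illustration only
FLOAT_BITS = 32      # assumed precision of each returned logprob value


def token_bits(vocab_size: int) -> float:
    """Upper bound on the bits conveyed by a single output token ID."""
    return math.log2(vocab_size)


def logprob_bits(k: int, vocab_size: int, float_bits: int = FLOAT_BITS) -> float:
    """Crude upper bound on the bits in k (token ID, logprob) pairs."""
    return k * (math.log2(vocab_size) + float_bits)


if __name__ == "__main__":
    # Compare top-1, top-5, and a full-vocabulary distribution.
    for k in (1, 5, VOCAB_SIZE):
        ratio = logprob_bits(k, VOCAB_SIZE) / token_bits(VOCAB_SIZE)
        print(f"k={k:>6}: about {ratio:,.0f}x the bits of a bare token")
```

Under these assumptions a handful of logprobs per step is already well above a bare token, and a full-vocabulary distribution exceeds it by several orders of magnitude.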

Detection

Monitor for anomalous logprob request patterns: high-volume requests with max_tokens=1 and logprobs>0 (characteristic of extraction walks)
Detect repetition-based divergence attempts: prompts containing >10 repetitions of the same token
Alert on fine-tuning jobs followed by intensive logprob queries on the fine-tuned model
Track per-user logprob query volume: legitimate use is low-frequency, whereas extraction requires thousands of queries
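The request-pattern checks above could be sketched roughly as follows. This is a minimal illustration rather than a production detector. The `Request` record, its field names, and both thresholds are hypothetical. "Repetitions" is interpreted here as consecutive repeats of one token, which is an assumption. The fine-tuning correlation check and any time windowing are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Set

# Hypothetical thresholds; tune against real traffic baselines.
PROBE_VOLUME_THRESHOLD = 1000  # probe-shaped requests per user before flagging
REPETITION_THRESHOLD = 10      # flag prompts with MORE than this many repeats


@dataclass
class Request:
    """Hypothetical log record for one API call."""
    user_id: str
    prompt_tokens: List[str]
    max_tokens: int
    logprobs: int  # number of logprobs requested; 0 means none


def is_extraction_probe(req: Request) -> bool:
    """Single-token generation with logprobs enabled, as in extraction walks."""
    return req.max_tokens == 1 and req.logprobs > 0


def longest_run(tokens: List[str]) -> int:
    """Length of the longest run of identical consecutive tokens."""
    best = run = 0
    prev = None
    for tok in tokens:
        run = run + 1 if tok == prev else 1
        best = max(best, run)
        prev = tok
    return best


def is_repetition_prompt(req: Request) -> bool:
    """Assumption: 'repetitions' means consecutive repeats of one token."""
    return longest_run(req.prompt_tokens) > REPETITION_THRESHOLD


def flag_users(requests: Iterable[Request],
               volume_threshold: int = PROBE_VOLUME_THRESHOLD) -> Set[str]:
    """Return users whose count of probe-shaped requests meets the threshold."""
    counts = defaultdict(int)
    for req in requests:
        if is_extraction_probe(req):
            counts[req.user_id] += 1
    return {user for user, n in counts.items() if n >= volume_threshold}
```

In practice the per-user counter would likely be scoped to a sliding time window, so that slow, legitimate usage does not accumulate into a false positive.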

Mitigation

Remove logprobs from public API (Anthropic approach): HIGH

Cap logprobs to top-1 only (OpenAI GPT-4 approach): MEDIUM

Add calibrated noise to logprob values: MEDIUM

Memorization testing during training (canary insertion): MEDIUM
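The two output-side mitigations listed above, top-1 capping and noise injection, might look something like this at the API response layer. This is only a sketch. The function names are hypothetical, and Gaussian noise with a fixed `sigma` is an assumption. The source does not say how the noise should be calibrated.

```python
import random
from typing import Dict, Optional


def cap_top1(logprobs: Dict[str, float]) -> Dict[str, float]:
    """Keep only the single most likely token and its logprob."""
    if not logprobs:
        return {}
    best = max(logprobs, key=logprobs.get)
    return {best: logprobs[best]}


def add_noise(logprobs: Dict[str, float],
              sigma: float = 0.1,
              rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Perturb each logprob with zero-mean Gaussian noise.

    sigma is a hypothetical tuning knob. Values are clamped to <= 0 because
    a log-probability can never be positive.
    """
    if rng is None:
        rng = random.Random()
    return {tok: min(0.0, lp + rng.gauss(0.0, sigma))
            for tok, lp in logprobs.items()}


if __name__ == "__main__":
    raw = {"Paris": -0.05, "London": -3.2, "Rome": -4.1}
    print(cap_top1(raw))
    print(add_noise(raw, sigma=0.1, rng=random.Random(0)))
```

A larger `sigma` degrades extraction more, but it also degrades legitimate evaluation use, so the value would need to be chosen against both.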

Chaining

Successful logprob extraction directly enables T5-AT-005 (Model Fingerprinting) by revealing vocabulary and distribution characteristics. Extracted training data feeds T10 (Integrity & Confidentiality Breach) for PII/credential compromise.

Framework mapping

OWASP LLM: LLM06

MITRE ATLAS: AML.T0024; AML.T0044
