T5-AT-002 · HIGH

Token Probability Extraction

T5 · Model & API Exploitation
Risk score: 210
Rating: High
Procedures: 10
Mechanism

LLM APIs that expose log-probabilities (logprobs) for generated tokens leak the model's internal confidence distribution at each generation step: typically the top-k candidates per request, and potentially much of the vocabulary once responses are aggregated across many queries. The design assumption is that logprobs are a benign debugging and evaluation feature. The gap is that logprobs carry orders of magnitude more information than the top-1 output token alone.
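To make the size of that gap concrete, the sketch below compares an upper bound on the information in one sampled token with the raw payload of a top-k logprob response and of a full per-step distribution. The vocabulary size, k, and 32-bit float width are illustrative assumptions, not figures from any particular API, and raw payload size is only a loose upper bound on what an attacker can actually use.

```python
import math

def top1_bits(vocab_size: int) -> float:
    """Upper bound on the bits conveyed by a single sampled token id."""
    return math.log2(vocab_size)

def topk_logprob_bits(vocab_size: int, k: int, float_bits: int = 32) -> float:
    """Raw payload of k (token id, logprob) pairs returned for one step."""
    return k * (math.log2(vocab_size) + float_bits)

def full_distribution_bits(vocab_size: int, float_bits: int = 32) -> int:
    """Raw payload of one logprob per vocabulary entry for one step."""
    return vocab_size * float_bits

if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    vocab, k = 50_000, 5
    base = top1_bits(vocab)
    print(f"top-1 token:        {base:.1f} bits")
    print(f"top-{k} logprobs:     {topk_logprob_bits(vocab, k) / base:.0f}x the top-1 bound")
    print(f"full distribution:  {full_distribution_bits(vocab) / base:.0f}x the top-1 bound")
```

Under these assumptions a single top-k response is already several times larger than the token alone, and the "orders of magnitude" figure applies to the full distribution an attacker may approximate over repeated queries.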

Detection
  • Monitor for anomalous logprob request patterns: high-volume requests with max_tokens=1 and logprobs>0 (characteristic of extraction walks)
  • Detect repetition-based divergence attempts: prompts containing >10 repetitions of the same token
  • Alert on fine-tuning jobs followed by intensive logprob queries on the fine-tuned model
  • Track per-user logprob query volume — legitimate use is low-frequency; extraction requires thousands of queries
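The first, second, and fourth checks above can be sketched as a simple pass over request logs. This is a minimal illustration, not a production detector: the `ApiRequest` record, its field names, and both thresholds are hypothetical, whitespace splitting is a crude stand-in for the model's tokenizer, and "repetitions" is interpreted here as a consecutive run so that ordinary common words are not flagged. The fine-tuning correlation check is omitted.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import groupby

@dataclass
class ApiRequest:
    """Hypothetical log record; field names are assumptions."""
    user_id: str
    prompt: str
    max_tokens: int
    logprobs: int  # number of top logprobs requested, 0 if disabled

# Illustrative thresholds; tune against real traffic.
PROBE_THRESHOLD = 1000  # single-token logprob requests per user per window
REPEAT_THRESHOLD = 10   # consecutive repetitions of one token in a prompt

def is_extraction_probe(req: ApiRequest) -> bool:
    """Request shape characteristic of an extraction walk."""
    return req.max_tokens == 1 and req.logprobs > 0

def longest_run(prompt: str) -> int:
    """Length of the longest run of one repeated whitespace-delimited token."""
    words = prompt.split()
    return max((sum(1 for _ in grp) for _, grp in groupby(words)), default=0)

def flag_users(requests, probe_threshold: int = PROBE_THRESHOLD):
    """Return {user_id: sorted reasons} for one time window of requests."""
    probe_counts = Counter()
    reasons = defaultdict(set)
    for req in requests:
        if is_extraction_probe(req):
            probe_counts[req.user_id] += 1
        if longest_run(req.prompt) > REPEAT_THRESHOLD:
            reasons[req.user_id].add("repetition_prompt")
    for user, count in probe_counts.items():
        if count >= probe_threshold:
            reasons[user].add("high_volume_single_token_logprobs")
    return {user: sorted(r) for user, r in reasons.items()}
```

Calling `flag_users` once per time window (for example, per hour of logs) keeps the per-user counts comparable across windows.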
Mitigation
Remove logprobs from public API (Anthropic approach) · HIGH
Cap logprobs to top-1 only (OpenAI GPT-4 approach) · MEDIUM
Add calibrated noise to logprob values · MEDIUM
Memorization testing during training (canary insertion) · MEDIUM
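The second and third mitigations can be sketched as response-side filters applied before logprobs leave the API. The source does not specify how the noise is calibrated, so the Gaussian distribution and the `scale` value below are placeholder assumptions; the function names are hypothetical as well.

```python
import random

def cap_top1(logprobs: dict) -> dict:
    """Keep only the most likely token and its logprob."""
    token = max(logprobs, key=logprobs.get)
    return {token: logprobs[token]}

def add_noise(logprobs: dict, scale: float = 0.1, rng=None) -> dict:
    """Perturb each logprob with zero-mean Gaussian noise (assumed scheme).

    Values are clamped at 0.0 because a log-probability cannot be positive.
    """
    rng = rng or random.Random()
    return {tok: min(0.0, lp + rng.gauss(0.0, scale)) for tok, lp in logprobs.items()}

if __name__ == "__main__":
    # Hypothetical per-step logprobs for illustration.
    step = {"yes": -0.2, "no": -1.8, "maybe": -3.0}
    print(cap_top1(step))
    print(add_noise(step, scale=0.1))
```

Noise trades evaluation fidelity for extraction cost: an attacker can average it out over repeated queries, which is one reason to pair it with the per-user volume tracking listed under Detection.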
Chaining

Successful logprob extraction directly enables T5-AT-005 (Model Fingerprinting) by revealing vocabulary and distribution characteristics. Extracted training data feeds T10 (Integrity & Confidentiality Breach) for PII/credential compromise.

Framework mapping
OWASP LLM: LLM06
MITRE ATLAS: AML.T0024; AML.T0044