T7-AT-014 · MEDIUM

Output Redirection

Category: T7 · Output Manipulation & Exfiltration
Risk score: 180
Rating: Medium
Procedures: 10
Mechanism

Safety filtering is often applied only at the chat-response output layer. When output is redirected to alternative channels (file writes, API calls, code execution, webhooks, or email sent via tools), those channels may lack equivalent filtering. In agentic contexts, an agent instructed to write restricted content to a file or send it through an API can bypass the output classifier entirely when the classifier monitors only the chat channel.
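
The gap can be illustrated with a minimal sketch. Everything in it is hypothetical: `classify` is a toy keyword check standing in for an output classifier, and `Runtime.dispatch` stands in for an agent runtime's output router. Neither reflects a real library's API.

```python
from dataclasses import dataclass, field


def classify(text: str) -> bool:
    """Hypothetical stand-in for an output classifier: True means 'block'."""
    return "RESTRICTED" in text


@dataclass
class Runtime:
    """Toy agent runtime with a chat channel and a file-write tool channel."""
    chat_log: list = field(default_factory=list)
    files: dict = field(default_factory=dict)

    def dispatch(self, channel: str, content: str, path: str = "") -> str:
        # The classifier is wired to the chat channel only (the flaw).
        if channel == "chat":
            if classify(content):
                return "blocked"
            self.chat_log.append(content)
            return "delivered"
        if channel == "file_write":
            # No classifier call here: content reaches the sink unfiltered.
            self.files[path] = content
            return "delivered"
        raise ValueError(f"unknown channel: {channel}")


runtime = Runtime()
print(runtime.dispatch("chat", "RESTRICTED payload"))
print(runtime.dispatch("file_write", "RESTRICTED payload", path="out.txt"))
```

The same string is blocked on the chat channel and delivered on the file-write channel, because only one branch ever calls the classifier.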

Detection
  • Apply content-level safety checks to all output channels: file writes, API calls, tool invocations, and encoded outputs
  • Monitor agentic tool invocations that transmit model-generated content to external endpoints
  • Flag requests that specify an output channel other than the default
  • Observable signal: file-write or API-call invocations that follow a refusal on the chat channel
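
The observable signal in the last bullet could be checked roughly as follows. This is a sketch under stated assumptions: the `Event` record, the event kinds, the tool names in `EGRESS_TOOLS`, and the three-event window are all illustrative choices, not part of any specific logging schema.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One entry in a hypothetical per-session agent event log."""
    session: str
    kind: str       # "chat_refusal", "chat_response", or "tool_call"
    tool: str = ""  # tool name for tool_call events, e.g. "file_write"


# Tools that move model-generated content off the chat channel (assumed names).
EGRESS_TOOLS = {"file_write", "http_post", "send_email", "webhook"}


def flag_redirect_after_refusal(events, window=3):
    """Return (session, tool) pairs where an egress tool call occurs
    within `window` events after a chat refusal in the same session."""
    flagged = []
    remaining = {}  # session -> events left in the post-refusal window
    for ev in events:
        if ev.kind == "chat_refusal":
            remaining[ev.session] = window
            continue
        left = remaining.get(ev.session, 0)
        if left > 0:
            if ev.kind == "tool_call" and ev.tool in EGRESS_TOOLS:
                flagged.append((ev.session, ev.tool))
            remaining[ev.session] = left - 1
    return flagged


log = [
    Event("s1", "chat_refusal"),
    Event("s1", "tool_call", "file_write"),   # follows a refusal: flagged
    Event("s2", "chat_response"),
    Event("s2", "tool_call", "http_post"),    # no prior refusal: not flagged
]
print(flag_redirect_after_refusal(log))
```

A count-based window is the simplest choice; a time-based window or a comparison between the refused request and the tool payload would likely reduce false positives.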
Mitigation
Unified output filtering across channels · HIGH
Egress monitoring for agentic tools · HIGH
Pre-encoding safety evaluation · HIGH
Streaming safety with kill-switch · MEDIUM
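
One way these mitigations might fit together is a single gate that every channel must pass through. The sketch below is illustrative only: `OutputGate`, `BlockedOutput`, `stream_with_kill_switch`, and the toy `classify` function are hypothetical names introduced here, and a real deployment would substitute its own classifier and sinks.

```python
import base64
from typing import Callable, Dict


class BlockedOutput(Exception):
    """Raised when the safety check rejects content bound for any channel."""


def classify(text: str) -> bool:
    """Hypothetical stand-in for a real output classifier: True means 'block'."""
    return "RESTRICTED" in text


class OutputGate:
    """Single choke point that every output channel must pass through."""

    def __init__(self, classifier: Callable[[str], bool]):
        self._classifier = classifier
        self._sinks: Dict[str, Callable[[bytes], None]] = {}
        self.egress_log = []  # (channel, byte count) pairs for egress monitoring

    def register(self, channel: str, sink: Callable[[bytes], None]) -> None:
        self._sinks[channel] = sink

    def emit(self, channel: str, content: str,
             encode: Callable[[bytes], bytes] = lambda b: b) -> None:
        sink = self._sinks[channel]  # KeyError for unregistered channels
        # Pre-encoding evaluation: classify the plaintext before any
        # transformation (base64, compression, ...) can obscure it.
        if self._classifier(content):
            raise BlockedOutput(f"blocked on channel {channel!r}")
        payload = encode(content.encode("utf-8"))
        self.egress_log.append((channel, len(payload)))
        sink(payload)


def stream_with_kill_switch(chunks, classifier):
    """Yield chunks until the accumulated text trips the classifier."""
    seen = ""
    for chunk in chunks:
        seen += chunk
        if classifier(seen):
            return  # kill switch: stop the stream, withhold this chunk
        yield chunk


written = []
gate = OutputGate(classify)
gate.register("chat", written.append)
gate.register("file_write", written.append)
gate.emit("file_write", "hello", encode=base64.b64encode)
```

Because the classifier runs on the plaintext inside `emit`, an encoded file write is held to the same check as a chat reply. The streaming helper can only withhold the chunk that trips the classifier; chunks already yielded have left the system, which is consistent with the lower rating the entry gives that mitigation.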
Chaining

Output redirection is the delivery mechanism for all T7 techniques. Successful extraction via T7-AT-001 through T7-AT-013 is valuable only when content can be exfiltrated via T7-AT-014.

Framework mapping
OWASP LLM: LLM05
MITRE ATLAS: AML.T0062