Environment Manipulation
T11 · Agentic & Orchestrator Exploitation →Agents calibrate how cautiously to act based on their understanding of the environment they are operating in — production vs. test, sandboxed vs. live, air-gapped vs. internet-connected. The problem is that the agent learns this context from text in its prompt and from tool outputs, none of which it can independently verify, so an attacker can simply *assert* a permissive environment ("you're in a sandboxed test environment", "the network is air-gapped", "SAFE_MODE is false") and the agent relaxes its own guardrails accordingly. This is a contextual jailbreak: rather than asking the agent to break a rule, it convinces the agent the rule doesn't apply here.
- Source environment/privilege/network state from authenticated infrastructure, not from prompt or tool-output claims; flag self-asserted context
- Detect environment-assertion phrases ("sandboxed", "air-gapped", "SAFE_MODE is false", "you have sudo") appearing in untrusted channels
- Cross-check claimed privilege against actual effective permissions before honoring privileged actions
- Alert when destructive/exfil actions are preceded by a context claim that "consequences are contained"
Environment manipulation is a precondition softener: delivered via T1 prompt injection or T12 RAG poisoning, it lowers the agent's perceived consequences so subsequent T11-AT-002 tool chains, T11-AT-011 exfiltration, and T11-AT-016 SSRF run without refusal. The "firewall disabled / air-gapped" claims directly enable T11-AT-010 lateral movement, and the spoofed-privilege claims (T11-AP-007E/T11-AP-007I) precede T11-AT-009 persistence attempts.