Browser Automation Hijacking
T11 · Agentic & Orchestrator Exploitation →Browser-using agents (Claude Computer Use, OpenAI computer-use agents, Perplexity Comet, autonomous web pilots) operate by reading a rendered page — DOM text, accessibility tree, or screenshots — and emitting actions (click, type, navigate, run JS) against it. The trust boundary they violate is fundamental: untrusted web content is fed into the same context window that holds the user's task, so any instruction-shaped text on a page becomes a candidate command. The agent has no architectural way to distinguish "content the page is showing me" from "instructions I should obey," which is the classic indirect-prompt-injection gap (CometJacking weaponizes this with a single URL).
- Log every navigation, click, type, and eval/JS-injection action with the originating instruction source (user vs. page-derived)
- Flag navigations to newly-seen or low-reputation domains immediately preceding sensitive actions
- Alert on document.cookie, localStorage, and devtools console access from an automated session
- Require human-in-the-loop confirmation for money movement, downloads, extension installs, and credential submission
Typically entered via T1 (prompt injection) or T9 image-based injection on a rendered page, then pivots into T11-AT-008 (credential harvesting) once cookies/tokens are read, T11-AT-011 (data exfiltration) for screenshot/clipboard egress, and T11-AT-016 (tool-induced SSRF) when the same navigation primitive is pointed at file:// or 169.254.169.254. Drive-by download (T11-AP-001E) bridges to T11-AT-009 persistence on the host.