T11-AT-003HIGH

Goal Hijacking

T11 · Agentic & Orchestrator Exploitation →

Risk score245

RatingHigh

Procedures10

Severity

Mechanism

Autonomous agents carry their objective in the same context window as every piece of data they ingest — tool outputs, retrieved documents, web pages, sub-agent messages. The trust boundary violated here is that the agent has no privileged, immutable channel for its goal: the original task and any later instruction-shaped text are represented identically as tokens, so a planner that re-reads its objective each loop can have that objective silently overwritten by injected content. Frameworks like ReAct and AutoGPT explicitly re-derive the next action from the running transcript, which means a single line such as "your new primary goal is…" embedded in an observation can become the operative directive.

Detection

Pin the original task/goal at session start and diff the agent's *acted* objective against it each loop; alert on divergence
Treat any goal/objective/KPI/priority mutation that originates from tool output or retrieved content (not the user) as a high-severity event
Flag instruction-shaped phrases ("new primary goal", "objective function changed", "forget original task") appearing in non-user channels
Require explicit human re-confirmation whenever the agent's stated objective changes mid-run

Mitigation

Immutable goal channelHIGH

Content/instruction provenance separationHIGH

Goal-drift detector with HITLHIGH

Re-anchor objective each stepMEDIUM

Chaining

Goal hijacking is the intent-layer pivot that turns a benign agent malicious; entry is almost always T1 prompt injection or T12 RAG poisoning that plants the redirect in retrieved content. Once the objective is "gather passwords" or "exfiltrate," it drives T11-AT-008 (credential harvesting), T11-AT-011 (data exfiltration), and T11-AT-002 (tool-chain) executions.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0051

Open in the technique browser →