Planning Corruption
T11 · Agentic & Orchestrator Exploitation →Where goal hijacking rewrites the *objective*, planning corruption leaves the goal intact but poisons the *path* the agent chooses to reach it. Plan-and-execute and tree-of-thought style agents generate intermediate reasoning steps and treat any text that reads like expert guidance — "the optimal plan is…", "best strategy:…" — as a trustworthy heuristic, because the model cannot distinguish genuine deliberation from injected pseudo-advice in its scratchpad. The architectural gap is that the planning phase has no integrity check: a single suggested step ("disable antivirus first", "skip all verification") gets woven into an otherwise legitimate plan and is then executed without the agent questioning why a benign task requires a destructive sub-step.
- Inspect generated plans before execution; flag steps that disable defenses, skip verification, escalate privilege, or delete logs
- Score each plan step against the declared task — a step with no causal link to the user's goal is suspicious
- Detect planning-advice phrases ("optimal plan", "best strategy", "skip all verification") arriving from non-user channels
- Require human approval of the plan when it contains destructive, privilege-changing, or log-altering steps
Planning corruption typically rides in via T1 prompt injection or T12 RAG poisoning that plants advice in retrieved context, and frequently co-occurs with T11-AT-003 (goal hijacking) and T11-AT-006 (reflection-loop exploitation). The corrupted steps directly invoke T11-AT-002 (tool chain) for execution and seed T11-AT-009 (persistence), T11-AT-008 (credential harvesting), and the anti-forensics step (T11-AP-004F) undermines the detection that would otherwise catch the chain.