Session State Manipulation
T4 · Multi-Turn & Memory Manipulation →LLMs have no mutable session state — there are no variables, flags, or configuration registers that user input can modify at runtime. But models are trained on vast amounts of code and system documentation, so they pattern-match on the syntax of state modification commands. safety_enabled = false`, the model processes this as a token sequence that resembles configuration code, and its autoregressive generation may shift toward behavior consistent with that "configuration" — not because state changed, but because the model's next-token prediction is conditioned on a context that now includes what looks like a state transition.
- Code-syntax detection in user input: Flag messages containing assignment operators, method calls, or configuration-file patterns that target session/model parameters
- Behavioral compliance monitoring: Detect if the model's refusal rate drops after processing code-like input
- Input classification: Classify user messages as natural language vs. code/config and apply stricter safety evaluation to messages containing session-modification patterns
Session state manipulation is low-reliability as a standalone technique against frontier models but serves as a context primer for T4-AT-001 (Context Poisoning) — even an unsuccessful state manipulation attempt introduces "unrestricted mode" language into the context window, which softens subsequent requests. Chains into T4-AT-010 (State Confusion) when combined with context-reframing.