Multi-Turn Social Engineering
T4 · Multi-Turn & Memory Manipulation →RLHF training creates systematic biases that mirror human social compliance patterns because the training signal comes from human evaluators who themselves are susceptible to social engineering. The model has learned to respond to authority claims, emotional appeals, reciprocity, urgency, and social proof — not because it was explicitly trained on Cialdini's principles, but because its training reward function was shaped by humans who respond to those principles. The gap: the model's safety alignment is optimized against adversarial prompts (explicit harmful requests), but not against social engineering campaigns that manipulate the model's compliance trajectory through principled persuasion.
- Cialdini pattern classification: Train a classifier to detect authority, reciprocity, commitment, social proof, liking, and scarcity patterns across multi-turn conversations
- Emotional escalation tracking: Monitor emotional intensity across turns — alert on conversations that escalate from neutral to crisis
- Role-claim tracking: Track claimed professional roles and flag when those roles are used to justify safety-adjacent requests
- Urgency + harm correlation: Flag conversations where urgency claims appear alongside safety-relevant topic requests
Multi-turn SE is the meta-technique that combines elements of T4-AT-001 (Context Poisoning), T4-AT-012 (Trust Building), and T4-AT-009 (Temporal Anchoring) into orchestrated campaigns. Chains into T3 (Reasoning Exploitation) when SE creates logical frames that constrain the model's reasoning toward compliance.