T4-AT-015HIGH

Multi-Turn Social Engineering

Risk score220

RatingHigh

Procedures10

Severity

Mechanism

RLHF training creates systematic biases that mirror human social compliance patterns because the training signal comes from human evaluators who themselves are susceptible to social engineering. The model has learned to respond to authority claims, emotional appeals, reciprocity, urgency, and social proof — not because it was explicitly trained on Cialdini's principles, but because its training reward function was shaped by humans who respond to those principles. The gap: the model's safety alignment is optimized against adversarial prompts (explicit harmful requests), but not against social engineering campaigns that manipulate the model's compliance trajectory through principled persuasion.

Detection

Cialdini pattern classification: Train a classifier to detect authority, reciprocity, commitment, social proof, liking, and scarcity patterns across multi-turn conversations
Emotional escalation tracking: Monitor emotional intensity across turns — alert on conversations that escalate from neutral to crisis
Role-claim tracking: Track claimed professional roles and flag when those roles are used to justify safety-adjacent requests
Urgency + harm correlation: Flag conversations where urgency claims appear alongside safety-relevant topic requests

Mitigation

SE-specific RLHF trainingHIGH

Emotional detachment for safety decisionsHIGH

Professional role verification resistanceMEDIUM

Crisis scenario resistanceHIGH

Chaining

Multi-turn SE is the meta-technique that combines elements of T4-AT-001 (Context Poisoning), T4-AT-012 (Trust Building), and T4-AT-009 (Temporal Anchoring) into orchestrated campaigns. Chains into T3 (Reasoning Exploitation) when SE creates logical frames that constrain the model's reasoning toward compliance.

Framework mapping

OWASP LLMLLM01

MITRE ATLASAML.T0054

Open in the technique browser →