T13-AT-007 · HIGH

Transfer Learning Attacks

T13 · AI Supply Chain & Artifact Trust

Risk score: 225

Rating: High

Procedures: 10

Mechanism

Transfer learning is the default paradigm for LLM deployment: organizations take a foundation model (Llama, Mistral, Qwen, etc.) and fine-tune it for their specific use case. The security assumption is that the reused upstream artifacts, the foundation model and any shared adapters, are trustworthy. LoRATK (EMNLP 2025) showed that this assumption fails in the LoRA ecosystem: a single backdoor-only LoRA, merged training-free with multiple task-enhancing adapters, retains its malicious capabilities across all merges.
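To make the merge mechanism concrete, here is a minimal sketch of why a training-free merge cannot remove a backdoor on its own. It assumes the simplest merge rule, adding each adapter's low-rank update B·A to the base weights, and uses random toy matrices rather than real adapters. It does not reproduce the LoRATK method. All names (`lora_delta`, `merge`, `backdoor_residual`) are hypothetical, and the result only shows persistence in weight space, not in model behavior.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rng, rows, cols, scale=0.02):
    """Random Gaussian matrix standing in for trained weights (toy data)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def lora_delta(rng, d, r):
    """Low-rank update B @ A with B of shape (d, r) and A of shape (r, d)."""
    return matmul(rand_matrix(rng, d, r), rand_matrix(rng, r, d))

def merge(base, deltas):
    """Assumed merge rule: add every adapter update to the base, no retraining."""
    merged = base
    for delta in deltas:
        merged = add(merged, delta)
    return merged

def backdoor_residual(base, task_delta, backdoor_delta):
    """Difference between a merge that includes the backdoor adapter and one that does not."""
    clean = merge(base, [task_delta])
    poisoned = merge(base, [task_delta, backdoor_delta])
    return sub(poisoned, clean)

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

rng = random.Random(0)
d, r = 8, 2                                   # toy hidden size and LoRA rank
base = rand_matrix(rng, d, d)
backdoor = lora_delta(rng, d, r)              # stands in for a backdoor-only LoRA
tasks = [lora_delta(rng, d, r) for _ in range(3)]

# For every task adapter, the merged weights still contain the full backdoor update.
errors = [max_abs_diff(backdoor_residual(base, t, backdoor), backdoor) for t in tasks]
```

Under this additive rule the backdoor update survives unchanged whichever task adapter it is merged with, which is why a check on the task adapter alone says nothing about the merged model.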

Detection

LoRA behavioral testing: evaluate merged adapters on safety benchmarks before deployment
Adapter provenance verification: track the source of all LoRA adapters
Weight-space anomaly detection: compare adapter weights against expected distributions
Composition testing: test adapter combinations for emergent behaviors
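As one illustration of weight-space anomaly detection, the sketch below compares a candidate adapter's update norm against norms from adapters that are already trusted, using a modified z-score based on the median and the median absolute deviation. The choice of statistic, the 3.5 threshold, the reference values, and the function names are assumptions made for this example. A norm check is a weak signal and is easy for an attacker to evade, so it complements behavioral and composition testing rather than replacing them.

```python
import statistics

def modified_z_score(value, reference):
    """Modified z-score of `value` against a reference sample (median/MAD based)."""
    med = statistics.median(reference)
    mad = statistics.median([abs(x - med) for x in reference])
    if mad == 0:
        # Degenerate reference: any deviation from the median is treated as extreme.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad

def is_anomalous(candidate_norm, reference_norms, threshold=3.5):
    """Flag an adapter whose update norm is far from the trusted population."""
    return abs(modified_z_score(candidate_norm, reference_norms)) > threshold

# Hypothetical Frobenius norms of the updates from adapters already vetted.
trusted_norms = [1.02, 0.98, 1.05, 0.95, 1.00, 1.03, 0.97]

typical_flag = is_anomalous(1.01, trusted_norms)   # close to the trusted population
outlier_flag = is_anomalous(2.40, trusted_norms)   # far outside it
```

In practice the same comparison could be run per layer instead of per adapter, since a backdoor may concentrate its changes in a few modules.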

Mitigation

Safety evaluation after every adapter merge/fine-tune · HIGH

Trusted adapter registries with signed adapters · MEDIUM

Foundation model diversification (multiple upstream sources) · MEDIUM

LoRA weight scanning for anomalous patterns · LOW
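A trusted registry check can be sketched as follows: before loading, an adapter file is hashed and compared with the registry entry, and the entry's signature is verified. To keep the example runnable with the standard library only, it uses HMAC-SHA256 with a shared key as a stand-in for a real signature scheme. A production registry would more plausibly use asymmetric signatures so that clients hold no signing secret. The entry layout, key, and function names are hypothetical.

```python
import hashlib
import hmac

def adapter_digest(adapter_bytes: bytes) -> str:
    """SHA-256 digest of the serialized adapter."""
    return hashlib.sha256(adapter_bytes).hexdigest()

def sign_digest(digest: str, key: bytes) -> str:
    """HMAC-SHA256 over the digest (stand-in for a registry signature)."""
    return hmac.new(key, digest.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_adapter(adapter_bytes: bytes, registry_entry: dict, key: bytes) -> bool:
    """Return True only if both the digest and the signature match the registry entry."""
    digest = adapter_digest(adapter_bytes)
    if not hmac.compare_digest(digest, registry_entry["sha256"]):
        return False  # file contents differ from what the registry recorded
    expected = sign_digest(digest, key)
    return hmac.compare_digest(expected, registry_entry["signature"])

# Hypothetical registry entry created at publication time.
key = b"demo-registry-key"            # assumption: shared secret, for illustration only
weights = b"fake adapter weights"     # placeholder for the adapter file contents
digest = adapter_digest(weights)
entry = {"sha256": digest, "signature": sign_digest(digest, key)}

ok = verify_adapter(weights, entry, key)                    # untouched file
tampered_ok = verify_adapter(weights + b"\x00", entry, key)  # modified file
```

Signature verification establishes provenance only. A signed adapter can still carry a backdoor, so this check is a gate before the post-merge safety evaluation and not a substitute for it.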

Chaining

Transfer learning attacks chain from T13-AT-001 (Model Repository Poisoning) via upstream model distribution, and to T6-AT-004 (Fine-Tuning Attacks) via downstream adaptation. LoRA merge attacks (T13-AP-007A) chain to T6-AT-003 (Backdoor Insertion) at the adapter level.
