T13-AT-007 · HIGH

Transfer Learning Attacks

T13 · AI Supply Chain & Artifact Trust
Risk score: 225
Rating: High
Procedures: 10
Severity
Mechanism

Transfer learning is the default paradigm for LLM deployment: organizations take a foundation model (Llama, Mistral, Qwen, etc.) and fine-tune it for their specific use case. The security assumption is that the foundation model, and anything merged into it afterwards, is trustworthy. LoRATK (EMNLP 2025) broke this assumption for the LoRA ecosystem: a single backdoor-only LoRA, merged with multiple task-enhancing adapters without any further training, retains its malicious capabilities across all merges.
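Why a training-free merge can carry a backdoor along is easiest to see with the arithmetic of an additive merge. The sketch below is a toy illustration, not the LoRATK procedure: it assumes the merge is a weighted sum of low-rank updates (delta_W = B @ A) added to one base weight matrix, and every matrix, the "trigger" feature, and all function names are hypothetical.

```python
# Toy illustration only: one 3x3 linear layer stands in for a model.
# Assumption: adapters are merged by adding weighted low-rank updates.

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def lora_delta(B, A):
    """Low-rank update contributed by one adapter: delta_W = B @ A."""
    return matmul(B, A)


def merge(base, adapters, weights):
    """Training-free merge: base + sum_i weights[i] * (B_i @ A_i)."""
    merged = [row[:] for row in base]
    for (B, A), w in zip(adapters, weights):
        delta = lora_delta(B, A)
        merged = [[m + w * d for m, d in zip(mrow, drow)]
                  for mrow, drow in zip(merged, delta)]
    return merged


def apply(W, x):
    """Forward pass of the toy layer: y = W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

# Hypothetical benign adapter: boosts output 0 from input feature 0.
task_adapter = ([[0.5], [0.0], [0.0]], [[1.0, 0.0, 0.0]])

# Hypothetical backdoor-only adapter: input feature 2 (the "trigger")
# drives output 1, and nothing else changes.
backdoor_adapter = ([[0.0], [3.0], [0.0]], [[0.0, 0.0, 1.0]])

merged = merge(base, [task_adapter, backdoor_adapter], [1.0, 1.0])

clean_out = apply(merged, [1.0, 0.0, 0.0])      # trigger absent
triggered_out = apply(merged, [1.0, 0.0, 1.0])  # trigger present
```

In this toy setup the task improvement and the trigger response occupy different directions of the weight matrix, so the merged layer behaves normally on clean input while still responding to the trigger. Real adapters overlap far more than this, so the sketch shows only why additive merging does not by itself remove a backdoor update.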

Detection
  • LoRA behavioral testing: evaluate merged adapters on safety benchmarks before deployment
  • Adapter provenance verification: track the source of all LoRA adapters
  • Weight-space anomaly detection: compare adapter weights against expected distributions
  • Composition testing: test adapter combinations for emergent behaviors
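Weight-space anomaly detection, as listed above, could be approximated by comparing a summary statistic of each adapter's update against a population of adapters believed to be benign. The sketch below is one possible heuristic under stated assumptions: it uses the Frobenius norm of each update and a median/MAD robust z-score, and the adapter names, matrices, and the 3.5 threshold are all hypothetical.

```python
import math
from statistics import median


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2-D list."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (nearly) identical: nothing stands out.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalous(norms_by_adapter, threshold=3.5):
    """Return adapter names whose update norm is a robust outlier."""
    names = list(norms_by_adapter)
    scores = robust_z_scores([norms_by_adapter[n] for n in names])
    return [n for n, s in zip(names, scores) if abs(s) > threshold]


# Hypothetical per-adapter updates (delta_W) for one layer.
deltas = {
    "summarize": [[0.6, 0.8], [0.0, 0.0]],
    "translate": [[1.1, 0.0], [0.0, 0.0]],
    "code":      [[0.9, 0.0], [0.0, 0.0]],
    "sql":       [[0.0, 1.05], [0.0, 0.0]],
    "suspect":   [[4.0, 0.0], [0.0, 0.0]],
}

norms = {name: frobenius_norm(d) for name, d in deltas.items()}
flagged = flag_anomalous(norms)
```

A norm check like this only catches updates that are unusually large. A backdoor adapter tuned to look statistically ordinary would pass, which is consistent with weight scanning being the lowest-rated mitigation in this entry, so it is better treated as a cheap pre-filter ahead of behavioral and composition testing.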
Mitigation
  • Safety evaluation after every adapter merge or fine-tune (HIGH)
  • Trusted adapter registries with signed adapters (MEDIUM)
  • Foundation model diversification across multiple upstream sources (MEDIUM)
  • LoRA weight scanning for anomalous patterns (LOW)
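A trusted registry with signed adapters comes down to refusing to load any adapter whose bytes do not match a signature recorded at publication time. The following is a minimal sketch using only the Python standard library. It assumes a shared-secret HMAC-SHA256 as a stand-in for a real signing scheme (a production registry would more plausibly use asymmetric signatures), and the registry layout, key, adapter names, and helper functions are hypothetical.

```python
import hashlib
import hmac


def sign_adapter(adapter_bytes, key):
    """Compute an HMAC-SHA256 tag over the raw adapter file contents."""
    return hmac.new(key, adapter_bytes, hashlib.sha256).hexdigest()


def verify_adapter(adapter_bytes, expected_sig, key):
    """Constant-time comparison of the recomputed tag with the recorded one."""
    actual = sign_adapter(adapter_bytes, key)
    return hmac.compare_digest(actual, expected_sig)


def load_if_trusted(name, adapter_bytes, registry, key):
    """Return the adapter bytes only if the registry vouches for them."""
    if name not in registry:
        raise ValueError(f"adapter {name!r} is not in the trusted registry")
    if not verify_adapter(adapter_bytes, registry[name], key):
        raise ValueError(f"signature mismatch for adapter {name!r}")
    return adapter_bytes


# Hypothetical registry state: name -> signature recorded at publish time.
REGISTRY_KEY = b"example-key-do-not-use"
published = b"\x00fake-lora-weights\x01"
registry = {"summarize-v1": sign_adapter(published, REGISTRY_KEY)}

# The untampered adapter loads; a modified copy is rejected.
trusted = load_if_trusted("summarize-v1", published, registry, REGISTRY_KEY)

tampered = published + b"\xff"
try:
    load_if_trusted("summarize-v1", tampered, registry, REGISTRY_KEY)
    tamper_rejected = False
except ValueError:
    tamper_rejected = True
```

Signature checks establish provenance and integrity, not safety: an adapter that was malicious when it was signed still verifies. That is why this control complements, rather than replaces, safety evaluation after every merge.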
Chaining

Transfer learning attacks chain from T13-AT-001 (Model Repository Poisoning) through upstream model distribution, and on to T6-AT-004 (Fine-Tuning Attacks) through downstream adaptation. LoRA merge attacks (T13-AP-007A) chain to T6-AT-003 (Backdoor Insertion) at the adapter level.
