Transfer learning reuses knowledge from models trained on large datasets to improve learning on new tasks with limited data. Originally validated on vision models pretrained on ImageNet, the paradigm now spans NLP (BERT, GPT, T5), multimodal systems (CLIP), audio (Wav2Vec, Whisper), and domain-specific applications. Rather than training from scratch, you leverage pretrained weights as initialization, freeze or fine-tune layers selectively, and adapt to target tasks efficiently. The key insight: lower layers learn general features (edges, syntax) while upper layers capture task-specific patterns β selective unfreezing, discriminative learning rates, and parameter-efficient methods like LoRA exploit this hierarchy to avoid catastrophic forgetting and negative transfer when source and target domains differ.
What This Cheat Sheet Covers
This topic spans 22 focused tables and 114 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Transfer Learning Approaches
| Approach | Example | Description |
|---|---|---|
model = ResNet50(weights='imagenet')model.fc = Linear(2048, 10) | β’ Freeze all pretrained layers and train only the new classifier head on target data β’ fast, low compute, effective when target task is similar to source. | |
for param in model.parameters(): param.requires_grad = Trueoptimizer = Adam(model.parameters(), lr=1e-5) | β’ Unfreeze some or all pretrained layers and retrain with a low learning rate to adapt features to the target task β’ higher accuracy but risks overfitting on small datasets. | |
optimizer = Adam([ {'params': model.layer1.parameters(), 'lr': 1e-5}, {'params': model.fc.parameters(), 'lr': 1e-3}]) | Assign different learning rates per layer β lower for early layers (preserve general features), higher for late layers (adapt task-specific patterns). | |
Epoch 1: freeze all but head Epoch 5: unfreeze last block Epoch 10: unfreeze all | β’ Incrementally unfreeze layers from top to bottom during training β’ prevents catastrophic forgetting by allowing the model to adapt progressively. |