Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning technique that adapts large pretrained models by injecting trainable low-rank matrices into frozen model layers, drastically reducing memory and compute requirements. LoRA emerged in 2021 as practitioners sought ways to fine-tune billion-parameter models without the prohibitive costs of full fine-tuning: freeze the base model and train only 0.1-1% of its parameters while achieving comparable or better performance. The key insight is that fine-tuning updates often live in low-rank subspaces, so a full-rank weight update can be decomposed into the product of two much smaller matrices (a rank decomposition) without sacrificing task adaptation quality. Today, LoRA and its PEFT family (QLoRA, DoRA, AdapterFusion, prefix tuning, etc.) are standard practice for customizing LLMs, vision models, and multimodal systems, enabling practitioners to fine-tune 70B+ models on consumer GPUs and to deploy hundreds of task-specific adapters in production. Understanding rank selection, alpha scaling, target modules, and merging strategies is essential for maximizing performance while minimizing cost; this cheat sheet covers everything from fundamentals to advanced deployment considerations.
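The rank decomposition, alpha scaling, and merging ideas above can be sketched in a few lines of NumPy. This is an illustrative toy, not any specific library's API; the names (`d_in`, `d_out`, `rank`, `alpha`, `lora_forward`) and the layer dimensions are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank, alpha = 64, 64, 8, 16  # rank << d; alpha scales the update

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, initialized
                                          # to zero so training starts from the
                                          # unmodified base model

def lora_forward(x):
    # Base path (frozen) plus the low-rank update, scaled by alpha / rank.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.normal(size=(4, d_in))
# With B = 0 the adapter contributes nothing: output equals the base model's.
assert np.allclose(lora_forward(x), x @ W.T)

# Merging for deployment: fold the adapter into W once, then serve W_merged
# with zero extra inference cost.
W_merged = W + (alpha / rank) * (B @ A)
assert np.allclose(lora_forward(x), x @ W_merged.T)

# Only A and B train: 2 * rank * 64 = 1024 parameters vs 4096 for a full
# update of W in this toy (the savings grow with layer size).
print(A.size + B.size, "trainable vs", W.size, "full")
```

Because `B` starts at zero, the adapted model is exactly the base model at step 0, and the merged weight `W + (alpha / rank) * B @ A` reproduces the adapter's output exactly, which is why merged LoRA adds no inference latency.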