Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning technique that adapts large pretrained models by injecting trainable low-rank matrices into frozen model layers, drastically reducing memory and compute requirements. LoRA emerged in 2021 as practitioners sought ways to fine-tune billion-parameter models without the prohibitive cost of full fine-tuning: the base model is frozen and only about 0.1–1% of the parameters are trained, often with quality comparable to full fine-tuning. The key insight is that fine-tuning updates tend to live in low-rank subspaces, so the weight update can be approximated by the product of two much smaller matrices with little loss in task adaptation quality. Today, LoRA and related methods (QLoRA, DoRA, PiSSA, rsLoRA, GaLore, etc.) are standard practice for customizing LLMs, vision models, and multimodal systems. Combined with quantization, they bring 70B+ models within reach of consumer-grade GPUs and make it practical to deploy hundreds of task-specific adapters in production. Understanding rank selection, alpha scaling, target modules, initialization strategies, and merging techniques is essential for maximizing performance while minimizing cost.
What This Cheat Sheet Covers
This topic spans 15 focused tables and 111 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core LoRA Concepts
The mathematical foundation of LoRA is surprisingly compact: two small matrices multiply to approximate the weight update, with initialization designed so the adapter contributes nothing at the start of training. Mastering these core concepts — rank, alpha, target modules, and the forward-pass formula — is the prerequisite for understanding every variant and extension.
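As a rough illustration of the ideas above, here is a minimal NumPy sketch of a single LoRA-adapted linear map. The dimensions, the small Gaussian initialization of `A`, the zero initialization of `B`, and the helper name `lora_forward` are assumptions chosen for illustration, not a reference implementation of any particular library.

```python
import numpy as np

def lora_forward(x, W0, A, B, alpha):
    """Compute h = W0 x + (alpha / r) * B A x for one input vector x."""
    r = A.shape[0]  # rank: shared inner dimension of B (d x r) and A (r x k)
    return W0 @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d, k, r, alpha = 64, 32, 8, 16  # illustrative sizes, with alpha = 2r

W0 = rng.standard_normal((d, k))        # frozen pretrained weight (never updated)
A = rng.standard_normal((r, k)) * 0.01  # assumed: small random initialization
B = np.zeros((d, r))                    # assumed: zeros, so B A = 0 before training
x = rng.standard_normal(k)

# Before any training step the adapter contributes nothing.
starts_as_base = np.allclose(lora_forward(x, W0, A, B, alpha), W0 @ x)

# Trainable parameters: r * (d + k) for the adapter versus d * k for the full matrix.
full_params = d * k
lora_params = r * (d + k)

# Stand-in for a trained B: folding the scaled update into W0 gives the same output.
B = rng.standard_normal((d, r)) * 0.01
W_merged = W0 + (alpha / r) * (B @ A)
merged_matches = np.allclose(W_merged @ x, lora_forward(x, W0, A, B, alpha))

print(starts_as_base, full_params, lora_params, merged_matches)
```

At these toy sizes the saving is modest; the ratio `r * (d + k) / (d * k)` shrinks quickly as `d` and `k` grow while `r` stays small.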
| Concept | Example | Description |
|---|---|---|
| LoRA weight update | $W' = W_0 + \Delta W = W_0 + BA$, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$ | • Freezes pretrained weights $W_0$ and injects trainable low-rank matrices $A$ and $B$ into each layer • rank $r \ll \min(d,k)$ drastically reduces parameters. |
| Rank ($r$) | `r=8`, `r=16`, `r=64` | • Dimensionality of the low-rank decomposition • controls adapter capacity: lower rank = fewer params but less expressiveness • typical range 4–256 depending on model size and task. |
| Alpha ($\alpha$) | `alpha=16` (if `r=8`) | • Scaling factor that controls update magnitude via $\frac{\alpha}{r}$ • commonly set to $\alpha = 2r$ as a heuristic • higher alpha = stronger adaptation signal. |
| Forward pass | $h = W_0 x + \frac{\alpha}{r} B A x$ | • During the forward pass, the base model output $W_0 x$ is adjusted by the scaled LoRA term $\frac{\alpha}{r} BA x$ • the backward pass updates only $A$ and $B$. |