© 2026 CheatGrid™. All rights reserved.

LoRA and Parameter-Efficient Fine-Tuning Cheat Sheet

Updated 2026-03-17

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) technique that adapts large pretrained models by injecting trainable low-rank matrices into frozen model layers. This sharply reduces memory and compute requirements. LoRA was introduced in 2021, when practitioners were looking for ways to fine-tune billion-parameter models without the prohibitive cost of full fine-tuning. It freezes the base model and trains only about 0.1–1% of the parameters, while often matching the performance of full fine-tuning. The key insight is that fine-tuning updates tend to lie in low-rank subspaces. A full-rank weight update can therefore be approximated by the product of two much smaller matrices (a rank decomposition) without sacrificing task adaptation quality. Today, LoRA and related PEFT methods (QLoRA, DoRA, AdapterFusion, prefix tuning, etc.) are standard practice for customizing LLMs, vision models, and multimodal systems. They let practitioners fine-tune very large models on limited hardware; QLoRA, for example, fine-tunes a 65B-parameter model on a single 48 GB GPU. They also make it practical to serve hundreds of task-specific adapters in production. Maximizing performance while minimizing cost depends on understanding rank selection, alpha scaling, target modules, and merging strategies. This cheat sheet covers everything from fundamentals to advanced deployment considerations.
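To make the rank-decomposition savings concrete, the sketch below counts trainable parameters for a single adapted weight matrix. The 4096 × 4096 shape is an assumption chosen for illustration. `full_update_params` and `lora_params` are hypothetical helper names, not part of any library.

```python
def full_update_params(d: int, k: int) -> int:
    """Parameters in a dense update Delta W of shape (d, k)."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in the LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


if __name__ == "__main__":
    d, k = 4096, 4096  # assumed layer shape, for illustration only
    full = full_update_params(d, k)
    for r in (8, 16, 64):
        lora = lora_params(d, k, r)
        print(f"r={r:>3}: {lora:>9,} LoRA params vs {full:,} dense ({lora / full:.2%})")
```

With the assumed shape and r = 8, the adapter holds 8 × (4096 + 4096) = 65,536 parameters. That is 1/256 (about 0.39%) of the 16,777,216 parameters in a dense update. At the same rank, the fraction for the whole model is typically lower than this per-matrix figure, because only the targeted matrices receive adapters.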

What This Cheat Sheet Covers

This topic spans 14 focused tables and 86 indexed concepts. Below is a table-by-table outline, from foundational concepts to advanced details.

Table 1: Core LoRA Concepts
Table 2: LoRA Variants and Extensions
Table 3: PEFT Methods Comparison
Table 4: Hyperparameter Configuration
Table 5: Quantization Techniques
Table 6: Memory and Performance Optimization
Table 7: Training Best Practices
Table 8: Advanced LoRA Techniques
Table 9: Deployment Considerations
Table 10: LoRA with RLHF and Advanced Training
Table 11: Vision and Multimodal LoRA
Table 12: LoRA Variants Comparison Table
Table 13: Common Pitfalls and Solutions
Table 14: Tools and Libraries

Table 1: Core LoRA Concepts

Concept | Example | Description
LoRA (Low-Rank Adaptation)
$W' = W_0 + \Delta W = W_0 + BA$, where $W_0 \in \mathbb{R}^{d \times k}$, $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$
• Freezes the pretrained weights $W_0$ and injects trainable low-rank matrices $A$ and $B$ into each targeted layer
• a rank $r \ll \min(d,k)$ cuts trainable parameters per adapted matrix from $dk$ to $r(d+k)$.
Rank (r)
r=8, r=16, r=64
• Dimensionality of the low-rank decomposition
• controls adapter capacity: a lower rank means fewer parameters but less expressiveness
• typical range 4–256, depending on model size and task.
Alpha ($\alpha$)
alpha=16 (if r=8)
• Scaling factor that sets the update magnitude through the ratio $\frac{\alpha}{r}$
• a common heuristic is $\alpha = 2r$
• for a fixed $r$, a higher $\alpha$ scales up the LoRA update and gives a stronger adaptation signal.
Weight Update Formula
$h = W_0 x + \frac{\alpha}{r} BAx$
• In the forward pass, the frozen base output $W_0 x$ is adjusted by the scaled LoRA term $\frac{\alpha}{r} BAx$
• in the backward pass, only $A$ and $B$ are updated; $W_0$ stays frozen.
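The table's formulas can be sketched in NumPy as a single adapted layer. This is a minimal illustration under stated assumptions, not PEFT's implementation. `LoRALinear`, `sgd_step`, and `merged_weight` are hypothetical names. The squared-error loss and plain SGD are chosen only to show which parameters receive gradients. The initialization ($A$ small Gaussian, $B = 0$) follows the original LoRA paper.

```python
import numpy as np


class LoRALinear:
    """Hypothetical minimal LoRA layer: h = W0 x + (alpha / r) * B A x.

    Shapes follow the table above: W0 is (d, k), B is (d, r), A is (r, k),
    and inputs x are (k, n) batches of column vectors.
    """

    def __init__(self, W0: np.ndarray, r: int, alpha: float, seed: int = 0):
        d, k = W0.shape
        rng = np.random.default_rng(seed)
        self.W0 = W0  # frozen pretrained weight: never written to below
        # LoRA paper initialization: A ~ small Gaussian, B = 0, so B A = 0
        # and the adapted layer starts out identical to the base layer.
        self.A = rng.normal(scale=0.01, size=(r, k))
        self.B = np.zeros((d, r))
        self.scale = alpha / r  # the alpha / r scaling factor

    def forward(self, x: np.ndarray) -> np.ndarray:
        base = self.W0 @ x  # frozen path
        # Low-rank path: apply A, then B, without forming the d x k product.
        update = self.scale * (self.B @ (self.A @ x))
        return base + update

    def sgd_step(self, x: np.ndarray, y: np.ndarray, lr: float) -> float:
        """One SGD step on an illustrative loss 0.5 * ||h - y||^2.

        Gradients are computed for A and B only; W0 receives no update.
        Returns the loss measured before the step.
        """
        G = self.forward(x) - y                   # dL/dh, shape (d, n)
        grad_B = self.scale * G @ (self.A @ x).T  # shape (d, r)
        grad_A = self.scale * self.B.T @ G @ x.T  # shape (r, k)
        self.B -= lr * grad_B
        self.A -= lr * grad_A
        return 0.5 * float(np.sum(G ** 2))

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into one dense matrix: W' = W0 + (alpha / r) B A."""
        return self.W0 + self.scale * (self.B @ self.A)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    d, k, n = 6, 5, 3
    W0 = rng.normal(size=(d, k))
    x = rng.normal(size=(k, n))
    y = rng.normal(size=(d, n))

    layer = LoRALinear(W0, r=2, alpha=4.0)
    print("matches base at init:", np.allclose(layer.forward(x), W0 @ x))

    W0_before = W0.copy()
    losses = [layer.sgd_step(x, y, lr=0.05) for _ in range(50)]
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
    print("W0 unchanged:", np.array_equal(layer.W0, W0_before))
    print("merged == unmerged:", np.allclose(layer.merged_weight() @ x, layer.forward(x)))
```

Computing `B @ (A @ x)` keeps the training-time overhead proportional to $r(d+k)$ per input column instead of $dk$. `merged_weight` forms the dense product once. After training, inference can then run as a single matmul with no added latency.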
