
LLM Fine-tuning Cheat Sheet

Updated 2026-04-28

LLM fine-tuning adapts a pre-trained large language model to specific tasks, domains, or behaviors by continuing training on custom datasets. It grew out of the need to customize foundation models without the cost of training from scratch, and it now spans parameter-efficient methods (PEFT), preference alignment (RLHF, DPO, SimPO), reinforcement learning with verifiable rewards (RLVR), and post-training model merging. As of 2026, reinforcement fine-tuning with algorithms such as GRPO, the algorithm behind DeepSeek-R1, sits alongside traditional SFT as a primary route to stronger reasoning capabilities. The key insight is unchanged: targeted parameter updates unlock specialized performance. A 7B model fine-tuned with LoRA on high-quality data can outperform a generic 70B model on domain-specific tasks, which makes fine-tuning as much a matter of data curation as of efficient training.


What This Cheat Sheet Covers

This topic spans 14 focused tables and 118 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Fine-Tuning Approaches
Table 2: Parameter-Efficient Fine-Tuning (PEFT) Methods
Table 3: Alignment and Preference Optimization
Table 4: Data Preparation and Formats
Table 5: Training Hyperparameters
Table 6: Optimization and Schedulers
Table 7: Memory Optimization Techniques
Table 8: Quantization Methods
Table 9: Distributed Training Strategies
Table 10: Training Frameworks and Libraries
Table 11: Evaluation Metrics
Table 12: Common Pitfalls and Best Practices
Table 13: Advanced Techniques
Table 14: Model Merging

Table 1: Fine-Tuning Approaches

Before tuning a single hyperparameter, you have to choose a strategy, and these are the families to pick from. They span the bread-and-butter supervised approaches, preference methods like RLHF and DPO that teach a model which answer is better, and reinforcement approaches that learn from automatic correctness signals. Knowing where each one fits is what keeps you from reaching for a full 70B fine-tune when a LoRA adapter would have done the job.
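To put rough numbers on the adapter-versus-full-fine-tune trade-off, the sketch below counts the parameters a LoRA setup would train. The model shape (7B total parameters, hidden size 4096, 32 layers), the rank of 16, and the choice to adapt four square attention projections per layer are illustrative assumptions, and both function names are hypothetical helpers rather than part of any library.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA adapter on a (d_out x d_in) weight.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    so the adapter adds rank * (d_in + d_out) trainable parameters.
    """
    return rank * (d_in + d_out)


def trainable_fraction(total_params, hidden_size, num_layers,
                       adapted_per_layer, rank):
    """Return (trainable adapter params, fraction of the full model).

    Assumes every adapted matrix is square (hidden_size x hidden_size),
    which is a simplification made for this example.
    """
    per_matrix = lora_param_count(hidden_size, hidden_size, rank)
    trainable = per_matrix * adapted_per_layer * num_layers
    return trainable, trainable / total_params


# Assumed shape of a hypothetical 7B decoder-only model.
trainable, fraction = trainable_fraction(
    total_params=7_000_000_000,
    hidden_size=4096,
    num_layers=32,
    adapted_per_layer=4,   # e.g. the four attention projections
    rank=16,
)
print(f"trainable adapter parameters: {trainable:,}")
print(f"share of full model: {fraction:.2%}")
```

Under these assumptions the adapters amount to a fraction of a percent of the model's weights, which is why the adapter route is usually the first thing to try.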

| Method | Example | Description |
| --- | --- | --- |
| Supervised Fine-Tuning (SFT) | Train on {input, output} pairs: {"prompt": "Translate:", "completion": "..."} | Standard approach in which the model learns from labeled examples, updating all or a subset of its parameters to minimize the loss between predictions and targets. |
| Instruction Tuning | {"instruction": "Summarize this", "input": "text", "output": "summary"} | Specialized SFT that teaches the model to follow natural-language instructions; improves zero-shot generalization across diverse tasks. |
| Parameter-Efficient Fine-Tuning (PEFT) | Freeze base model, train 0.1–1% adapters | Freezes most parameters and trains small modules (adapters, low-rank matrices); often reaches 95–99% of full fine-tuning performance with drastically lower memory use. |
| Reinforcement Learning from Human Feedback (RLHF) | SFT → Reward Model → PPO policy training | Three-stage pipeline: supervised fine-tuning, training a reward model on human preference data, then optimizing the policy with RL to maximize that reward; used to align models with human values. |
| Direct Preference Optimization (DPO) | Train directly on {chosen, rejected} pairs | Simplifies RLHF by skipping the separate reward model; directly optimizes the policy to prefer chosen over rejected responses via an implicit reward reparameterization. |
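The "implicit reward" in the DPO row can be made concrete with a small sketch of the per-pair loss. It assumes you already have the summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference model. The function name, argument names, and the default `beta=0.1` are illustrative choices, not any library's API.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_gain - rejected_gain))).

    Each "gain" is how much more likely the policy makes a response than
    the frozen reference model does; beta scales these into implicit rewards.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))

# Policy has raised the chosen response and lowered the rejected one.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss falls as the policy widens the gap between chosen and rejected responses relative to the reference, which is how DPO gets by without training a separate reward model. A real training loop would compute this over batches with a tensor library and backpropagate through the policy log-probabilities only.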
