© 2026 CheatGrid™. All rights reserved.

Self-Supervised and Contrastive Learning Cheat Sheet

Updated 2026-05-02

Self-supervised and contrastive learning let models learn representations from unlabeled data by designing pretext tasks whose supervisory signals come from the data itself. Pretraining on large unlabeled datasets with these methods has substantially advanced computer vision and NLP while reducing dependence on expensive human annotation. The core principle of contrastive methods is to maximize agreement between different augmented views of the same input while pushing apart views of different inputs; the resulting features are semantically meaningful and transfer well to downstream tasks. Practitioners deploying these techniques at scale must balance two goals: preventing trivial collapse (where all inputs map to the same representation) and preserving rich, discriminative embeddings.

What This Cheat Sheet Covers

This topic spans 13 focused tables and 93 indexed concepts, from foundational ideas through advanced details. The complete table-by-table outline follows.

Table 1: Contrastive Loss Functions
Table 2: Core Contrastive Learning Frameworks
Table 3: Self-Supervised Pretext Tasks
Table 4: Data Augmentation for Contrastive Learning
Table 5: Architecture Components
Table 6: Collapse Prevention Mechanisms
Table 7: Fine-Tuning and Transfer Learning
Table 8: Advanced Self-Supervised Methods
Table 9: Evaluation Protocols and Benchmarks
Table 10: Negative Sampling Strategies
Table 11: Architectural Patterns and Design Choices
Table 12: Practical Training Considerations
Table 13: Domain-Specific Applications

Table 1: Contrastive Loss Functions

The loss function is where the learning signal comes from: it defines what "agreement" between views means and how collapse is avoided. Options range from the classic InfoNCE, which pits a positive pair against a batch of negatives, to negative-free objectives such as Barlow Twins and VICReg, which prevent trivial solutions through decorrelation and variance constraints instead. Knowing which losses tolerate small batches and which need thousands of negatives largely determines the hardware you'll need.

Loss | Example | Description
InfoNCE (NT-Xent)
\mathcal{L} = -\log \frac{\exp(\text{sim}(z_i, z_j) / \tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\text{sim}(z_i, z_k) / \tau)}
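• NT-Xent (Normalized Temperature-scaled Cross Entropy) is the InfoNCE form used in SimCLR; MoCo uses the same InfoNCE objective with negatives drawn from a queue of momentum-encoder keys
• Treats each anchor z_i as a classification problem: identify its positive z_j among the other 2N - 1 views in the batch; the batch loss averages this over all 2N anchors
• \text{sim}(\cdot, \cdot) is cosine similarity, and the temperature \tau controls how peaked the softmax over similarities is
• Lower \tau sharpens the softmax distribution and increases the penalty on hard negatives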
Triplet Loss
\mathcal{L} = \max(d(a, p) - d(a, n) + \text{margin}, 0)
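• Penalizes the anchor unless its distance to the positive is smaller than its distance to the negative by at least the margin
• Requires explicit selection of (anchor, positive, negative) triplets during training
• Hard negative mining (choosing negatives close to the anchor) is crucial for effective learning, since easy triplets already satisfy the margin and contribute zero loss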
• Less commonly used in modern SSL frameworks compared to InfoNCE
Supervised Contrastive Loss
\mathcal{L} = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}
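• Extends InfoNCE to the supervised setting by pulling together all samples that share a class label
• Each anchor i can have multiple positives: P(i) is the set of other samples in the batch with the same label as i, and A(i) is every sample in the batch except i
• Reported to outperform standard cross-entropy training on many benchmarks when labels are available
• Also reported to give better calibration and robustness to distribution shift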
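The three losses in this table can be sketched in dependency-free Python to make the formulas concrete. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`nt_xent`, `triplet_loss`, `hardest_negative`, `supcon_loss`) are hypothetical, `sim` is taken to be cosine similarity, the triplet distance `d` is assumed to be Euclidean, and the NT-Xent batch layout (views `k` and `k + N` form a positive pair) is an arbitrary choice for this sketch. Real training code would use vectorized tensor operations and autograd instead of Python loops.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def cosine_sim(u: Vector, v: Vector) -> float:
    """sim(u, v) in the InfoNCE formula: dot product of unit-length vectors."""
    return dot(l2_normalize(u), l2_normalize(v))


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))), i.e. the log of a softmax denominator."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def nt_xent(z: List[Vector], tau: float = 0.5) -> float:
    """InfoNCE / NT-Xent averaged over all 2N anchors.

    Assumed layout (for this sketch only): z holds 2N embeddings, and
    z[k] and z[k + N] are the two augmented views of example k.
    """
    two_n = len(z)
    n = two_n // 2
    total = 0.0
    for i in range(two_n):
        j = (i + n) % two_n  # index of the positive partner of anchor i
        # Denominator terms: every k != i, i.e. the positive plus 2N - 2 negatives.
        logits = [cosine_sim(z[i], z[k]) / tau for k in range(two_n) if k != i]
        # -log(exp(s_ij / tau) / sum) == log(sum) - s_ij / tau
        total += logsumexp(logits) - cosine_sim(z[i], z[j]) / tau
    return total / two_n


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean d."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def hardest_negative(anchor: Vector, candidates: List[Vector]) -> Vector:
    """Simple hard negative mining: return the candidate closest to the anchor."""
    return min(candidates, key=lambda c: euclidean(anchor, c))


def supcon_loss(z: List[Vector], labels: List[int], tau: float = 0.1) -> float:
    """Supervised contrastive loss, summed over anchors i as in the formula.

    Embeddings are L2-normalized first, so z_i . z_p is a cosine similarity.
    Anchors with an empty P(i) (no other sample of their class) are skipped.
    """
    z = [l2_normalize(v) for v in z]
    total = 0.0
    for i in range(len(z)):
        a_set = [a for a in range(len(z)) if a != i]            # A(i)
        p_set = [p for p in a_set if labels[p] == labels[i]]    # P(i)
        if not p_set:
            continue
        log_denom = logsumexp([dot(z[i], z[a]) / tau for a in a_set])
        # Average of -log softmax over the positives of anchor i.
        total += sum(log_denom - dot(z[i], z[p]) / tau for p in p_set) / len(p_set)
    return total
```

For example, with `z = [[1, 0], [0, 1], [1, 0], [0, 1]]` (each positive pair identical, the two examples orthogonal), lowering `tau` from 0.5 to 0.1 drives the NT-Xent loss toward zero, because the softmax concentrates almost all of its mass on the correct positive.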
