Self-supervised and contrastive learning represent a paradigm shift in machine learning: models learn powerful representations from unlabeled data through pretext tasks that derive supervisory signals from the data itself. These methods have transformed computer vision and NLP by enabling pretraining on vast unlabeled datasets, reducing dependence on expensive human annotation. The core principle is simple yet profound: by maximizing agreement between different augmented views of the same example while pushing apart views of different examples, models learn semantically meaningful features that transfer well to downstream tasks. For practitioners deploying these techniques at scale, the key challenge is balancing two opposing failure modes: preventing trivial collapse, where all representations become identical, while keeping the embeddings rich and discriminative.
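To make the "pull together, push apart" principle concrete, here is a minimal sketch of an NT-Xent-style contrastive loss (as used in SimCLR-like setups) written in PyTorch. The framework choice, the function name `nt_xent_loss`, and the temperature value are illustrative assumptions, not details from the text above: given two batches of embeddings produced from two augmentations of the same samples, each embedding's positive is its counterpart in the other view, and every other embedding in the batch serves as a negative.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Illustrative NT-Xent loss over a batch of paired embeddings.

    z1, z2: (N, D) embeddings of two augmented views of the same N samples.
    Row i of z1 and row i of z2 form a positive pair; the remaining 2N - 2
    embeddings in the combined batch act as negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, D), unit-norm rows
    sim = (z @ z.t()) / temperature                          # scaled cosine similarities
    # Mask out self-similarity so each row is never its own positive or negative.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))
    # The positive for row i is row i + N, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage sketch: random tensors stand in for projection-head outputs of two views.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
```

The temperature scaling and the in-batch negatives are what keep the loss from being trivially minimized by mapping everything to the same point: a collapsed embedding would score all pairs equally and incur a high cross-entropy, which is one concrete mechanism behind the collapse-versus-discriminability balance described above.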