Data augmentation is a regularization technique that artificially expands a training dataset by creating modified copies of existing examples, helping deep learning models generalize better to unseen data. Because the transformations are applied to inputs during training and are chosen to preserve the label, augmentation requires no additional labeling effort and helps counter both data scarcity and overfitting. Knowing when to apply simple geometric transformations versus advanced policy-based methods, and how to balance augmentation strength against model capacity, is critical: augmentation that is too weak leaves overfitting unaddressed, while overly aggressive augmentation can degrade the training signal and slow convergence.
What This Cheat Sheet Covers
This topic spans 15 focused tables and 79 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Basic Geometric Transformations
The workhorses of image augmentation — flips, rotations, crops, and affine or perspective warps that move pixels around in space while leaving the label intact. They're cheap, intuitive, and almost always your first line of defence against overfitting, though each carries a caveat about when it stops being label-preserving (a flipped digit or rotated road sign can change meaning).
| Technique | Example | Description |
|---|---|---|
| Horizontal flip | `transforms.RandomHorizontalFlip(p=0.5)` | • Mirrors the image left-to-right with probability p (`transforms.RandomVerticalFlip` is the top-to-bottom counterpart) • Preserves semantic content while introducing reflection (mirror) invariance • Valid only when orientation doesn't affect the label (not suitable for text or directional signs) |
| Rotation | `A.Rotate(limit=15, p=0.8)` | • Rotates the image by a random angle within [-limit, +limit] degrees, applied with probability p • Small angles (±15°) preserve context while introducing orientation invariance • Large rotations may create unrealistic views or cut off important regions |
| Random crop | `transforms.RandomCrop(size=224)` | • Extracts a random spatial region of the specified size from the image • Forces the model to learn from partial views and distributed features rather than relying on a single discriminative patch • Commonly combined with resizing to the target input dimensions |
| Random resized crop | `transforms.RandomResizedCrop(224, scale=(0.8, 1.0))` | • Crops a random patch covering 80–100% of the original area, with a randomly sampled aspect ratio, then resizes it to the target size • Simulates objects at different distances from the camera • Often more effective than plain cropping because it varies scale and position together |
| Affine transform | `A.Affine(translate_percent=0.1, scale=(0.9, 1.1), shear=10)` | • Applies translation, scaling, rotation, and shear in a single transform while preserving parallel lines • Defined by 6 parameters in a 2×3 matrix • Simulates viewpoint changes without perspective distortion |
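
The affine row above describes a 2×3 matrix with six parameters that keeps parallel lines parallel. The following is a minimal pure-Python sketch of that idea, not how Albumentations builds its matrix internally. The helper names (`affine_matrix`, `apply_affine`, `are_parallel`) are hypothetical, and the composition order (shear, then rotate, then scale, then translate) is an assumption made for illustration.

```python
import math


def affine_matrix(tx=0.0, ty=0.0, scale=1.0, angle_deg=0.0, shear_deg=0.0):
    """Return a 2x3 affine matrix [[a, b, tx], [c, d, ty]] as nested lists.

    Assumed composition order (illustrative only): shear along x,
    then rotate, then scale uniformly, then translate.
    """
    theta = math.radians(angle_deg)
    k = math.tan(math.radians(shear_deg))  # x-shear factor
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Linear part = scale * R(theta) @ [[1, k], [0, 1]]
    a = scale * cos_t
    b = scale * (cos_t * k - sin_t)
    c = scale * sin_t
    d = scale * (sin_t * k + cos_t)
    return [[a, b, tx], [c, d, ty]]


def apply_affine(m, point):
    """Map an (x, y) point through the 2x3 matrix m."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def are_parallel(u, v, tol=1e-9):
    """Two 2D direction vectors are parallel when their cross product is ~0."""
    return abs(u[0] * v[1] - u[1] * v[0]) < tol


def direction(p, q):
    """Direction vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


# Two parallel horizontal segments in a 224x224 image.
seg_a = ((0.0, 0.0), (4.0, 0.0))
seg_b = ((0.0, 3.0), (4.0, 3.0))

# Shift by 10% of 224 px, scale by 1.1, rotate 15 degrees, shear 10 degrees.
m = affine_matrix(tx=22.4, ty=-22.4, scale=1.1, angle_deg=15, shear_deg=10)

warped_a = tuple(apply_affine(m, p) for p in seg_a)
warped_b = tuple(apply_affine(m, p) for p in seg_b)

still_parallel = are_parallel(direction(*warped_a), direction(*warped_b))
print("parallel after warp:", still_parallel)
```

The six free numbers are the four entries of the linear part plus the two translation offsets. A perspective warp would need a full 3×3 matrix, which is why it can break parallelism while an affine map cannot.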