Image segmentation is the computer vision task of assigning every pixel of an image to a meaningful category; it sits at the heart of autonomous driving, medical imaging, and robotics. Unlike object detection, which draws bounding boxes, segmentation produces pixel-precise masks, so the choice of model architecture depends directly on whether you need class labels (semantic), individual object identities (instance), or both (panoptic). The field has evolved from early fully convolutional networks (FCNs) through encoder-decoder CNNs (U-Net, DeepLabV3+) and two-stage detectors (Mask R-CNN) to today's foundation models like SAM and SAM 2. The critical insight is that no single model dominates all tasks: trade-offs between accuracy, speed, and the type of segmentation required always drive the selection.
What This Cheat Sheet Covers
This topic spans 14 focused tables and 93 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Segmentation Task Types
Segmentation divides into three distinct problem formulations that differ in what they require a model to know about each pixel. Understanding which task you are solving is the first decision, because it determines which architectures, metrics, and dataset formats are appropriate.
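To make the difference between the formulations concrete, here is a minimal NumPy sketch of how the outputs might be represented for a tiny 3x5 image. The class IDs (`SKY`, `CAR`), the array layouts, and the `count_segments` helper are illustrative assumptions, not a standard dataset format.

```python
import numpy as np

# Hypothetical class IDs, chosen only for this illustration.
SKY, CAR = 0, 1

# Semantic map: one class ID per pixel; both cars share the label CAR.
semantic = np.array([
    [SKY, SKY, SKY, SKY, SKY],
    [CAR, CAR, SKY, CAR, CAR],
    [CAR, CAR, SKY, CAR, CAR],
])

# Instance map: 0 = no instance ("stuff"), 1..N = distinct countable objects.
instance = np.array([
    [0, 0, 0, 0, 0],
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
])

# Panoptic view: every pixel carries a (class ID, instance ID) pair.
panoptic = np.stack([semantic, instance], axis=-1)


def count_segments(class_map, instance_map, class_id):
    """Count distinct non-zero instance IDs among pixels of class_id."""
    ids = np.unique(instance_map[class_map == class_id])
    return int(np.count_nonzero(ids))


num_cars = count_segments(semantic, instance, CAR)
num_semantic_labels = len(np.unique(semantic))
```

The semantic map alone cannot say how many cars are present, because it only knows which pixels are "car". The instance map recovers that count, and stacking the two gives a panoptic-style labeling in which "stuff" pixels such as sky simply keep instance ID 0.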
| Type | Example | Description |
|---|---|---|
| Semantic segmentation | Every car pixel → class car; every sky pixel → class sky | Assigns a single class label to every pixel; multiple objects of the same class share one label, with no instance distinction. Handles "stuff" (amorphous regions) well. |
| Instance segmentation | Person 1 → mask ID 1; person 2 → mask ID 2 | Detects and delineates each countable object ("things") individually, even within the same class; assigns unique instance IDs. |