Image Segmentation Models Cheat Sheet

Updated 2026-05-21


Image segmentation is the computer vision task of assigning every pixel of an image to a meaningful category, and it sits at the heart of autonomous driving, medical imaging, and robotics. Unlike object detection, which draws bounding boxes, segmentation produces pixel-precise masks, so the choice of model architecture depends on whether you need class labels (semantic), individual object identities (instance), or both (panoptic). The field has evolved from early fully convolutional networks (FCNs) through encoder-decoder CNNs (U-Net, DeepLabV3+) and two-stage, detector-based models (Mask R-CNN) to today's foundation models like SAM and SAM 2. The critical insight is that no single model dominates all tasks: trade-offs between accuracy, speed, and the type of segmentation required drive the selection.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 93 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Segmentation Task Types
Table 2: U-Net Architecture
Table 3: Mask R-CNN Architecture
Table 4: DeepLabV3+ Architecture
Table 5: SAM — Segment Anything Model
Table 6: SAM 2 — Video and Image Segmentation
Table 7: HQ-SAM — High-Quality Segmentation Adapter
Table 8: PSPNet and SegFormer — Context Aggregation Architectures
Table 9: YOLACT — Real-Time Instance Segmentation
Table 10: Evaluation Metrics
Table 11: Dataset Formats
Table 12: Libraries and Frameworks
Table 13: Fine-Tuning and Adaptation Strategies
Table 14: Common Pitfalls and Best Practices

Table 1: Segmentation Task Types

Segmentation divides into three distinct problem formulations that differ in what they require a model to know about each pixel. Understanding which task you are solving is the first decision, because it determines which architectures, metrics, and dataset formats are appropriate.

Type	Example	Description
Semantic segmentation	Every car pixel → class `car`; every sky pixel → class `sky`	• Assigns a single class label to every pixel • Multiple objects of the same class share one label, with no instance distinction • Handles "stuff" (amorphous regions) well
Instance segmentation	Person 1 → mask ID 1; person 2 → mask ID 2	• Detects and delineates each countable object ("things") individually, even within the same class • Assigns unique instance IDs
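
The difference between the two rows above can be made concrete with a toy example. The following is a minimal NumPy sketch, not taken from any particular library: the class IDs, the `instance_to_class` mapping, and the `instances_to_semantic` helper are all hypothetical names chosen for illustration. It shows that an instance map keeps two people apart, while the semantic map derived from it gives both the same class label.

```python
import numpy as np

# Hypothetical class IDs chosen for this sketch.
BACKGROUND, PERSON = 0, 1

# Toy instance map: 0 = background, 1 and 2 = two separate people.
instance_ids = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Hypothetical lookup from instance ID to class ID.
instance_to_class = {1: PERSON, 2: PERSON}


def instances_to_semantic(instance_ids, instance_to_class, background=BACKGROUND):
    """Collapse an instance map into a semantic map (one class ID per pixel)."""
    semantic = np.full_like(instance_ids, background)
    for inst_id, class_id in instance_to_class.items():
        # Boolean mask selects every pixel that belongs to this instance.
        semantic[instance_ids == inst_id] = class_id
    return semantic


semantic = instances_to_semantic(instance_ids, instance_to_class)

# Count distinct non-background labels in each representation.
num_instances = len(np.unique(instance_ids[instance_ids != 0]))
num_classes_present = len(np.unique(semantic[semantic != BACKGROUND]))

print("instances:", num_instances)
print("classes present:", num_classes_present)
```

Going from instance to semantic is a simple lookup, as here. The reverse direction is not: a semantic map alone cannot tell two touching objects of the same class apart, which is why instance segmentation needs its own architectures.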
