Object Detection Models (YOLO, Faster R-CNN, DETR) Cheat Sheet

Updated 2026-05-21


Object detection sits at the core of computer vision: it locates and classifies every object of interest in an image, outputting a class label and a bounding box for each one. The field has split into two dominant families: two-stage detectors such as Faster R-CNN, which propose candidate regions before classifying them, and one-stage detectors such as the YOLO series, which predict all boxes in a single forward pass and historically traded a small accuracy margin for large speed gains. A third paradigm, transformer-based detection (DETR and its descendants), reformulates detection as a set-prediction problem that needs no hand-designed anchors and no NMS. Matching the right family to a deployment target, and understanding how each model's backbone, neck, and head interact, is the key to getting the most out of any detection pipeline.
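The NMS step that dense one-stage and two-stage detectors rely on, and that DETR-style set prediction avoids, can be sketched in plain Python. This is a minimal, class-agnostic sketch assuming corner-format boxes `(x1, y1, x2, y2)`; the names `iou` and `nms` are illustrative, and production pipelines usually run NMS per class on the GPU (for example with `torchvision.ops.nms`).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed corner format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: keep the top-scoring box, drop boxes overlapping it above iou_thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # → [0, 2]: box 1 is a near-duplicate of box 0
```

The IoU threshold is the main knob: lowering it suppresses more aggressively and can merge genuinely distinct, overlapping objects in crowded scenes.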

What This Cheat Sheet Covers

This topic spans 18 focused tables and 107 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Object Detection Paradigms
Table 2: YOLO Architecture Series
Table 3: Faster R-CNN and Two-Stage Architecture
Table 4: DETR and Transformer-Based Detectors
Table 5: Detection Head Designs
Table 6: Backbone Networks
Table 7: Neck Architectures
Table 8: Loss Functions
Table 9: Label Assignment Strategies
Table 10: Post-Processing
Table 11: Evaluation Metrics
Table 12: Annotation Formats
Table 13: Training and Fine-tuning with Ultralytics
Table 14: Training with MMDetection
Table 15: Model Export and Deployment
Table 16: Oriented Bounding Box (OBB) Detection
Table 17: Data Augmentation for Detection
Table 18: Common Pitfalls and Solutions

Table 1: Object Detection Paradigms

One-stage, two-stage, and transformer-based detection represent three fundamentally different approaches to the same problem. The right choice depends on your latency budget, your accuracy requirements, and whether you want anchor-free simplicity or are comfortable tuning anchor hyperparameters.
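The anchor hyperparameters mentioned above are typically a set of anchor sizes, a set of aspect ratios, and the feature-map stride. The sketch below, built around a hypothetical `make_anchors` helper, tiles every size/ratio combination at each feature-map cell. It assumes that aspect ratio means height / width and centres anchors on cell centres; both conventions vary between libraries (torchvision's `AnchorGenerator`, for instance, offsets its grid differently).

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int,
                 sizes: List[float], ratios: List[float]) -> List[Box]:
    """Hypothetical helper: tile sizes x ratios anchors on every feature-map cell.

    Assumption: ratio = height / width, and every anchor keeps area size**2.
    """
    shapes = []
    for size in sizes:
        for r in ratios:
            # w * h == size**2 and h / w == r
            shapes.append((size / math.sqrt(r), size * math.sqrt(r)))
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Assumption: anchors are centred on the cell centre in image coordinates.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    anchors = make_anchors(2, 2, stride=16, sizes=[32, 64], ratios=[0.5, 1.0, 2.0])
    print(len(anchors))  # → 24
    print(anchors[1])    # → (-8.0, -8.0, 24.0, 24.0): size 32, ratio 1, first cell
```

With two sizes and three ratios, a 2 × 2 feature map already yields 24 anchors; every extra size or ratio multiplies the number of anchors the head must score.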

Paradigm	Example	Description
One-stage detector	`model = YOLO("yolo26n.pt")` `results = model("img.jpg")`	• Predicts class and box in a single forward pass over a dense grid • Faster than two-stage, but historically traded slight accuracy for speed
Two-stage detector	Stage 1: RPN → RoI proposals; Stage 2: classify + refine RoIs	• Generates region proposals first, then classifies and refines each one • Typically higher accuracy on small or dense objects, at the cost of latency
Transformer-based detector	`model = RTDETR("rtdetr-l.pt")` No NMS, no anchors	• Formulates detection as set prediction, using Hungarian matching to pair predictions with ground-truth objects one-to-one • Eliminates anchors and NMS entirely
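The set-prediction idea behind DETR-style detectors can be sketched as an optimal one-to-one matching between queries and ground-truth objects. This is a simplified sketch, not DETR's actual matcher: the cost here combines only a class-probability term and an L1 box term (the real cost also includes a generalized-IoU term), the weights `w_cls` and `w_l1` are assumed values, and `match` is a hypothetical name. It finds the optimal assignment by exhaustive search, which gives the same answer as the Hungarian algorithm but is only practical for a handful of queries; real implementations use a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed normalised (cx, cy, w, h)


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes in the same format."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match(pred_probs: Sequence[Sequence[float]], pred_boxes: Sequence[Box],
          gt_labels: Sequence[int], gt_boxes: Sequence[Box],
          w_cls: float = 1.0, w_l1: float = 5.0) -> List[Tuple[int, int]]:
    """Return (query_idx, gt_idx) pairs minimising the total matching cost.

    Exhaustive search over one-to-one assignments: same optimum as the
    Hungarian algorithm, but exponential, so only for tiny examples.
    """
    n, m = len(pred_boxes), len(gt_boxes)
    assert n >= m, "need at least as many queries as ground-truth objects"
    # cost[i][j]: low when query i is confident in gt j's class and its box is close.
    cost = [[-w_cls * pred_probs[i][gt_labels[j]] + w_l1 * l1(pred_boxes[i], gt_boxes[j])
             for j in range(m)] for i in range(n)]
    best: Tuple[int, ...] = ()
    best_cost = float("inf")
    for assign in permutations(range(n), m):  # assign[j] = query matched to gt j
        c = sum(cost[assign[j]][j] for j in range(m))
        if c < best_cost:
            best, best_cost = assign, c
    return sorted((best[j], j) for j in range(m))


if __name__ == "__main__":
    probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]  # last column = "no object"
    boxes = [(0.3, 0.3, 0.2, 0.2), (0.7, 0.7, 0.2, 0.2), (0.5, 0.5, 0.1, 0.1)]
    print(match(probs, boxes, gt_labels=[0, 1],
                gt_boxes=[(0.7, 0.7, 0.2, 0.2), (0.3, 0.3, 0.2, 0.2)]))
    # → [(0, 1), (1, 0)]; query 2 stays unmatched and is trained as "no object"
```

Because each ground-truth object is matched to exactly one query, duplicate predictions are penalised during training rather than removed afterwards, which is why no NMS step is needed at inference.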
