© 2026 CheatGrid™. All rights reserved.

Computer Vision Cheat Sheet


Updated 2026-04-28

Computer Vision is a field of artificial intelligence that enables machines to interpret and understand visual information from the world: images, videos, and camera streams. It powers applications from autonomous vehicles to medical imaging, bridging perception and decision-making. At its core, Computer Vision combines convolutional neural networks (CNNs), vision transformers, classical image processing, and foundation models to extract features, detect objects, and segment scenes. One critical insight: the choice of architecture and preprocessing strongly affects whether a model generalizes to real-world variations in lighting, occlusion, and scale, so clean training data and appropriate augmentation are foundational requirements, not optional extras.


What This Cheat Sheet Covers

This topic spans 20 focused tables and 199 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

  • Table 1: Core Vision Architectures
  • Table 2: Object Detection Methods
  • Table 3: Image Segmentation Techniques
  • Table 4: Image Classification & Transfer Learning
  • Table 5: CNN Building Blocks
  • Table 6: Activation Functions
  • Table 7: Data Augmentation Techniques
  • Table 8: Loss Functions for Computer Vision
  • Table 9: Optimization & Training Techniques
  • Table 10: Evaluation Metrics
  • Table 11: Image Preprocessing
  • Table 12: Classical Feature Extraction
  • Table 13: Pose Estimation & Keypoint Detection
  • Table 14: Face Recognition & Detection
  • Table 15: 3D Computer Vision & Depth Estimation
  • Table 16: Motion & Video Analysis
  • Table 17: Image Enhancement & Restoration
  • Table 18: Model Interpretation & Visualization
  • Table 19: Advanced Training Techniques
  • Table 20: Specialized Applications

Table 1: Core Vision Architectures

| Architecture | Example | Description |
| --- | --- | --- |
| Convolutional Neural Network (CNN) | Conv2D(32, (3,3)) → ReLU → MaxPool2D(2,2) | Feedforward network that uses convolutional filters to extract spatial hierarchies of features; the foundation of modern computer vision. |
| ResNet (Residual Network) | x + F(x) | Adds skip connections that make very deep networks (50–152 layers) trainable by mitigating vanishing gradients; a common backbone for many tasks. |
| EfficientNet | Compound scaling: depth + width + resolution | Scales network depth, width, and input resolution together with a single compound coefficient, starting from a baseline network found by neural architecture search; strong accuracy/efficiency trade-off. |
| Vision Transformer (ViT) | image → patches → self-attention | Applies transformer self-attention directly to a sequence of image patches; excels when pretrained on large datasets, since it lacks the built-in inductive biases of convolutions. |
| Swin Transformer | patches → local windows → shifted windows | Hierarchical ViT that computes self-attention inside local windows and shifts the windows between layers for cross-window interaction, giving linear complexity in image size; ICCV 2021 best paper and a widely used backbone for detection and segmentation. |
| ConvNeXt | 4×4 patchify stem → depthwise 7×7 conv → LayerNorm → 1×1 conv → GELU → 1×1 conv | Modernizes ResNet with transformer-inspired design choices (large kernels, LayerNorm, inverted bottleneck); competitive with vision transformers such as Swin while retaining CNN efficiency and hardware friendliness. |
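
The EfficientNet row describes scaling depth, width, and resolution together. The sketch below shows the arithmetic behind that idea under stated assumptions: the coefficients 1.2, 1.1, and 1.15 are the values reported in the EfficientNet paper for its baseline network, while the function names and the 224-pixel base resolution are illustrative choices, not part of any library. Published model variants use rounded settings that need not match these raw numbers.

```python
# Compound scaling sketch (illustrative; not a library API).
# Coefficients as reported in the EfficientNet paper for the baseline network,
# chosen so that ALPHA * BETA**2 * GAMMA**2 is roughly 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi, base_resolution=224):
    """Return (depth multiplier, width multiplier, input resolution) for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth, width, resolution


def flops_multiplier(phi):
    """Approximate FLOPs growth: linear in depth, quadratic in width and resolution."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution {r}px, FLOPs x{flops_multiplier(phi):.2f}")
```

Each unit increase in phi roughly doubles the compute budget, which is why a single knob can trade accuracy against cost instead of three separately tuned ones.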
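
The ViT and Swin rows can be made concrete with a small pure-Python sketch: splitting an image into patch tokens, then counting how many token pairs attention has to score globally versus inside fixed-size windows. All function names here are hypothetical, and the counts ignore heads, channels, and constant factors; the point is only the quadratic versus linear growth in the number of tokens.

```python
def patchify(image, patch):
    """Split an H x W image (list of rows) into non-overlapping patch x patch tiles.

    Each tile is flattened to a 1-D list; tiles are returned in row-major order.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def global_attention_pairs(num_tokens):
    """Pairs scored when every token attends to every token (ViT-style)."""
    return num_tokens ** 2


def windowed_attention_pairs(num_tokens, window_tokens):
    """Pairs scored when attention is restricted to windows of window_tokens tokens."""
    return num_tokens * window_tokens


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(patchify(image, 2))
    # Assumed settings: 224x224 input, 4x4 patches, 7x7-token windows.
    n = (224 // 4) ** 2
    print(n, global_attention_pairs(n), windowed_attention_pairs(n, 7 * 7))
```

Doubling the number of tokens quadruples the global count but only doubles the windowed one, which is the sense in which window attention is linear in image size.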
