Anomaly Detection in Machine Learning Cheat Sheet

Updated 2026-05-18

Anomaly detection (also called outlier detection) identifies rare patterns, events, or observations that deviate significantly from normal behavior in data. It is a critical technique across domains: fraud detection in finance, intrusion detection in cybersecurity, fault detection in manufacturing, health monitoring in IoT systems, and quality control in production. Unlike standard supervised classification, anomaly detection often operates with few or no labeled anomalies, which makes both training and evaluation harder.

Methods range from classical statistical tests to modern deep learning architectures, each suited to different data types (tabular, time series, images, graphs) and problem contexts (univariate vs. multivariate, online vs. batch, supervised vs. unsupervised). The core challenge is balancing sensitivity (catching real anomalies) with specificity (avoiding false alarms), especially under class imbalance where anomalies are rare. This cheat sheet covers statistical methods, machine learning algorithms, deep learning architectures, time series techniques, evaluation strategies, and production deployment patterns, as a practitioner's reference for building, tuning, and deploying anomaly detection systems.

What This Cheat Sheet Covers

This topic spans 16 focused tables and 119 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Statistical Methods for Univariate Anomaly Detection
Table 2: Distance-Based Anomaly Detection Methods
Table 3: Density-Based Anomaly Detection Methods
Table 4: Tree-Based and Ensemble Methods
Table 5: Traditional Machine Learning Methods
Table 6: Deep Learning — Autoencoder-Based Methods
Table 7: Deep Learning — Advanced Architectures
Table 8: Time Series — Classical & Statistical Methods
Table 9: Time Series — Deep Learning Methods
Table 10: Multivariate & Specialized Techniques
Table 11: Evaluation Metrics for Anomaly Detection
Table 12: Anomaly Types & Key Concepts
Table 13: Production Deployment & Monitoring
Table 14: Data Preprocessing & Feature Engineering
Table 15: Python Libraries & Tools
Table 16: Real-World Applications & Use Cases

Table 1: Statistical Methods for Univariate Anomaly Detection

Statistical methods detect anomalies by measuring how far data points deviate from distribution parameters (mean, median, spread). They work best when data follows known distributions (often Gaussian) and are computationally efficient, interpretable, and require no training. These techniques form the foundation of anomaly detection — simple to implement but powerful when assumptions hold. The 3-sigma rule and IQR are go-to methods for quick exploratory analysis, while formal tests like Grubbs provide statistical rigor for small datasets.
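
As a rough illustration of the 3-sigma rule and the IQR fences, here is a minimal sketch using only Python's standard library. The function names, the sample readings, and the choice of the population standard deviation for σ are assumptions made for this example, not part of any particular library.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std; an assumption of this sketch
    if sigma == 0:
        return []  # all values identical, nothing can deviate
    return [x for x in values if abs((x - mu) / sigma) > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]


# Hypothetical sensor readings: 19 values near 10 and one spike.
readings = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
            11, 9, 10, 12, 10, 9, 11, 10, 10, 50]

print(zscore_outliers(readings))
print(iqr_outliers(readings))
```

One caveat worth keeping in mind: with the population standard deviation, the largest absolute z-score achievable in a sample of n points is sqrt(n - 1), so a threshold of 3 can never fire on ten or fewer points. The IQR fences do not have this limitation.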

Method	Example	Description
Z-Score (Standard Score)	`z = (x - μ) / σ` `if abs(z) > 3: anomaly`	Measures how many standard deviations a point is from the mean; values beyond ±3σ are flagged as anomalies, since about 99.7% of Gaussian data falls within that range. Assumes Gaussian distribution; the mean and σ are themselves pulled toward extreme values.
Interquartile Range (IQR)	`Q1 = 25th percentile` `Q3 = 75th percentile` `IQR = Q3 - Q1` `if x < Q1-1.5×IQR or x > Q3+1.5×IQR: anomaly`	Non-parametric method based on quartiles; detects outliers outside 1.5×IQR below Q1 or above Q3. Robust to non-Gaussian data and extreme values.
Modified Z-Score (Median Absolute Deviation)	`MAD = median(|x - median(x)|)` `modified_z = 0.6745 × (x - median) / MAD` `if abs(modified_z) > 3.5: anomaly`	Uses median and MAD instead of mean/std; more robust to outliers than standard Z-score. The constant 0.6745 rescales MAD so it is comparable to the standard deviation for Gaussian data; 3.5 is the commonly recommended cutoff. Undefined when MAD is 0.
Grubbs' Test (ESD)	`G = max(|x - mean|) / s` Compare G to critical value from Grubbs table	Formal hypothesis test for detecting a single outlier in univariate data, where `s` is the sample standard deviation; assumes Gaussian distribution. The generalized ESD test extends it to multiple outliers by removing the most extreme point and retesting.
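
The modified Z-score and the Grubbs statistic from the table can be sketched with Python's standard library as follows. The function names and the sample measurements are hypothetical. The critical value is passed in by the caller, as it would be when read from a Grubbs table; 2.290 is the commonly tabulated two-sided 5% value for n = 10 and should be checked against your own table.

```python
import statistics


def modified_zscores(values):
    """Return the modified z-score of every value, based on median and MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]


def grubbs_statistic(values):
    """Return (G, index) for the point farthest from the mean."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[idx] - mean) / s, idx


# Hypothetical measurements: nine values near 10 and one suspect point.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.0]

flagged = [x for x, mz in zip(measurements, modified_zscores(measurements))
           if abs(mz) > 3.5]
print(flagged)

G_CRITICAL = 2.290  # assumed table value: n = 10, alpha = 0.05, two-sided
g, idx = grubbs_statistic(measurements)
print(measurements[idx], g > G_CRITICAL)
```

Grubbs' test as written checks only the single most extreme point. Applying it repeatedly without adjusting the critical values is not the same as the generalized ESD procedure.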
