Anomaly detection (also called outlier detection) identifies rare patterns, events, or observations that deviate significantly from normal behavior in data. It's a critical technique across domains: fraud detection in finance, intrusion detection in cybersecurity, fault detection in manufacturing, health monitoring in IoT systems, and quality control in production. Unlike standard supervised classification, anomaly detection often operates with limited or no labeled anomalies — making it uniquely challenging. Methods range from classical statistical tests to modern deep learning architectures, each suited to different data types (tabular, time series, images, graphs) and problem contexts (univariate vs. multivariate, online vs. batch, supervised vs. unsupervised). The core challenge is balancing sensitivity (catching real anomalies) with specificity (avoiding false alarms), especially under class imbalance where anomalies are rare. This cheat sheet covers statistical methods, machine learning algorithms, deep learning architectures, time series techniques, evaluation strategies, and production deployment patterns — providing a practitioner's reference for building, tuning, and deploying anomaly detection systems that remain accurate and interpretable in real-world production environments.
What This Cheat Sheet Covers
This topic spans 16 focused tables and 119 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Statistical Methods for Univariate Anomaly Detection
Statistical methods detect anomalies by measuring how far a data point deviates from distribution parameters (mean, median, spread). They are computationally efficient, interpretable, and need no model training, and they work best when the data follows a known distribution (often Gaussian). These techniques form the foundation of anomaly detection: simple to implement, and effective when their assumptions hold. The 3-sigma rule and IQR are go-to methods for quick exploratory analysis, while formal tests such as Grubbs' test add statistical rigor for small datasets.
| Method | Example | Description |
|---|---|---|
| Z-Score (3-Sigma Rule) | `z = (x - μ) / σ`<br>`if abs(z) > 3: anomaly` | Measures how many standard deviations a point lies from the mean. Under a Gaussian distribution about 99.7% of values fall within ±3σ, so points beyond that range are flagged as anomalies. Assumes a Gaussian distribution. |
| IQR (Interquartile Range) | `Q1 = 25th percentile`<br>`Q3 = 75th percentile`<br>`IQR = Q3 - Q1`<br>`if x < Q1 - 1.5×IQR or x > Q3 + 1.5×IQR: anomaly` | Non-parametric method based on quartiles; flags points more than 1.5×IQR below Q1 or above Q3. Robust to non-Gaussian data and extreme values. |
| Modified Z-Score (MAD) | `MAD = median(abs(x - median(x)))`<br>`modified_z = 0.6745 × (x - median) / MAD`<br>`if abs(modified_z) > 3.5: anomaly` | Uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, so it is more robust to outliers than the standard Z-score. The constant 0.6745 rescales MAD to be comparable to the standard deviation under a Gaussian distribution; 3.5 is the commonly used cutoff. |
| Grubbs' Test | `G = max(abs(x - mean)) / std`<br>`Compare G to the critical value from a Grubbs table` | Formal hypothesis test for detecting a single outlier in univariate data, where `std` is the sample standard deviation. Assumes a Gaussian distribution. Iterative versions detect multiple outliers sequentially. |
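
The formulas in Table 1 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation. The function names, the `readings` sample, and the choice of population standard deviation for the Z-score and the "inclusive" quartile method for IQR are all assumptions made for this example. For Grubbs' test the sketch computes only the statistic `G`; the critical value still has to come from a Grubbs table (or a t-distribution routine) for the chosen sample size and significance level.

```python
import statistics


def zscore_outliers(data, threshold=3.0):
    """Return points whose absolute Z-score exceeds the threshold."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population std (an assumption of this sketch)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [x for x in data if abs((x - mu) / sigma) > threshold]


def iqr_outliers(data, k=1.5):
    """Return points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


def modified_zscore_outliers(data, threshold=3.5):
    """Return points whose MAD-based modified Z-score exceeds the threshold."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    if mad == 0:
        return []  # more than half the values equal the median; score undefined
    return [x for x in data if abs(0.6745 * (x - med) / mad) > threshold]


def grubbs_statistic(data):
    """Return G = max|x - mean| / s, with s the sample standard deviation."""
    mu = statistics.fmean(data)
    s = statistics.stdev(data)
    return max(abs(x - mu) for x in data) / s


# Hypothetical sensor readings: 20 values near 10 plus one obvious spike.
readings = [10.0, 10.1, 9.9, 10.2, 9.8] * 4 + [25.0]

print(zscore_outliers(readings))
print(iqr_outliers(readings))
print(modified_zscore_outliers(readings))
print(round(grubbs_statistic(readings), 2))
```

On this sample all three detectors flag only `25.0`, and `G` comes out at roughly 4.36. One caveat worth keeping in mind: with the population standard deviation, a single point in a sample of size n can never reach a Z-score above √(n − 1), so the 3-sigma rule cannot flag anything in samples of 10 or fewer points. The IQR and MAD variants do not have that limit, which is one reason they are preferred for small samples.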