Model evaluation is the systematic process of assessing machine learning model performance using quantitative metrics, validation strategies, and diagnostic techniques. It bridges the gap between training and deployment by answering whether a model generalizes well to unseen data rather than merely memorizing training patterns. The central tension that evaluation must diagnose is the bias-variance tradeoff: a model must be complex enough to capture real patterns but simple enough to avoid fitting noise. Proper evaluation separates models that genuinely generalize from dangerously overconfident ones.
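To make the memorization-versus-generalization distinction concrete, here is a toy, dependency-free sketch. Everything in it is a hypothetical illustration rather than part of the original material: the 1-D data generator, the 20% label-noise rate, and the two stand-in "models". One is a 1-nearest-neighbour classifier that simply memorizes its training set. The other is a fixed threshold rule that stands in for a simpler, well-regularized model. Expect the memorizer to look perfect on its own training data and worse on held-out data, while the simple rule scores similarly on both. Exact numbers depend on the seed.

```python
import random

def make_noisy_data(n, rng, noise=0.2):
    """Hypothetical toy data: x in [0, 1), true label is (x > 0.5), flipped with probability `noise`."""
    xs = [rng.random() for _ in range(n)]
    ys = [int(x > 0.5) ^ int(rng.random() < noise) for x in xs]
    return xs, ys

def accuracy(predict, xs, ys):
    """Fraction of points where the model's prediction matches the label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)

rng = random.Random(42)
train_x, train_y = make_noisy_data(200, rng)
test_x, test_y = make_noisy_data(200, rng)  # held-out data never used for fitting

def memorizer(x):
    """1-nearest-neighbour: copy the label of the closest training point (pure memorization)."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

def threshold_rule(x):
    """A simple rule standing in for a low-variance model: predict 1 when x > 0.5."""
    return int(x > 0.5)

nn_train, nn_test = accuracy(memorizer, train_x, train_y), accuracy(memorizer, test_x, test_y)
th_train, th_test = accuracy(threshold_rule, train_x, train_y), accuracy(threshold_rule, test_x, test_y)
print(f"1-NN      train={nn_train:.2f} test={nn_test:.2f}")  # train is 1.00 by construction
print(f"threshold train={th_train:.2f} test={th_test:.2f}")
```

The memorizer's training accuracy is 1.0 by construction, because every training point is its own nearest neighbour. Only the held-out score reveals that it has also memorized the label noise.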
What This Cheat Sheet Covers
This topic spans 24 focused tables and 123 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Data Splitting Strategies
| Method | Example | Description |
|---|---|---|
| Train-test split (hold-out) | `X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)` | • Divides the dataset into separate training and test sets (commonly 70-30 or 80-20) • fast, but the performance estimate has high variance on small datasets • the test set is used only once, at the end |
| Train-validation-test split | train 60%, validation 20%, test 20% | • Adds a validation set for hyperparameter tuning • prevents test-set contamination from tuning decisions • the validation set guides model selection; the test set estimates final performance |
| Stratified split | `train_test_split(X, y, test_size=0.2, stratify=y)` | • Preserves class proportions across splits • critical for imbalanced datasets, so that minority classes appear in every split in roughly their original proportion |
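The three-way and stratified strategies above are often combined. With scikit-learn this is typically done by calling `train_test_split` twice: first with `test_size=0.2` to carve off the test set, then with `test_size=0.25` on the remaining 80% to obtain a 20% validation set, passing `stratify` each time. The dependency-free sketch below illustrates the idea rather than scikit-learn's exact allocation algorithm. It groups indices by class, shuffles each group, and cuts it at the 60/20/20 boundaries. The function name `stratified_three_way_split` and its defaults are hypothetical. Because counts are rounded per class, proportions are only approximate for very small classes.

```python
import random
from collections import Counter, defaultdict

def stratified_three_way_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Hypothetical helper: return (train_idx, val_idx, test_idx), splitting each class by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)          # group sample indices by class
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)                   # randomize within each class
        n_train = round(len(idx) * fractions[0])
        n_val = round(len(idx) * fractions[1])
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]      # remainder goes to the untouched test set
    return sorted(train), sorted(val), sorted(test)

labels = [0] * 90 + [1] * 10               # imbalanced: 10% minority class
train_idx, val_idx, test_idx = stratified_three_way_split(labels)
for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    counts = Counter(labels[i] for i in idx)
    print(name, len(idx), counts[1] / len(idx))  # minority share per split
# train 60 0.1
# val 20 0.1
# test 20 0.1
```

Every split keeps the 10% minority share. With a purely random split of only 100 samples, the 20-sample validation or test set could easily end up with no minority examples at all.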