Machine learning is a subset of artificial intelligence focused on building systems that learn patterns from data rather than following explicitly programmed rules. At its core, machine learning involves training mathematical models on historical data so they can make predictions or decisions on new, unseen data. Understanding the bias-variance tradeoff is fundamental. A model must balance the ability to capture complex patterns (low bias) against stability across different training samples (low variance), because reducing one often increases the other. As of 2026, gradient boosting frameworks (especially on tabular data) and neural network architectures (especially on images, text, and audio) continue to dominate. Scikit-learn remains the go-to library for classical ML, and its histogram-based gradient boosting estimators (`HistGradientBoostingClassifier` and `HistGradientBoostingRegressor`) natively handle missing values and categorical features.
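The bias-variance tradeoff can be made concrete with a minimal sketch using only NumPy. The data-generating function (a noisy sine wave), the sample sizes, and the polynomial degrees are illustrative assumptions, not part of the original text. A degree-1 polynomial is too rigid to follow the curve (high bias), while a degree-9 polynomial has enough freedom to chase the noise in 20 training points (high variance).

```python
import numpy as np

# Illustrative synthetic task (assumption): y = sin(2*pi*x) + Gaussian noise
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = make_data(20)   # small training set makes variance visible
x_test, y_test = make_data(200)    # larger held-out set estimates generalization error

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    # Polynomial.fit maps x onto [-1, 1] internally, which keeps the least-squares fit well conditioned
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    train_err, test_err = mse(model, x_train, y_train), mse(model, x_test, y_test)
    results[degree] = (train_err, test_err)
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

Training error can only fall as the degree grows, since each higher-degree model contains the lower-degree ones. The held-out error is what exposes underfitting and overfitting, so compare the two columns rather than relying on training error alone.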
What This Cheat Sheet Covers
This cheat sheet spans 21 focused tables and 216 indexed concepts. Below is a complete table-by-table outline, progressing from foundational concepts to advanced details.
Table 1: Learning Paradigms
| Paradigm | Example | Description |
|---|---|---|
| Supervised Learning | `model.fit(X_train, y_train)` | • Learns from labeled data where each input has a corresponding target output • Used for classification and regression tasks. |
| Unsupervised Learning | `kmeans = KMeans(n_clusters=3); kmeans.fit(X)` | • Discovers hidden patterns in unlabeled data without target outputs • Used for clustering, dimensionality reduction, and anomaly detection. |
| Semi-Supervised Learning | `model.fit(X_labeled, y_labeled); pseudo_labels = model.predict(X_unlabeled)` | Combines a small labeled dataset with a large amount of unlabeled data (for example, by pseudo-labeling the unlabeled points) to improve learning when labeling is expensive. |
| Self-Supervised Learning | `output = model(masked_input)  # predict masked tokens` | Generates supervisory signals from the data itself by creating pretext tasks, such as predicting masked tokens or the next token. |
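The four paradigms can be compared side by side in one script. This is a minimal sketch assuming scikit-learn and a synthetic dataset from `make_blobs`; the dataset, the 80% label-masking rate, and the choice of `LogisticRegression`, `SelfTrainingClassifier`, and a feature-masking pretext task are illustrative assumptions. The self-supervised part is only a toy analogue of masked-token prediction: real self-supervised pretraining uses large neural networks, not a linear model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy data (assumption): three Gaussian blobs with known class labels
X, y = make_blobs(n_samples=300, centers=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised: every training example comes with a target label
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
supervised_acc = clf.score(X_test, y_test)

# Unsupervised: labels are ignored; KMeans groups points by proximity
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

# Semi-supervised: hide ~80% of training labels; scikit-learn marks unlabeled samples with -1.
# SelfTrainingClassifier iteratively pseudo-labels confident unlabeled points and refits.
rng = np.random.default_rng(0)
y_partial = y_train.copy()
y_partial[rng.random(len(y_partial)) < 0.8] = -1
self_training = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
self_training.fit(X_train, y_partial)
semi_supervised_acc = self_training.score(X_test, y_test)

# Self-supervised (toy analogue of masking): hide feature 1 and predict it from feature 0,
# so the training target comes from the data itself rather than from human annotation
X_context, X_masked = X[:, [0]], X[:, 1]
pretext_model = LinearRegression().fit(X_context, X_masked)
reconstruction = pretext_model.predict(X_context)

print(f"supervised accuracy: {supervised_acc:.2f}")
print(f"semi-supervised accuracy: {semi_supervised_acc:.2f}")
```

`SelfTrainingClassifier` is just one semi-supervised approach (pseudo-labeling); `sklearn.semi_supervised` also provides graph-based `LabelPropagation` and `LabelSpreading`, which use the same `-1` convention for unlabeled samples.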