Online learning and concept drift adaptation move machine learning from traditional batch training to incremental, real-time model training on streaming data. Unlike batch learning, which requires access to the full dataset, online learning processes data one instance at a time or in small sequential batches, updating model parameters as new observations arrive. Concept drift, in which the statistical properties of the target variable (and in particular its relationship to the input features) change over time, poses a fundamental challenge: models trained on historical data can become obsolete as the underlying data distribution evolves. Drift detection and adaptation mechanisms are therefore essential for maintaining model accuracy in non-stationary environments such as fraud detection, IoT sensor streams, recommendation systems, and real-time analytics. Online learning is particularly powerful because it adapts to change without complete retraining, which allows continuous learning with bounded memory and compute in production systems where data never stops flowing.
What This Cheat Sheet Covers
This topic spans 13 focused tables and 77 indexed concepts. The outline below goes table by table, from foundational concepts through advanced details.
Table 1: Core Online Learning Paradigms
Before reaching for a specific algorithm, you choose how the model consumes the stream at all — one sample at a time, in small mini-batches, or by continuing to fit on new arrivals without ever revisiting old data. These paradigms also frame how you start (cold from scratch or warm from a pretrained model) and how you measure as you go, with test-then-train being the convention that lets you score and learn from the same instance.
| Paradigm | Example | Description |
|---|---|---|
| Online (stochastic) gradient descent | `for x, y in stream: w = w - lr * gradient(loss(w, x, y))` | • Single-sample updates to model parameters • noisy but memory-efficient • enables true online learning without storing historical data |
| Mini-batch learning | `batch = stream.read(32); w = w - lr * mean(gradients(batch))` | • Processes small fixed-size batches (typically 16-512 samples) • balances update frequency with gradient stability • reduces variance compared to pure SGD |
| Incremental learning | `model.partial_fit(X_new, y_new)` | • Model parameters updated with new data arrivals without retraining from scratch • supports both supervised and unsupervised algorithms |
| Test-then-train (prequential) evaluation | `pred = model.predict(x); loss = evaluate(pred, y); model.update(x, y)` | • Evaluate then update on each instance • provides unbiased performance estimates • standard evaluation protocol for streaming algorithms |
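The single-sample update and the test-then-train loop from the table can be combined in one short sketch. The example below is illustrative only: it assumes a one-feature linear model with squared-error loss, and the helper names `make_stream` and `prequential_sgd`, the synthetic target `y = 2x + 1`, and the learning rate are all hypothetical choices rather than part of any library.

```python
import random

def make_stream(n, seed=0):
    """Yield n (x, y) pairs from a synthetic, noiseless target y = 2*x + 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0

def prequential_sgd(stream, lr=0.1):
    """Test-then-train: score each instance before taking one SGD step on it."""
    w, b = 0.0, 0.0   # cold start: parameters initialised from scratch
    errors = []       # squared error measured before each update
    for x, y in stream:
        pred = w * x + b              # 1) predict with the current model
        err = pred - y
        errors.append(err * err)      # 2) evaluate on the not-yet-seen instance
        w -= lr * 2.0 * err * x       # 3) update: d/dw of (pred - y)^2
        b -= lr * 2.0 * err           #    update: d/db of (pred - y)^2
    return w, b, errors

w, b, errors = prequential_sgd(make_stream(2000))
early = sum(errors[:100]) / 100       # mean squared error on the first 100 instances
late = sum(errors[-100:]) / 100       # mean squared error on the last 100 instances
print(f"w={w:.3f} b={b:.3f} early_mse={early:.4f} late_mse={late:.6f}")
```

Because every instance is scored before the model learns from it, the running error doubles as a held-out estimate, and no separate test split or stored history is needed. Only `w` and `b` persist between iterations, which is what keeps memory bounded.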