Online Learning and Concept Drift Adaptation Cheat Sheet

Updated 2026-05-02

Online learning replaces batch training on a fixed dataset with incremental, real-time training on streaming data. Where batch learning requires access to the full dataset, online learning processes data one instance at a time or in small sequential batches, updating model parameters as each new observation arrives. Concept drift, the phenomenon where the statistical properties of the target variable change over time, poses a fundamental challenge: models trained on historical data can become obsolete as the underlying data distribution evolves. Drift detection and adaptation mechanisms are therefore essential for maintaining model accuracy in non-stationary environments such as fraud detection, IoT sensor streams, recommendation systems, and real-time analytics. Because an online learner adapts without complete retraining, it can keep learning with bounded memory and compute in production systems where data never stops flowing.

What This Cheat Sheet Covers

This topic spans 13 focused tables and 77 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Online Learning Paradigms
Table 2: Online Learning Algorithms
Table 3: Concept Drift Types
Table 4: Drift Detection Methods
Table 5: Window Models for Streaming
Table 6: Adaptation Strategies
Table 7: Evaluation Methods for Streams
Table 8: Python Frameworks and Libraries
Table 9: Advanced Techniques
Table 10: Performance Considerations
Table 11: Real-World Applications
Table 12: Integration with Stream Processing
Table 13: Challenges and Best Practices

Table 1: Core Online Learning Paradigms

Before reaching for a specific algorithm, you choose how the model consumes the stream at all — one sample at a time, in small mini-batches, or by continuing to fit on new arrivals without ever revisiting old data. These paradigms also frame how you start (cold from scratch or warm from a pretrained model) and how you measure as you go, with test-then-train being the convention that lets you score and learn from the same instance.

Paradigm	Example	Description
Stochastic Gradient Descent (SGD)	`for x, y in stream: w = w - lr * gradient(loss(w, x, y))`	• Updates model parameters from a single sample at a time • Noisy but memory-efficient • Enables true online learning without storing historical data
Mini-batch SGD	`batch = stream.read(32); w = w - lr * mean(gradients(batch))`	• Processes small fixed-size batches (typically 16 to 512 samples) • Balances update frequency against gradient stability • Reduces gradient variance compared with single-sample SGD
Incremental learning	`model.partial_fit(X_new, y_new)`	• Updates model parameters as new data arrives, without retraining from scratch • Supports both supervised and unsupervised algorithms
Test-then-train (prequential)	`pred = model.predict(x); loss = evaluate(pred, y); model.update(x, y)`	• Evaluates each instance first, then updates on it • Every prediction is scored on data the model has not yet trained on, so no separate hold-out set is needed • Standard evaluation protocol for streaming algorithms
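
The rows above can be combined into one small runnable sketch. It is an illustration under stated assumptions rather than a reference implementation: the model is a plain linear regressor trained on squared loss, the stream is synthetic, and the names `synthetic_stream`, `sgd_update`, `minibatch_update`, and `prequential_sgd` are hypothetical helpers invented for this example. It uses only the Python standard library.

```python
import random


def predict(w, b, x):
    """Linear prediction: dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sgd_update(w, b, x, y, lr):
    """One single-sample SGD step on the squared loss 0.5 * (pred - y)**2."""
    err = predict(w, b, x) - y  # derivative of the loss w.r.t. the prediction
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, b - lr * err


def minibatch_update(w, b, batch, lr):
    """One step using the mean gradient over a small batch of (x, y) pairs."""
    n = len(batch)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in batch:
        err = predict(w, b, x) - y
        for i, xi in enumerate(x):
            grad_w[i] += err * xi / n
        grad_b += err / n
    w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return w, b - lr * grad_b


def synthetic_stream(n, seed=0):
    """Assumed toy stream: y = 2*x0 - 3*x1 + 1 plus a little Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 2.0 * x[0] - 3.0 * x[1] + 1.0 + rng.gauss(0, 0.01)
        yield x, y


def prequential_sgd(stream, n_features, lr=0.1):
    """Test-then-train: score each instance before the model learns from it."""
    w, b = [0.0] * n_features, 0.0
    errors = []
    for x, y in stream:
        pred = predict(w, b, x)            # 1. test on the unseen instance
        errors.append((pred - y) ** 2)     # 2. record its loss
        w, b = sgd_update(w, b, x, y, lr)  # 3. train on it, then discard it
    return w, b, errors


w, b, errors = prequential_sgd(synthetic_stream(2000), n_features=2)
early = sum(errors[:100]) / 100
late = sum(errors[-100:]) / 100
print(f"weights={w}, bias={b:.3f}")
print(f"prequential MSE: first 100 = {early:.4f}, last 100 = {late:.6f}")
```

Only the current instance and the parameters are held in memory, which is the property the SGD row describes. Swapping `sgd_update` for `minibatch_update` over slices of the stream gives the mini-batch variant, at the cost of buffering one batch at a time.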
