Real-time machine learning pipelines process streaming data with sub-second to millisecond latency, enabling predictions on live events as they occur rather than in batch. Unlike traditional batch-based ML systems that compute features and serve models on static datasets, real-time pipelines integrate continuous feature computation, online inference, and event-driven architectures to deliver ML predictions at the moment they're needed. The critical challenge lies in maintaining feature freshness guarantees, handling out-of-order events, and ensuring exactly-once processing semantics while achieving sub-100ms inference latency at scale. Modern platforms like Apache Kafka, Flink, and specialized feature stores enable practitioners to build pipelines where models consume fresh features from streaming data sources and return predictions in near real time, powering applications from fraud detection to recommendation systems where decisions must be made instantly.
What This Cheat Sheet Covers
This topic spans 19 focused tables and 112 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Pipeline Architecture Patterns
Architectural patterns define how batch and streaming data flows interact with ML feature computation and model inference. The choice between Lambda, Kappa, or hybrid architectures determines system complexity, latency characteristics, and operational overhead. Lambda architecture maintains separate batch and streaming layers for the same logic, while Kappa treats everything as a stream. Modern approaches favor Kappa for greenfield projects due to simpler codebases, though Lambda remains common in enterprises with existing batch infrastructure.
| Pattern | Example | Description |
|---|---|---|
| Lambda | Batch layer: daily model training; speed layer: hourly feature updates | Dual-path design with separate batch and streaming layers that merge results; the batch layer recomputes complete views periodically while the speed layer handles incremental updates, with eventual consistency between layers |
| Kappa | Single Kafka stream → Flink → feature store → inference | Stream-only architecture that eliminates the batch/speed separation by treating all data as unbounded streams; reprocessing is handled by replaying events from retained logs rather than maintaining separate batch code |
| Streaming-first | Event sourcing + CQRS + stream processors | Architecture where streaming is the primary paradigm and batch processing is a special case (bounded streams); enables continual learning and low-latency predictions as core design principles |
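
The contrast between the dual-path Lambda design and the stream-only Kappa design can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the event shape `(user_id, amount)`, the running-spend feature, and the function names `fold`, `lambda_serve`, and `kappa_rebuild` are all hypothetical, and in-memory lists stand in for a retained log such as a Kafka topic.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical event shape for illustration: (user_id, amount).
Event = tuple[str, float]


def fold(events: Iterable[Event]) -> dict[str, float]:
    """Feature logic: running spend per user, computed event by event."""
    state: dict[str, float] = defaultdict(float)
    for user_id, amount in events:
        state[user_id] += amount
    return state


def lambda_serve(batch_view: dict[str, float],
                 speed_view: dict[str, float],
                 user_id: str) -> float:
    """Lambda: merge the periodic batch view with the speed layer's increments."""
    return batch_view.get(user_id, 0.0) + speed_view.get(user_id, 0.0)


def kappa_rebuild(log: Iterable[Event]) -> dict[str, float]:
    """Kappa: reprocess by replaying the retained log through the same logic."""
    return fold(log)


log = [("u1", 10.0), ("u2", 5.0), ("u1", 2.5)]  # events covered by the last batch run
recent = [("u1", 1.0)]                          # events that arrived since then

# Lambda: two views are maintained and merged at serving time.
# (A real Lambda system would typically have separate batch and speed code;
# both layers reuse `fold` here only to keep the sketch short.)
batch_view = fold(log)
speed_view = fold(recent)
lambda_u1 = lambda_serve(batch_view, speed_view, "u1")

# Kappa: one stream and one code path; a replay of the full log gives the same value.
kappa_u1 = kappa_rebuild(log + recent)["u1"]
```

Both paths arrive at the same feature value for `u1`. The difference is operational: Lambda has to keep two layers consistent, while Kappa depends on the log being retained long enough to replay.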