Apache Flink is a distributed stream processing framework designed for high-throughput, low-latency data processing over unbounded and bounded data streams. Operating at the heart of real-time data pipelines since its Apache Software Foundation graduation in 2014, Flink delivers exactly-once processing semantics and event-time semantics that handle out-of-order events with precision. Unlike batch-first frameworks retrofitted for streaming, Flink was architected from the ground up for continuous computation—meaning stateful operators, time-based windows, and fault tolerance via distributed snapshots aren't afterthoughts but core primitives that enable applications to run for months without human intervention while processing trillions of events.
What This Cheat Sheet Covers
This topic spans 20 focused tables and 147 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: DataStream API Core Transformations
| Operation | Example | Description |
|---|---|---|
stream.map(x -> x * 2) | • Applies a function to each element and returns exactly one output element per input • one-to-one transformation. | |
stream.flatMap((x, out) -> { for (String word : x.split(" ")) out.collect(word);}) | • Applies a function that can produce zero, one, or multiple output elements per input • commonly used for splitting or filtering with expansion. | |
stream.filter(x -> x > 0) | • Keeps only elements where the predicate returns true • selectively passes records through the stream. | |
stream.keyBy(event -> event.userId) | Partitions the stream by a key selector, creating a KeyedStream where all elements with the same key are routed to the same parallel instance for stateful operations. | |
keyedStream.reduce((a, b) -> a + b) | • Incrementally combines elements with the same key using an associative and commutative function • stateful aggregation without explicit windows. |