Apache Flink Cheat Sheet

Updated 2026-05-28

Apache Flink is a distributed stream processing framework designed for high-throughput, low-latency processing of unbounded and bounded data streams. An Apache Software Foundation top-level project since 2014, Flink provides exactly-once state consistency and event-time processing that handles out-of-order events. Unlike batch-first frameworks retrofitted for streaming, Flink was designed from the start for continuous computation: stateful operators, time-based windows, and fault tolerance via distributed snapshots are core primitives rather than afterthoughts. Flink 2.0 (March 2025) and the subsequent 2.1 and 2.2 releases brought major changes: the DataSet API was fully removed, disaggregated state (ForSt) decouples computation from state storage, and native AI integration via ML_PREDICT and VECTOR_SEARCH brings LLM inference directly into SQL pipelines.

What This Cheat Sheet Covers

This topic spans 20 focused tables and 173 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: DataStream API Core Transformations
Table 2: Windowing Strategies
Table 3: State Management and Backends
Table 4: Checkpointing and Fault Tolerance
Table 5: Watermarks and Event Time
Table 6: Table API and SQL
Table 7: Connectors and Data Sources/Sinks
Table 8: Deployment Modes and Architecture
Table 9: Process Functions and Low-Level Operations
Table 10: Join Operations
Table 11: Complex Event Processing (CEP)
Table 12: Performance Tuning and Optimization
Table 13: Restart Strategies and Failure Handling
Table 14: Monitoring and Metrics
Table 15: PyFlink and Python API
Table 16: Configuration and Tuning
Table 17: Advanced State and Queryable State
Table 18: Testing and Development
Table 19: Managed Flink Services
Table 20: Apache Flink 2.0+ Features

Table 1: DataStream API Core Transformations

The DataStream API is Flink's primary programming model for building streaming applications. Operators transform one or more DataStreams into new ones; understanding which operator to reach for—and when to use low-level process vs. higher-level map—determines both code clarity and performance.

Operation	Example	Description
map	`stream.map(x -> x * 2)`	Applies a function to each element, returning exactly one output per input — one-to-one transformation.
filter	`stream.filter(x -> x > 0)`	Keeps only elements for which the predicate returns true; all other records are dropped.
flatMap	`stream.flatMap((String x, Collector<String> out) -> { for (String w : x.split(" ")) out.collect(w); }).returns(Types.STRING)`	Produces zero, one, or many outputs per input; used for splitting, filtering with expansion, or unnesting. With a Java lambda, the output type must be declared via `returns(...)` because the generic type of the `Collector` is erased.
keyBy	`stream.keyBy(event -> event.userId)`	Partitions the stream by a key selector, creating a KeyedStream where all elements with the same key route to the same parallel instance for stateful operations.
reduce	`keyedStream.reduce((a, b) -> a + b)`	Incrementally combines elements with the same key using an associative, commutative function, emitting the updated result for each input; input and output types must match. Stateful aggregation without explicit windows.
aggregate	`keyedStream.window(...).aggregate(new AverageAggregate())`	Applies a custom aggregation with an accumulator; more flexible than reduce because the input, accumulator, and output types can differ.
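process	`stream.process(new MyProcessFunction())`	Low-level access to state, timers, and side outputs; the most flexible transformation for custom logic. Keyed state and timers are available only on a keyed stream (for example, a `KeyedProcessFunction` applied after `keyBy`).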
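
To make the keyBy and reduce rows more concrete, here is a minimal plain-Java sketch of what a rolling reduce on a keyed stream computes. It deliberately does not use the Flink API: the class `KeyedReduceSketch`, the method `keyedRollingReduce`, and the sample data are hypothetical names introduced only for illustration, and a `HashMap` stands in for Flink's per-key state. The sketch assumes a single, in-order input list, which a real distributed job does not guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Plain-Java model (no Flink dependency) of keyBy(...).reduce(...) semantics. */
public class KeyedReduceSketch {

    /**
     * Models a rolling reduce: each input element updates the state held for
     * its key, and the updated value is emitted immediately.
     * All names here are illustrative, not part of any Flink API.
     */
    public static <T, K> List<T> keyedRollingReduce(
            List<T> input, Function<T, K> keySelector, BinaryOperator<T> reducer) {
        Map<K, T> statePerKey = new HashMap<>(); // stands in for keyed state
        List<T> emitted = new ArrayList<>();
        for (T element : input) {
            K key = keySelector.apply(element);
            // First element for a key is stored as-is; later elements are
            // combined as reducer(previousResult, element).
            T updated = statePerKey.merge(key, element, reducer);
            emitted.add(updated);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Hypothetical (userId, amount) events.
        List<Map.Entry<String, Integer>> events = List.of(
                Map.entry("alice", 5),
                Map.entry("bob", 3),
                Map.entry("alice", 7));

        List<Map.Entry<String, Integer>> out = keyedRollingReduce(
                events,
                Map.Entry::getKey,
                (a, b) -> Map.entry(a.getKey(), a.getValue() + b.getValue()));

        // One output per input: the running sum for that element's key.
        out.forEach(System.out::println);
    }
}
```

The point of the sketch is that a reduce without a window produces one output per input element rather than a single final value, and that the reducer must return the same type it consumes. In a real job the equivalent calls are the `keyBy` and `reduce` operators from the table above, with state managed by the configured state backend.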
