Apache Paimon is an open-source lake format designed to build a Streaming Lakehouse: a unified storage layer for both real-time streaming and large-scale batch analytics on low-cost object storage. Originally developed as Flink Table Store inside the Apache Flink project, it graduated as its own Apache top-level project in 2024 and is now used in production at petabyte scale. Its key architectural idea is combining LSM-tree (Log-Structured Merge Tree) storage, the same structure used by RocksDB, with a data lake format. This enables high-throughput streaming upserts, multiple merge engines, and complete changelog production in a single system that Spark, Trino, Hive, and Flink can all query directly.
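To make "multiple merge engines" concrete, here is a minimal Python sketch of how two of Paimon's merge engines, `deduplicate` (the default) and `partial-update`, resolve several upserts to the same primary key. It is a simplified in-memory model, not Paimon's implementation: the `Record` tuple layout and the explicit sequence numbers are assumptions standing in for the ordering Paimon tracks internally.

```python
from typing import Optional

# Hypothetical record layout: (primary_key, sequence_number, {column: value}).
# None means "no value supplied for this column".
Record = tuple[str, int, dict[str, Optional[object]]]

def merge_deduplicate(records: list[Record]) -> dict[str, dict]:
    """Deduplicate: the record with the highest sequence number replaces the whole row."""
    result: dict[str, dict] = {}
    for key, _, fields in sorted(records, key=lambda r: r[1]):
        result[key] = dict(fields)
    return result

def merge_partial_update(records: list[Record]) -> dict[str, dict]:
    """Partial-update: apply records in sequence order; nulls never overwrite."""
    result: dict[str, dict] = {}
    for key, _, fields in sorted(records, key=lambda r: r[1]):
        row = result.setdefault(key, {})
        for column, value in fields.items():
            if value is not None:
                row[column] = value
    return result

updates: list[Record] = [
    ("order-1", 1, {"status": "created", "amount": None}),
    ("order-1", 2, {"status": None, "amount": 99}),
]
print(merge_deduplicate(updates))     # {'order-1': {'status': None, 'amount': 99}}
print(merge_partial_update(updates))  # {'order-1': {'status': 'created', 'amount': 99}}
```

With `deduplicate`, the newest version of the row wins outright; with `partial-update`, separate writes can each fill in different columns of the same row.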
What This Cheat Sheet Covers
This cheat sheet spans 21 focused tables and 126 indexed concepts. The outline below walks through it table by table, from foundational concepts to advanced details.
Table 1: Core Architecture Concepts
The foundational building blocks of Paimon (snapshots, partitions, buckets, and the LSM tree) determine how data is physically stored, how commits are isolated, and how streaming and batch reads work. Understanding them before any other Paimon concept prevents fundamental misunderstandings about parallelism and consistency.
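The sketch below shows how a row maps onto this physical layout under a fixed bucket count: a Hive-style partition directory, then one `bucket-N` directory per bucket. It assumes CRC32 as the hash purely for illustration (Paimon hashes its own binary encoding of the bucket key), and the helper names and example paths are hypothetical.

```python
import zlib

def bucket_for(bucket_key: str, num_buckets: int) -> int:
    """Map a bucket key to a bucket id in [0, num_buckets).

    Assumption: CRC32 stands in for Paimon's own hash of the bucket key.
    """
    return zlib.crc32(bucket_key.encode("utf-8")) % num_buckets

def data_dir(table_path: str, partition: dict[str, str], bucket: int) -> str:
    """Hive-style partition directories first, then one directory per bucket."""
    partition_dirs = [f"{col}={val}" for col, val in partition.items()]
    return "/".join([table_path, *partition_dirs, f"bucket-{bucket}"])

# Hypothetical table with 'bucket' = '8', partitioned by dt.
bucket = bucket_for("user-42", 8)
print(data_dir("/warehouse/db.db/t", {"dt": "2024-01-01"}, bucket))
```

Because a partition maps to a directory, a filter such as `dt = '2024-01-01'` lets a reader skip every other partition directory, and because the same key always hashes to the same bucket, every version of a key lands in the same LSM tree.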
| Concept | Example | Description |
|---|---|---|
| LSM Tree | -- each bucket = one LSM tree; L0 files flushed per checkpoint; compacted to higher levels over time | • Core storage structure for primary key tables • Each bucket contains its own LSM tree, whose L0 files are flushed at each Flink checkpoint and compacted into higher levels by universal compaction |
| Snapshot | SELECT * FROM my_table$snapshots; | • Immutable point-in-time commit • Each commit creates at most two snapshots, using a two-phase commit protocol • Enables time travel, incremental reads, and changelog queries |
| Partition | CREATE TABLE t (dt STRING, ...) PARTITIONED BY (dt); | • Hive-compatible horizontal slice of a table • Each partition's data files live in their own directory, enabling efficient partition pruning at query time |
| Bucket | 'bucket' = '8' | • Smallest unit of read and write parallelism • The bucket count sets the maximum processing parallelism; each bucket holds one LSM tree and its changelog files |
| Sorted Run | -- L0: one file = one sorted run; L1+: one level = one sorted run | • An ordered set of data files with non-overlapping key ranges within a level • Reads must merge all sorted runs, so too many sorted runs hurt read performance and trigger compaction |
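The sorted-run rules explain why read cost grows with the number of runs: a reader performs a k-way merge across every run, and for a duplicated key the newest run wins (as under the default `deduplicate` merge engine). This is a minimal Python sketch, assuming each run is an in-memory list of unique, key-sorted `(key, value)` pairs ordered from oldest to newest; the function names are hypothetical. In Paimon itself, the run count that triggers compaction is controlled by the `num-sorted-run.compaction-trigger` table option.

```python
import heapq
from typing import Iterator

# Assumption: each sorted run is a list of (key, value) pairs, sorted by key,
# with unique keys inside the run. runs[0] is the oldest, runs[-1] the newest.
SortedRun = list[tuple[str, str]]

def count_sorted_runs(level0_files: int, nonempty_higher_levels: int) -> int:
    """L0: every file is its own sorted run; L1+: each non-empty level is one run."""
    return level0_files + nonempty_higher_levels

def merge_read(runs: list[SortedRun]) -> Iterator[tuple[str, str]]:
    """k-way merge over all runs; for a duplicated key the newest run wins."""
    # Tag entries with -run_index so that, for equal keys, newer runs sort first.
    tagged = [[(key, -idx, value) for key, value in run] for idx, run in enumerate(runs)]
    last_key = None
    for key, _, value in heapq.merge(*tagged):
        if key != last_key:  # first occurrence of a key comes from the newest run
            yield key, value
            last_key = key

old_run = [("a", "v1"), ("b", "v1")]
new_run = [("b", "v2"), ("c", "v2")]
print(list(merge_read([old_run, new_run])))  # [('a', 'v1'), ('b', 'v2'), ('c', 'v2')]
```

Compaction rewrites many runs into fewer, larger ones, which shortens exactly this merge for later reads.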
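The snapshot row can be modeled as an append-only history of immutable file sets. In this minimal sketch, writers are assumed to have already produced their data files (phase one); `commit` plays phase two, publishing an `APPEND` snapshot for new files and, if compaction ran, a `COMPACT` snapshot, so one commit yields at most two snapshots. `SnapshotLog` and its methods are hypothetical, and real Paimon snapshots are files that point to manifests rather than inline file sets. Time travel is then just reading an older snapshot's file set, which Flink SQL exposes through options such as `scan.snapshot-id`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    commit_kind: str       # "APPEND" or "COMPACT" in this sketch
    files: frozenset[str]  # data files visible as of this snapshot

class SnapshotLog:
    """Hypothetical in-memory stand-in for a table's snapshot history."""

    def __init__(self) -> None:
        self.snapshots: list[Snapshot] = []

    def _latest_files(self) -> frozenset[str]:
        return self.snapshots[-1].files if self.snapshots else frozenset()

    def _publish(self, kind: str, files: frozenset[str]) -> int:
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append(Snapshot(snapshot_id, kind, files))
        return snapshot_id

    def commit(self, new_files: set[str],
               compacted: Optional[tuple[set[str], set[str]]] = None) -> list[int]:
        """Phase two: publish files that writers already prepared as snapshots.

        Creates an APPEND snapshot for new files and, if compaction ran, a
        COMPACT snapshot that swaps removed files for added ones, so one
        commit yields at most two snapshots.
        """
        ids: list[int] = []
        if new_files:
            ids.append(self._publish("APPEND", self._latest_files() | new_files))
        if compacted is not None:
            removed, added = compacted
            ids.append(self._publish("COMPACT", (self._latest_files() - removed) | added))
        return ids

    def files_at(self, snapshot_id: int) -> frozenset[str]:
        """Time travel: the file set visible at an older snapshot."""
        return self.snapshots[snapshot_id - 1].files

log = SnapshotLog()
log.commit({"f1", "f2"})                               # snapshot 1 (APPEND)
log.commit({"f3"}, compacted=({"f1", "f2"}, {"f12"}))  # snapshots 2 (APPEND) and 3 (COMPACT)
print(sorted(log.files_at(1)))  # ['f1', 'f2']
print(sorted(log.files_at(3)))  # ['f12', 'f3']
```

Because older snapshots are never modified, a reader pinned to snapshot 1 keeps seeing the same files even after compaction publishes snapshot 3.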