Apache Paimon Streaming Lakehouse Cheat Sheet

Updated 2026-05-23

Apache Paimon is an open-source lake format for building a Streaming Lakehouse: a unified storage layer for both real-time streaming and large-scale batch analytics on inexpensive object storage. It originated as Flink Table Store inside the Apache Flink project, graduated to an Apache top-level project in 2024, and is now used in production at petabyte scale. Its key architectural idea is to combine LSM-tree (Log-Structured Merge Tree) storage, the structure also used by RocksDB, with a data lake format. This enables high-throughput streaming upserts, multiple merge engines, and complete changelog production in a single system that Flink, Spark, Trino, and Hive can all query directly.

What This Cheat Sheet Covers

This topic spans 21 focused tables and 126 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Architecture Concepts
Table 2: Table Types
Table 3: Bucket Modes (Data Distribution)
Table 4: Merge Engines
Table 5: Aggregation Functions (Aggregation Merge Engine)
Table 6: Partial Update — Sequence Groups
Table 7: Changelog Producers
Table 8: Table Modes (MOR / COW / MOW)
Table 9: Compaction Strategies
Table 10: Snapshot and Tag Management
Table 11: Flink Integration Patterns
Table 12: Spark Connector
Table 13: Trino and Hive Connectors
Table 14: Catalog Types
Table 15: Schema Evolution
Table 16: Streaming Reads — Scan Modes
Table 17: Iceberg Compatibility
Table 18: File Formats and Storage Configuration
Table 19: System Tables
Table 20: Write Performance Tuning
Table 21: Comparison with Iceberg and Hudi

Table 1: Core Architecture Concepts

The foundational building blocks of Paimon (snapshots, partitions, buckets, and the LSM tree) determine how data is physically stored, how commits are isolated, and how both streaming and batch reads work. Learning these first prevents basic misunderstandings about parallelism and consistency later on.

Concept	Example	Description
LSM Tree (Log-Structured Merge Tree)	`-- each bucket = one LSM tree` `-- L0 files flushed per checkpoint` `-- compacted to higher levels over time`	• Core storage structure for primary key tables • Each bucket contains its own LSM tree • L0 files are flushed on each Flink checkpoint and later compacted into higher levels by universal compaction
Snapshot	`SELECT * FROM my_table$snapshots;`	• Immutable point-in-time view of the table, created by a commit • Writers commit with a two-phase commit protocol, and each commit creates at most two snapshots (one for newly written data, one for compaction) • Enables time travel, incremental reads, and changelog queries
Partition	`CREATE TABLE t (dt STRING, ...) PARTITIONED BY (dt);`	• Hive-compatible horizontal slice of a table • Data files for each partition live in their own directory, which enables efficient partition pruning at query time
Bucket	`'bucket' = '8'`	• Smallest unit of read and write parallelism • The number of buckets caps the maximum processing parallelism • Each bucket holds one LSM tree and its changelog files
Sorted Run	`-- L0: one file = one sorted run` `-- L1+: one level = one sorted run`	• One or more data files, ordered by primary key, whose key ranges do not overlap • Different sorted runs may contain the same key, so reads must merge all sorted runs • Too many sorted runs hurt read performance and trigger compaction
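
The Sorted Run row says that reads must merge all sorted runs. The sketch below is a conceptual model of that merge, not Paimon's actual reader: it assumes each run is a list already sorted by primary key, that a larger sequence number means a newer write, and that the newest record per key wins (deduplicate-style behavior). The `Record` type, its `seq` field, and `merge_sorted_runs` are hypothetical names chosen for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple


class Record(NamedTuple):
    key: int    # primary key
    seq: int    # hypothetical sequence number: larger means newer
    value: str


def merge_sorted_runs(runs: Iterable[List[Record]]) -> Iterator[Record]:
    """Merge key-sorted runs and keep only the newest record per key."""
    # heapq.merge streams the runs in (key, seq) order without loading
    # everything into memory; each input run must already be sorted.
    merged = heapq.merge(*runs, key=lambda r: (r.key, r.seq))
    latest = None
    for rec in merged:
        # When the key changes, the previous record was the newest for its key.
        if latest is not None and rec.key != latest.key:
            yield latest
        latest = rec
    if latest is not None:
        yield latest


if __name__ == "__main__":
    # Two L0 files (one sorted run each) plus one higher level (one sorted run).
    l0_new = [Record(1, 5, "a2"), Record(3, 6, "c2")]
    l0_old = [Record(2, 3, "b1"), Record(3, 4, "c1")]
    level1 = [Record(1, 1, "a1"), Record(2, 2, "b0"), Record(4, 0, "d0")]
    for rec in merge_sorted_runs([l0_new, l0_old, level1]):
        print(rec.key, rec.value)
```

Every extra sorted run is another input to this merge, which is one way to see why accumulating L0 files slows reads until compaction rewrites them into fewer runs.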
