ClickHouse is an open-source columnar database management system designed for real-time analytical processing (OLAP) on massive datasets, developed by Yandex in 2016 and adopted by thousands of organizations for sub-second analytics. Unlike traditional row-oriented databases, ClickHouse's columnar storage and aggressive compression deliver 100× to 1000× performance improvements for aggregation-heavy workloads while handling petabytes of data with linear scalability. The key mental model: MergeTree engines batch writes into immutable parts which merge in the background — understanding this write-once, merge-later architecture is critical for designing schemas, optimizing inserts, and leveraging features like materialized views and projections that apply transformations during merge time rather than query time.
What This Cheat Sheet Covers
This topic spans 35 focused tables and 225 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: MergeTree Engine Family
MergeTree is the heart of ClickHouse—every high-performance table starts here, writing data as immutable parts that merge in the background. The variants each bolt on one extra behavior at merge time: Summing pre-totals additive metrics, Aggregating holds states for complex functions, Replacing deduplicates by version, and the Collapsing family models updates and deletes as paired inserts. Picking the right one up front saves you from fighting the engine later.
| Engine | Example | Description |
|---|---|---|
ENGINE = MergeTree()ORDER BY (date, user_id) | • Base storage engine for high-volume analytical workloads • Columnar storage with sparse primary indexes • Automatic background merge of data parts • Foundation for all variant engines | |
ENGINE = ReplicatedMergeTree( '/clickhouse/tables/{shard}/table', '{replica}') | • Adds automatic replication using ClickHouse Keeper or ZooKeeper • Multiple replicas synchronize via replicated log • Fault-tolerant with no single point of failure | |
ENGINE = SummingMergeTree()ORDER BY (date, key)SUMMING(value) | • Automatically sums numeric columns during merge when rows share same ORDER BY key • Pre-aggregates at write time for fast dashboard queries • Use for additive metrics like counters and totals | |
ENGINE = AggregatingMergeTree()ORDER BY keywith AggregateFunction columns | • Stores partial aggregation states for complex functions (uniq, quantile, groupArray) • Must use -State/-Merge combinators• Enables incremental pre-aggregation for non-additive metrics | |
ENGINE = ReplacingMergeTree(version)ORDER BY id | • Deduplicates rows with same ORDER BY key, keeping latest by version column • Deduplication happens during merge (not immediate) • Use with FINAL modifier for guaranteed deduplication at query time |