ClickHouse Real-Time OLAP Database Cheat Sheet

Updated 2026-05-15

Next Topic: CockroachDB Distributed SQL Database Cheat Sheet

ClickHouse is an open-source columnar database management system designed for real-time analytical processing (OLAP) on massive datasets, developed by Yandex in 2016 and adopted by thousands of organizations for sub-second analytics. Unlike traditional row-oriented databases, ClickHouse's columnar storage and aggressive compression deliver 100× to 1000× performance improvements for aggregation-heavy workloads while handling petabytes of data with linear scalability. The key mental model: MergeTree engines batch writes into immutable parts which merge in the background — understanding this write-once, merge-later architecture is critical for designing schemas, optimizing inserts, and leveraging features like materialized views and projections that apply transformations during merge time rather than query time.

What This Cheat Sheet Covers

This topic spans 35 focused tables and 225 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: MergeTree Engine FamilyTable 2: Primary Key and Sorting Key DesignTable 3: Materialized Views for Real-Time Pre-AggregationTable 4: Aggregation FunctionsTable 5: Array Data Type and Higher-Order FunctionsTable 6: Distributed Tables and ShardingTable 7: Replication with ClickHouse KeeperTable 8: Compression Codecs (CODEC)Table 9: TTL Expressions for Data Lifecycle ManagementTable 10: Query Optimization (PREWHERE and Index Granularity)Table 11: INSERT Performance and Async InsertsTable 12: ClickHouse Cloud and Separation of Storage/ComputeTable 13: Data Skipping IndexesTable 14: JOINs and Join AlgorithmsTable 15: Partitioning StrategyTable 16: Mutations (ALTER UPDATE/DELETE) and Lightweight DeletesTable 17: Projections (Alternative Sort Orders)Table 18: Sampling for Approximate QueriesTable 19: Dictionaries for Fast LookupsTable 20: Window FunctionsTable 21: Settings for Query OptimizationTable 22: System Tables for MonitoringTable 23: User Management and RBACTable 24: Kafka Integration for StreamingTable 25: Backup and Restore StrategiesTable 26: Data Formats for Import/ExportTable 27: SimpleAggregateFunction and Combinator PatternsTable 28: LowCardinality Data Type OptimizationTable 29: Table Functions for External DataTable 30: View Engine and UNION ALLTable 31: Buffer Table EngineTable 32: Subqueries and Common Table Expressions (CTEs)Table 33: Nullable Data Type and NULL HandlingTable 34: GROUP BY Modifiers (WITH TOTALS, ROLLUP, CUBE)Table 35: Background Merge Settings

Table 1: MergeTree Engine Family

MergeTree is the heart of ClickHouse—every high-performance table starts here, writing data as immutable parts that merge in the background. The variants each bolt on one extra behavior at merge time: Summing pre-totals additive metrics, Aggregating holds states for complex functions, Replacing deduplicates by version, and the Collapsing family models updates and deletes as paired inserts. Picking the right one up front saves you from fighting the engine later.

Engine	Example	Description
MergeTree	`ENGINE = MergeTree()` `ORDER BY (date, user_id)`	• Base storage engine for high-volume analytical workloads • Columnar storage with sparse primary indexes • Automatic background merge of data parts • Foundation for all variant engines
ReplicatedMergeTree	`ENGINE = ReplicatedMergeTree(` `'/clickhouse/tables/{shard}/table',` `'{replica}')`	• Adds automatic replication using ClickHouse Keeper or ZooKeeper • Multiple replicas synchronize via replicated log • Fault-tolerant with no single point of failure
SummingMergeTree	`ENGINE = SummingMergeTree()` `ORDER BY (date, key)` `SUMMING(value)`	• Automatically sums numeric columns during merge when rows share same ORDER BY key • Pre-aggregates at write time for fast dashboard queries • Use for additive metrics like counters and totals
AggregatingMergeTree	`ENGINE = AggregatingMergeTree()` `ORDER BY key` with `AggregateFunction` columns	• Stores partial aggregation states for complex functions (uniq, quantile, groupArray) • Must use `-State`/`-Merge` combinators • Enables incremental pre-aggregation for non-additive metrics
ReplacingMergeTree	`ENGINE = ReplacingMergeTree(version)` `ORDER BY id`	• Deduplicates rows with same ORDER BY key, keeping latest by version column • Deduplication happens during merge (not immediate) • Use with `FINAL` modifier for guaranteed deduplication at query time

Table 1: MergeTree Engine Family

Engine	Example	Description
MergeTree	`ENGINE = MergeTree()` `ORDER BY (date, user_id)`	• Base storage engine for high-volume analytical workloads • Columnar storage with sparse primary indexes • Automatic background merge of data parts • Foundation for all variant engines
ReplicatedMergeTree	`ENGINE = ReplicatedMergeTree(` `'/clickhouse/tables/{shard}/table',` `'{replica}')`	• Adds automatic replication using ClickHouse Keeper or ZooKeeper • Multiple replicas synchronize via replicated log • Fault-tolerant with no single point of failure
SummingMergeTree	`ENGINE = SummingMergeTree()` `ORDER BY (date, key)` `SUMMING(value)`	• Automatically sums numeric columns during merge when rows share same ORDER BY key • Pre-aggregates at write time for fast dashboard queries • Use for additive metrics like counters and totals
AggregatingMergeTree	`ENGINE = AggregatingMergeTree()` `ORDER BY key` with `AggregateFunction` columns	• Stores partial aggregation states for complex functions (uniq, quantile, groupArray) • Must use `-State`/`-Merge` combinators • Enables incremental pre-aggregation for non-additive metrics
ReplacingMergeTree	`ENGINE = ReplacingMergeTree(version)` `ORDER BY id`	• Deduplicates rows with same ORDER BY key, keeping latest by version column • Deduplication happens during merge (not immediate) • Use with `FINAL` modifier for guaranteed deduplication at query time