Apache Druid Real-Time Analytics Database Cheat Sheet

Apache Druid is a distributed, column-oriented, real-time analytics database purpose-built for sub-second OLAP queries on large-scale event-driven data. It sits at the intersection of data warehouses, timeseries databases, and search systems — combining columnar storage, bitmap indexes, and time-based partitioning with native streaming ingestion from Kafka and Kinesis. The core architectural insight that makes Druid fast is that data is always partitioned by time first, stored in immutable, pre-indexed segment files in deep storage, and optionally pre-aggregated using rollup — meaning query work is minimized before the query even arrives.
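To make the rollup idea concrete, here is a minimal Python sketch of pre-aggregation at ingestion time. It is an illustration only, not Druid's implementation: the `rollup` function, the field names (`page`, `country`, `bytes`), and the sample events are all hypothetical. Timestamps are truncated to an hourly bucket, and rows that share the same bucket and dimension values collapse into one stored row with summed metrics.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup(events, granularity_seconds=3600):
    """Pre-aggregate raw events by (time bucket, dimensions).

    Hypothetical helper: models the effect of rollup, not Druid's code.
    """
    buckets = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for ev in events:
        ts = int(ev["timestamp"].timestamp())
        # Truncate the timestamp to the start of its bucket.
        bucket_start = ts - ts % granularity_seconds
        key = (bucket_start, ev["page"], ev["country"])
        buckets[key]["count"] += 1
        buckets[key]["bytes_sum"] += ev["bytes"]
    return [
        {
            "timestamp": datetime.fromtimestamp(b, tz=timezone.utc),
            "page": page,
            "country": country,
            **metrics,
        }
        for (b, page, country), metrics in sorted(buckets.items())
    ]


# Three raw events (made-up sample data); two fall in the same hour.
raw_events = [
    {"timestamp": datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc),
     "page": "/home", "country": "US", "bytes": 100},
    {"timestamp": datetime(2024, 1, 1, 10, 40, tzinfo=timezone.utc),
     "page": "/home", "country": "US", "bytes": 250},
    {"timestamp": datetime(2024, 1, 1, 11, 2, tzinfo=timezone.utc),
     "page": "/home", "country": "US", "bytes": 75},
]

rolled = rollup(raw_events)
for row in rolled:
    print(row)
```

With this sample input, the three raw events become two stored rows: one for the 10:00 bucket and one for the 11:00 bucket. Queries then scan fewer rows, at the cost of losing the individual events inside each bucket.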

What This Cheat Sheet Covers

This cheat sheet spans 15 focused tables and 113 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Architecture — Services and Their Roles
Table 2: Druid Server Node Types (Deployment Grouping)
Table 3: External Dependencies
Table 4: Segments — Storage Unit and Structure
Table 5: Time Partitioning and Secondary Partitioning
Table 6: Ingestion Methods
Table 7: Kafka Supervisor Spec Configuration
Table 8: Rollup and Aggregation Strategies
Table 9: Query Types — Native Queries
Table 10: Druid SQL and Query Interface
Table 11: Deep Storage and Tiered Storage
Table 12: Retention and Data Lifecycle Management
Table 13: Query Caching
Table 14: Query Performance Tuning
Table 15: Apache Druid vs. Apache Pinot vs. ClickHouse

Table 1: Core Architecture — Services and Their Roles

Every Druid service has a distinct, well-separated responsibility; understanding which service does what is the prerequisite for sizing, tuning, and troubleshooting any cluster.

Role	Example	Description
Broker	`druid.service=druid/broker`	Query gateway — receives queries from clients, fans them out to Historicals and MiddleManagers, then merges partial results and returns a single response.
Historical	`druid.service=druid/historical`	Stores and serves immutable segments downloaded from deep storage; the primary query compute node for finalized data.
Coordinator	`druid.service=druid/coordinator`	Segment lifecycle manager — balances segments across Historicals, enforces load/drop retention rules, and triggers compaction.
Overlord	`druid.service=druid/overlord`	Ingestion controller — accepts ingestion tasks, assigns them to MiddleManagers, creates task locks, and coordinates segment publishing.
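
The Broker's fan-out and merge role described in the table can be sketched as a simplified in-process model. This is an assumption-laden illustration, not Druid code: real Brokers talk to data nodes over HTTP, and merge logic depends on the query type. The functions `query_node` and `broker_query` and the sample rows are hypothetical.

```python
from collections import Counter


def query_node(segments):
    """Hypothetical stand-in for a Historical or an ingestion task.

    Computes a partial per-page count from the segments it holds locally.
    """
    partial = Counter()
    for segment in segments:
        for row in segment:
            partial[row["page"]] += row["count"]
    return partial


def broker_query(nodes):
    """Hypothetical Broker: scatter the query to each node, merge partials."""
    merged = Counter()
    for segments in nodes:
        # Counter.update adds counts for keys that already exist.
        merged.update(query_node(segments))
    return dict(merged)


# Made-up data: two Historicals with finalized segments, plus one
# ingestion task on a MiddleManager holding recent rows.
historical_a = [[{"page": "/home", "count": 3}, {"page": "/docs", "count": 1}]]
historical_b = [[{"page": "/home", "count": 2}]]
realtime_task = [[{"page": "/docs", "count": 4}]]

result = broker_query([historical_a, historical_b, realtime_task])
print(result)
```

The point of the sketch is that each data node returns only a small partial aggregate, so the Broker merges a few rows per node rather than raw data.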
