Apache Druid Real-Time Analytics Database Cheat Sheet

Apache Druid is a distributed, column-oriented, real-time analytics database purpose-built for sub-second OLAP queries on large-scale, event-driven data. It sits at the intersection of data warehouses, time-series databases, and search systems, combining columnar storage, bitmap indexes, and time-based partitioning with native streaming ingestion from Kafka and Kinesis. The core architectural insight that makes Druid fast is that data is always partitioned by time first, stored in immutable, pre-indexed segment files in deep storage, and optionally pre-aggregated at ingestion time using rollup, so much of the query work is already done before the query arrives.
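To make the rollup idea concrete, the following is a minimal sketch in plain Python, not Druid's implementation. It assumes an hourly query granularity, a single dimension, and count/sum aggregators; the function names and sample events are hypothetical. In a real cluster, rollup is configured in the ingestion spec (for example through `granularitySpec` and `metricsSpec`) rather than written by hand.

```python
from collections import defaultdict
from datetime import datetime


def truncate_to_hour(ts: str) -> str:
    """Truncate an ISO-8601 UTC timestamp to the start of its hour."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt.strftime("%Y-%m-%dT%H:00:00Z")


def rollup(events, dimensions, metric):
    """Pre-aggregate raw events by (truncated time, dimension values).

    Rows that share the same truncated timestamp and dimension values
    collapse into one stored row holding a count and a sum.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        key = (truncate_to_hour(event["timestamp"]),) + tuple(
            event[d] for d in dimensions
        )
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]

    rows = []
    for key, agg in sorted(totals.items()):
        row = {"timestamp": key[0]}
        row.update(zip(dimensions, key[1:]))
        row["count"] = agg["count"]
        row[f"sum_{metric}"] = agg["sum"]
        rows.append(row)
    return rows


# Hypothetical raw events: four input rows.
events = [
    {"timestamp": "2024-01-01T10:05:00Z", "country": "US", "bytes": 100},
    {"timestamp": "2024-01-01T10:45:00Z", "country": "US", "bytes": 250},
    {"timestamp": "2024-01-01T10:59:59Z", "country": "DE", "bytes": 40},
    {"timestamp": "2024-01-01T11:01:00Z", "country": "US", "bytes": 10},
]

rolled = rollup(events, ["country"], "bytes")
for row in rolled:
    print(row)
```

Here the two US events in the 10:00 hour collapse into a single stored row, so four raw events become three rows. The trade-off is that individual events can no longer be recovered below the chosen granularity.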

What This Cheat Sheet Covers

This topic spans 15 focused tables and 113 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

• Table 1: Core Architecture - Services and Their Roles
• Table 2: Druid Server Node Types (Deployment Grouping)
• Table 3: External Dependencies
• Table 4: Segments - Storage Unit and Structure
• Table 5: Time Partitioning and Secondary Partitioning
• Table 6: Ingestion Methods
• Table 7: Kafka Supervisor Spec Configuration
• Table 8: Rollup and Aggregation Strategies
• Table 9: Query Types - Native Queries
• Table 10: Druid SQL and Query Interface
• Table 11: Deep Storage and Tiered Storage
• Table 12: Retention and Data Lifecycle Management
• Table 13: Query Caching
• Table 14: Query Performance Tuning
• Table 15: Apache Druid vs. Apache Pinot vs. ClickHouse

Table 1: Core Architecture - Services and Their Roles

Every Druid service has a distinct, well-separated responsibility; understanding which service does what is the prerequisite for sizing, tuning, and troubleshooting any cluster.

| Role | Example | Description |
| --- | --- | --- |
| Broker | druid.service=druid/broker | Query gateway: receives queries from clients, fans them out to Historicals and MiddleManagers, then merges partial results and returns a single response. |
| Historical | druid.service=druid/historical | Stores and serves immutable segments downloaded from deep storage; the primary query compute node for finalized data. |
| Coordinator | druid.service=druid/coordinator | Segment lifecycle manager: balances segments across Historicals, enforces load/drop retention rules, and triggers compaction. |
| Overlord | druid.service=druid/overlord | Ingestion controller: accepts ingestion tasks, assigns them to MiddleManagers, creates task locks, and coordinates segment publishing. |
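The Broker's fan-out-and-merge role described above can be illustrated with a toy scatter-gather model. This is a sketch under simplifying assumptions, not Druid's query engine: each "node" is just a hypothetical in-memory list of rows, the query is a fixed count-by-country aggregation, and names such as `partial_counts` and `broker_query` are invented for illustration.

```python
from collections import Counter

# Hypothetical data held by each data-serving process.
# Historicals serve finalized segments; the ingestion task serves fresh data.
historical_a = [{"country": "US", "count": 5}, {"country": "DE", "count": 2}]
historical_b = [{"country": "US", "count": 3}, {"country": "FR", "count": 4}]
realtime_task = [{"country": "DE", "count": 1}]


def partial_counts(rows):
    """Work done on one node: aggregate only the rows that node holds."""
    partial = Counter()
    for row in rows:
        partial[row["country"]] += row["count"]
    return partial


def broker_query(nodes):
    """Work done on the Broker: scatter to every node, then merge partials."""
    merged = Counter()
    for node_rows in nodes:
        # Counter.update adds counts for keys that already exist.
        merged.update(partial_counts(node_rows))
    return dict(merged)


result = broker_query([historical_a, historical_b, realtime_task])
print(result)
```

The point of the split is that each node only aggregates its own data, and the Broker combines small partial results instead of raw rows. In a real cluster the fan-out happens over the network and in parallel rather than in a sequential loop.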
