Apache Pinot Real-Time OLAP Cheat Sheet


Apache Pinot is an open-source distributed OLAP database built for sub-second analytical queries on fresh, large-scale data. It was originally created at LinkedIn and is now used at Uber, Stripe, and hundreds of other organizations. Unlike batch-oriented data warehouses, Pinot is engineered around the constraint that user-facing queries must return within tens of milliseconds even at 100,000+ QPS, while ingesting from Kafka or other streams with seconds of latency. The key mental model: Pinot trades write flexibility for read performance. Its rich indexing layer (star-tree, inverted, range, geospatial, vector) is selected at table-design time and baked into immutable columnar segments, so query time is bounded by index lookups rather than full scans.

What This Cheat Sheet Covers

This topic spans 15 focused tables and 99 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

  • Table 1: Table Types
  • Table 2: Core Architecture - Cluster Components
  • Table 3: Schema Design - Field Categories and Data Types
  • Table 4: Segment Lifecycle - Generation, Flush, and Pushing
  • Table 5: Real-Time Ingestion - Kafka and Stream Sources
  • Table 6: Indexing - Index Types and When to Use Them
  • Table 7: Star-Tree Index - Concepts and Configuration
  • Table 8: Upsert Tables
  • Table 9: Hybrid Table - Time Boundary and Query Routing
  • Table 10: Multi-Stage Query Engine (MSE)
  • Table 11: Tenants and Multi-Tenancy
  • Table 12: Deep Store Options
  • Table 13: Segment Assignment and Routing Strategies
  • Table 14: Pinot vs. Druid vs. Trino - OLAP Comparison
  • Table 15: Production Operations and Performance Tuning

Table 1: Table Types

Pinot's fundamental storage abstraction is the table, which can be offline (batch), real-time (streaming), or hybrid (both). Understanding which type to choose, and how hybrid tables stitch the two together with a time boundary, is the starting point for every Pinot deployment.
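To make the time-boundary idea concrete, here is a simplified sketch in Python of how a query against a hybrid table might be split into an offline part and a real-time part. It assumes the boundary is the latest end time among offline segments minus a small buffer, with the offline side taking rows at or before the boundary and the real-time side taking rows after it. The function name `split_by_time_boundary`, the `SplitQuery` type, and the sample values are hypothetical and are not part of any Pinot API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitQuery:
    """Time filters applied to each half of a hybrid table (illustrative only)."""
    offline_filter: str
    realtime_filter: str


def split_by_time_boundary(time_column: str,
                           offline_max_end_time: int,
                           buffer: int) -> SplitQuery:
    """Derive non-overlapping time filters for the two halves of a hybrid table.

    Assumption: boundary = latest offline segment end time minus a buffer,
    expressed in the same unit as the time column (for example, days).
    """
    boundary = offline_max_end_time - buffer
    return SplitQuery(
        offline_filter=f"{time_column} <= {boundary}",
        realtime_filter=f"{time_column} > {boundary}",
    )


# Hypothetical example: offline segments cover up to day 19000, buffer of 1 day.
split = split_by_time_boundary("daysSinceEpoch", 19000, 1)
print(split.offline_filter)   # daysSinceEpoch <= 18999
print(split.realtime_filter)  # daysSinceEpoch > 18999
```

Because the two filters never overlap, rows that exist in both the offline and real-time tables are counted only once when the partial results are merged.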

| Type | Example | Description |
| --- | --- | --- |
| Real-time table | "tableType": "REALTIME" | Ingests data from a stream (Kafka, Pulsar, Kinesis); builds segments in memory from consumed messages, then periodically flushes them to disk as completed segments. |
| Offline table | "tableType": "OFFLINE" | Loads pre-built segments pushed from external batch processes (Spark, Hadoop, CLI); has no streaming consumer, so it suits historical data with long retention. |
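As a rough illustration of how the two table types differ in configuration, the Python sketch below builds minimal table-config dictionaries and prints one as JSON. The helper `build_table_config`, the table name `events`, the time column `ts`, and the Kafka topic and broker values are all hypothetical. The config is deliberately incomplete: a working real-time table needs further stream settings (consumer factory, decoder, flush thresholds), and newer Pinot releases may expect stream settings under `ingestionConfig` rather than `tableIndexConfig`, so check the documentation for your version.

```python
import json
from typing import Optional


def build_table_config(name: str,
                       table_type: str,
                       time_column: str,
                       stream_configs: Optional[dict] = None) -> dict:
    """Build a minimal, illustrative Pinot table config as a plain dict."""
    if table_type not in ("OFFLINE", "REALTIME"):
        raise ValueError(f"unsupported table type: {table_type}")
    if table_type == "REALTIME" and not stream_configs:
        raise ValueError("a REALTIME table needs stream configs")

    config = {
        "tableName": name,
        "tableType": table_type,
        "segmentsConfig": {"timeColumnName": time_column, "replication": "1"},
        "tenants": {},
        "tableIndexConfig": {},
        "metadata": {},
    }
    if table_type == "REALTIME":
        # Only real-time tables carry a stream consumer definition.
        config["tableIndexConfig"]["streamConfigs"] = dict(stream_configs)
    return config


# Hypothetical Kafka settings; topic and broker address are placeholders.
kafka = {
    "streamType": "kafka",
    "stream.kafka.topic.name": "events-topic",
    "stream.kafka.broker.list": "localhost:9092",
}

offline = build_table_config("events", "OFFLINE", "ts")
realtime = build_table_config("events", "REALTIME", "ts", kafka)

print(json.dumps(realtime, indent=2))
```

Defining both configs under the same table name is how a hybrid table is formed: there is no separate hybrid table type, only an offline and a real-time table that share a name.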
