Trino is a distributed SQL query engine designed for interactive analytics on large datasets across heterogeneous data sources. It began as Presto at Facebook; the project's original creators later forked it as PrestoSQL, which was renamed Trino in 2020. Trino supports federated queries: a single SQL statement can join data from multiple sources (databases, data lakes, object storage) without first moving the data. Its MPP (massively parallel processing) architecture separates compute from storage, which makes it a good fit for modern data lakehouse architectures. Key mental model: Trino doesn't store data. It is a query engine that coordinates distributed execution across worker nodes, pushes operations down to data sources whenever possible, and pulls only the necessary data into memory for processing.
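The pushdown idea in the mental model can be illustrated with a toy simulation. This is a minimal sketch, not Trino's connector SPI: `ToySourceConnector`, `scan`, and `total_eu_amount` are hypothetical names, and the "data source" is just an in-memory list of rows. It compares how many rows leave the source when the filter and column projection are pushed down versus when everything is pulled into the engine and filtered there.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]


class ToySourceConnector:
    """Hypothetical source plus connector over an in-memory table (not a Trino API)."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.all_columns = list(rows[0].keys()) if rows else []
        self.rows_transferred = 0  # rows handed from the source to the engine

    def scan(self, columns: List[str],
             predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
        # When a predicate is supplied, the source filters rows and drops
        # unused columns before anything is transferred (pushdown).
        for row in self._rows:
            if predicate is None or predicate(row):
                self.rows_transferred += 1
                yield {c: row[c] for c in columns}


def total_eu_amount(connector: ToySourceConnector, pushdown: bool) -> float:
    is_eu = lambda r: r["region"] == "EU"
    if pushdown:
        # Filter and projection run at the source; only "amount" comes back.
        return sum(r["amount"] for r in connector.scan(["amount"], predicate=is_eu))
    # No pushdown: every row and column is pulled, then filtered in the engine.
    rows = connector.scan(connector.all_columns)
    return sum(r["amount"] for r in rows if is_eu(r))
```

Both paths produce the same answer; the difference is how much data crosses the boundary between the source and the engine, which is what pushdown is meant to reduce.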
What This Cheat Sheet Covers
This topic spans 25 focused tables and 156 indexed concepts. Below is a table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Architecture Components
| Component | Example | Description |
|---|---|---|
| Coordinator | Single node coordinating query execution | • Parses, analyzes, optimizes, and schedules queries • Manages worker nodes and client connections • Single point of failure without an external HA setup |
| Workers | Multiple nodes executing query tasks | • Process data and execute tasks assigned by the coordinator • Fetch data through connectors and perform computation • Scale horizontally for higher throughput |
| Connectors | Hive, Iceberg, PostgreSQL connectors | • Plugin that provides an interface to a specific data source • Translates Trino operations into native source operations • Abstracts the data source away from the engine |
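The division of labor in Table 1 can be sketched as a small simulation. This is a hedged illustration, not Trino's scheduler: `make_splits`, `Worker`, `Coordinator`, and `count_by_key` are hypothetical names, splits are fixed-size slices of an in-memory list, assignment is simple round-robin, and the workers run sequentially rather than in parallel. For brevity the final merge happens inside the coordinator object, whereas a real engine typically runs the final aggregation as another distributed worker stage.

```python
from collections import Counter
from typing import Dict, List


def make_splits(rows: List[str], split_size: int) -> List[List[str]]:
    """Connector role (hypothetical): divide a table into independently readable splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


class Worker:
    def __init__(self, name: str) -> None:
        self.name = name
        self.splits_processed = 0

    def partial_count(self, split: List[str]) -> Counter:
        """Worker role: process one split and return a partial aggregate."""
        self.splits_processed += 1
        return Counter(split)


class Coordinator:
    def __init__(self, workers: List[Worker]) -> None:
        self.workers = workers

    def count_by_key(self, rows: List[str], split_size: int = 2) -> Dict[str, int]:
        """Coordinator role: schedule splits across workers, then combine partial results."""
        final: Counter = Counter()
        for i, split in enumerate(make_splits(rows, split_size)):
            worker = self.workers[i % len(self.workers)]  # round-robin assignment
            final.update(worker.partial_count(split))  # adds partial counts together
        return dict(final)
```

Adding workers spreads the splits more thinly, which mirrors the "horizontally scalable" row of the table; the partial-then-final aggregation pattern is what lets each worker handle its splits independently.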