Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service from AWS that runs fast, cost-effective analytics on large datasets using columnar storage and a massively parallel processing (MPP) architecture. Designed for OLAP workloads, it integrates with the broader AWS ecosystem (querying data lakes in S3 via Redshift Spectrum, ingesting streams from Kinesis, and running federated queries against RDS) and offers both provisioned clusters for predictable workloads and a serverless option for on-demand elasticity. A critical mental model: Redshift distributes rows across compute nodes according to the distribution key and orders them on disk according to the sort key. Choosing these wisely is often the single most impactful optimization you can make: a poor distribution key forces costly data redistribution across the network during joins and aggregations, and a poor sort key forces queries to scan blocks they could otherwise skip.
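To make the distribution-key idea concrete, here is a minimal Python sketch of why a join is cheap when both tables are distributed on the join column. It is a toy model, not Redshift's implementation: the `slice_for` placement rule (integer key modulo slice count), the slice count, and the `orders` table are all hypothetical and chosen only for illustration.

```python
NUM_SLICES = 4  # hypothetical slice count for the toy cluster


def slice_for(key: int, num_slices: int = NUM_SLICES) -> int:
    """Toy placement rule: integer key modulo slice count.

    This only stands in for hash distribution; Redshift's real hash differs.
    """
    return key % num_slices


def rows_to_redistribute(rows, dist_column, join_column, num_slices=NUM_SLICES):
    """Count rows stored on a different slice than the one their join key maps to.

    Assumes the other table in the join is distributed on `join_column`,
    so a row is "local" only if it already sits on that key's slice.
    """
    return sum(
        1
        for row in rows
        if slice_for(row[dist_column], num_slices)
        != slice_for(row[join_column], num_slices)
    )


# Hypothetical orders table: 1,000 orders spread over 50 customers.
orders = [{"order_id": i, "customer_id": i % 50} for i in range(1000)]

# Case 1: orders distributed on the join column (customer_id).
colocated = rows_to_redistribute(orders, "customer_id", "customer_id")

# Case 2: orders distributed on order_id, but joined on customer_id.
shuffled = rows_to_redistribute(orders, "order_id", "customer_id")

print(f"distributed on customer_id: {colocated} rows must move")
print(f"distributed on order_id:    {shuffled} rows must move")
```

In this toy model, distributing on the join column leaves every row already co-located with its match, while distributing on an unrelated column leaves a large share of rows on the wrong slice. That misplaced share is the work a real cluster would have to do over the network at query time.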
What This Cheat Sheet Covers
This topic spans 15 focused tables and 104 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Cluster Architecture and Node Types
| Component | Example | Description |
|---|---|---|
| Leader node | Cluster with 1 leader + N compute nodes | Receives queries from client applications, parses SQL, develops execution plans, and coordinates parallel query execution across compute nodes. |
| Compute node | dc2.large, ra3.xlplus | • Executes the query portions assigned by the leader node • each has dedicated CPU, memory, and disk storage • stores data and performs computations. |
| Node slice | 4 slices on ra3.4xlarge | • A partition of a compute node's memory and disk space • each slice processes a portion of the workload in parallel • the unit of data distribution. |
| RA3 node | ra3.xlplus, ra3.4xlarge | • Modern node type with managed storage that scales compute and storage independently • uses high-performance SSDs for hot data and S3 for cold data • recommended for most workloads. |
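Because the slice is the unit of data distribution, the total slice count sets the upper bound on how many ways a table scan is split. The sketch below estimates that count from node type and node count. The `SLICES_PER_NODE` values are the default per-node slice counts as I understand them from AWS documentation; treat them as assumptions and verify against the current node type tables, since a resized cluster can end up with different counts. The function names are hypothetical.

```python
# Default slices per node (assumed from AWS documentation; verify before relying on them).
SLICES_PER_NODE = {
    "dc2.large": 2,
    "dc2.8xlarge": 16,
    "ra3.xlplus": 2,
    "ra3.4xlarge": 4,
    "ra3.16xlarge": 16,
}


def total_slices(node_type: str, compute_nodes: int) -> int:
    """Total slices = compute nodes x default slices per node."""
    if compute_nodes < 1:
        raise ValueError("a cluster needs at least one compute node")
    return SLICES_PER_NODE[node_type] * compute_nodes


def rows_per_slice(total_rows: int, node_type: str, compute_nodes: int) -> float:
    """Average rows each slice holds if distribution is perfectly even."""
    return total_rows / total_slices(node_type, compute_nodes)


# Example: a hypothetical 4-node ra3.4xlarge cluster holding 1.6 billion rows.
slices = total_slices("ra3.4xlarge", 4)
per_slice = rows_per_slice(1_600_000_000, "ra3.4xlarge", 4)
print(f"{slices} slices, about {per_slice:,.0f} rows per slice")
```

A skewed distribution key breaks the even-split assumption in `rows_per_slice`: the slowest slice then holds more than the average, and the whole query waits on it.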