Amazon Redshift Cheat Sheet

Updated 2026-04-12

Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service from AWS that runs fast, cost-effective analytics on very large datasets using columnar storage and a massively parallel processing (MPP) architecture. Designed for OLAP workloads, it integrates with the broader AWS ecosystem (querying data lakes via Spectrum, streaming from Kinesis, and federating to RDS) and offers both provisioned clusters for predictable workloads and serverless for on-demand elasticity. A critical mental model: Redshift distributes rows across compute nodes using distribution keys and orders them on disk using sort keys. Choosing these well is among the most impactful optimizations you can make: a poor distribution key forces costly data movement across the network during joins, and a poor sort key forces queries to scan blocks they could otherwise skip.
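The effect of a distribution key on data movement can be sketched with a toy model. The snippet below is only an illustration under stated assumptions: the slice count, the `orders` rows, and every function name are hypothetical, and `zlib.crc32` stands in for Redshift's internal hash, which is not reproduced here. It counts how many rows sit on a different slice than the one their join column hashes to, which roughly corresponds to rows that would have to move before a join.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # hypothetical slice count, chosen only for illustration


def slice_for(key, num_slices=NUM_SLICES):
    """Map a key to a slice with a stable hash (a stand-in, not Redshift's real hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, dist_column, num_slices=NUM_SLICES):
    """Place each row on the slice chosen by hashing its distribution-key column."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[dist_column], num_slices)].append(row)
    return slices


def rows_to_move_for_join(slices, join_column, num_slices=NUM_SLICES):
    """Count rows that are not already on the slice their join column hashes to."""
    moved = 0
    for slice_id, rows in slices.items():
        for row in rows:
            if slice_for(row[join_column], num_slices) != slice_id:
                moved += 1
    return moved


# Hypothetical fact table: 1000 orders spread over 7 customers.
orders = [{"order_id": i, "customer_id": i % 7} for i in range(1000)]

# Distributed on the join column: every row is already where the join needs it.
by_customer = distribute(orders, "customer_id")
# Distributed on an unrelated column: many rows would have to cross the network.
by_order = distribute(orders, "order_id")

print(rows_to_move_for_join(by_customer, "customer_id"))  # 0
print(rows_to_move_for_join(by_order, "customer_id"))     # a large share of the 1000 rows
```

The model ignores skew and the planner's option to broadcast a small table instead of redistributing a large one, so treat the counts as intuition rather than a prediction of real query plans.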

What This Cheat Sheet Covers

This cheat sheet spans 15 focused tables and 104 indexed concepts. The outline below lists every table, from foundational concepts through advanced features.

Table 1: Cluster Architecture and Node Types
Table 2: Distribution Styles
Table 3: Sort Keys
Table 4: Data Loading and Unloading
Table 5: Redshift Spectrum and External Tables
Table 6: Materialized Views
Table 7: Performance Tuning Techniques
Table 8: Workload Management (WLM)
Table 9: Concurrency and Query Optimization
Table 10: VACUUM and ANALYZE Maintenance
Table 11: Compression Encodings
Table 12: Security Features
Table 13: System Tables and Monitoring
Table 14: Redshift Serverless vs Provisioned
Table 15: Advanced Features

Table 1: Cluster Architecture and Node Types

Every Redshift query flows through the same two-tier shape: a leader node that plans and coordinates, and compute nodes—each carved into slices—that do the parallel work. Knowing what each piece does, and where RA3's separation of compute from managed storage fits versus the older DC2 nodes, is the foundation for sizing a cluster and reasoning about why a query is fast or slow.

Component	Example	Description
Leader node	`Cluster with 1 leader + N compute nodes`	Receives queries from client applications, parses SQL, develops execution plans, and coordinates parallel query execution across compute nodes.
Compute node	`dc2.large`, `ra3.xlplus`	• Executes query portions assigned by the leader node • each has dedicated CPU, memory, and disk storage • stores data and performs computations.
Node slice	`4 slices per ra3.4xlarge node`	• A partition of a compute node's memory and disk space • each slice processes a portion of the workload in parallel • the unit of data distribution.
RA3 nodes	`ra3.xlplus`, `ra3.4xlarge`	• Modern node type with managed storage that scales compute and storage independently • uses high-performance SSDs for hot data and S3 for cold data • recommended for most workloads.
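The leader/compute/slice split in the table above can be pictured as a scatter-gather pattern. The following is a minimal sketch, not Redshift's actual execution engine: the node and slice counts, the function names, and the use of threads to mimic slices are all assumptions made for illustration. Each "slice" aggregates only the rows it holds, and the "leader" merges the partial results into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: 2 compute nodes with 2 slices each.
NODES = 2
SLICES_PER_NODE = 2
TOTAL_SLICES = NODES * SLICES_PER_NODE


def slice_partial_sum(rows):
    """Work done by one slice: aggregate only the rows it stores."""
    return sum(rows), len(rows)


def leader_average(values):
    """Leader role: split the work, run the slices in parallel, merge the partials."""
    # Round-robin placement, similar in spirit to an EVEN distribution style.
    per_slice = [values[i::TOTAL_SLICES] for i in range(TOTAL_SLICES)]
    with ThreadPoolExecutor(max_workers=TOTAL_SLICES) as pool:
        partials = list(pool.map(slice_partial_sum, per_slice))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(leader_average(list(range(1, 101))))  # 50.5
```

Note that the leader combines sums and counts rather than averaging the per-slice averages, which would give a wrong answer whenever slices hold different numbers of rows.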
