Amazon Redshift Cheat Sheet


Updated 2026-04-12

Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse from AWS that runs fast, cost-effective analytics on massive datasets using columnar storage and a massively parallel processing (MPP) architecture. Designed for OLAP workloads, it integrates with the broader AWS ecosystem (querying data lakes via Spectrum, streaming ingestion from Kinesis, and federated queries to RDS) and offers both provisioned clusters for predictable workloads and a serverless option for on-demand elasticity. A critical mental model: Redshift distributes data across compute nodes using distribution keys and orders it on disk using sort keys. Choosing these well is among the most impactful optimizations you can make: a poor distribution key forces costly data shuffles across the network during query execution, and a poor sort key forces scans of blocks that could otherwise be skipped.
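To make the distribution-key idea concrete, here is a minimal Python simulation. It is a sketch under stated assumptions, not a model of Redshift internals: the slice count, the CRC32 hash, and the `orders` table with its `order_id` and `customer_id` columns are all hypothetical. It counts how many rows of `orders` would have to move between slices to join against a table that is distributed on `customer_id`.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # assumed slice count, for illustration only


def slice_for(value) -> int:
    """Stable stand-in hash; NOT Redshift's internal hash function."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_SLICES


def distribute(rows, dist_col):
    """Group rows by the slice their distribution-key value hashes to."""
    placement = defaultdict(list)
    for row in rows:
        placement[slice_for(row[dist_col])].append(row)
    return placement


def rows_to_shuffle(rows, dist_col, join_col):
    """Count rows stored on a different slice than the one their join key maps to."""
    return sum(1 for r in rows if slice_for(r[dist_col]) != slice_for(r[join_col]))


# Hypothetical orders table: 1,000 rows spread over 50 customers.
orders = [{"order_id": i, "customer_id": i % 50} for i in range(1000)]

# Same column for distribution and join: every row is already in place.
colocated = rows_to_shuffle(orders, dist_col="customer_id", join_col="customer_id")
# Distributed on a different column: many rows must cross the network.
mismatched = rows_to_shuffle(orders, dist_col="order_id", join_col="customer_id")

print("distributed on customer_id, rows to move:", colocated)
print("distributed on order_id, rows to move:", mismatched)
```

When the distribution column matches the join column, the count is zero by construction, which is the co-located join case. The second count depends on the stand-in hash, so treat it only as an illustration that mismatched keys lead to data movement.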

What This Cheat Sheet Covers

This topic spans 15 focused tables and 104 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Cluster Architecture and Node Types
Table 2: Distribution Styles
Table 3: Sort Keys
Table 4: Data Loading and Unloading
Table 5: Redshift Spectrum and External Tables
Table 6: Materialized Views
Table 7: Performance Tuning Techniques
Table 8: Workload Management (WLM)
Table 9: Concurrency and Query Optimization
Table 10: VACUUM and ANALYZE Maintenance
Table 11: Compression Encodings
Table 12: Security Features
Table 13: System Tables and Monitoring
Table 14: Redshift Serverless vs Provisioned
Table 15: Advanced Features

Table 1: Cluster Architecture and Node Types

| Component | Example | Description |
|---|---|---|
| Leader node | Cluster with 1 leader + N compute nodes | Receives queries from client applications, parses SQL, develops execution plans, and coordinates parallel query execution across compute nodes. |
| Compute node | dc2.large, ra3.xlplus | Executes the query portions assigned by the leader node. Each node has dedicated CPU, memory, and disk storage; it stores data and performs computations. |
| Node slice | 4 slices per ra3.4xlarge node (16 per ra3.16xlarge node) | A partition of a compute node's memory and disk space, and the unit of data distribution. Each slice processes a portion of the workload in parallel. |
| RA3 nodes | ra3.xlplus, ra3.4xlarge | Modern node type with managed storage that scales compute and storage independently. Uses high-performance SSDs for hot data and S3 for cold data. Recommended for most workloads. |
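The division of labor between the leader node, compute nodes, and slices can be sketched as a scatter-gather computation. The Python below is an illustrative toy, not Redshift's actual execution engine: `plan`, `slice_partial_sum`, and `run_average` are hypothetical names, the round-robin split is an assumed placement, and `num_nodes` and `slices_per_node` are placeholder parameters whose real values depend on the node type.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(rows, num_nodes, slices_per_node):
    """Leader step: split rows round-robin into one work unit per slice."""
    total_slices = num_nodes * slices_per_node
    return [rows[i::total_slices] for i in range(total_slices)]


def slice_partial_sum(slice_rows):
    """Compute step: each slice aggregates only the rows it holds."""
    return sum(slice_rows), len(slice_rows)


def run_average(rows, num_nodes=2, slices_per_node=4):
    """Leader step: fan work out to slices, then merge the partial results."""
    work = plan(rows, num_nodes, slices_per_node)
    with ThreadPoolExecutor(max_workers=len(work)) as pool:
        partials = list(pool.map(slice_partial_sum, work))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None


values = list(range(1, 101))
print(run_average(values))  # 50.5
```

Each slice returns a partial sum and a row count rather than its own average, because averages of unevenly sized partitions cannot be combined directly. The final merge on the leader is what makes the parallel result match a single-pass computation.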
