Extract, Load, Transform (ELT) is a modern data integration pattern where raw data is extracted from sources, loaded directly into a cloud data warehouse or lakehouse, and then transformed in-place using the warehouse's native compute power. Unlike traditional ETL, which transforms data before loading, ELT shifts transformation downstream, leveraging scalable cloud infrastructure for processing. This approach simplifies pipelines, preserves raw data for flexibility, and enables analysts and data engineers to iteratively refine transformations using SQL-based tools like dbt. ELT has become the foundation of the modern data stack, powering analytics, machine learning, and operational systems. The key mental model: storage is cheap, compute is elastic—load first, transform later.
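The load-first, transform-later flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: an in-memory SQLite database stands in for the cloud warehouse, and the table names (`raw_orders`, `curated_customer_totals`) and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source rows, kept exactly as extracted (strings, stray spaces, a bad record).
extracted = [
    ("1001", "c-1", " 19.99", "2026-01-02"),
    ("1002", "c-1", "5.01", "2026-01-03"),
    ("1003", "c-2", "n/a", "2026-01-03"),
]

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the warehouse

# Load: land the rows untouched in a raw table, with every column stored as TEXT.
conn.execute(
    "CREATE TABLE raw_orders (order_id TEXT, customer_id TEXT, amount TEXT, updated_at TEXT)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", extracted)

# Transform: clean, type, and aggregate inside the database using SQL.
conn.execute(
    """
    CREATE TABLE curated_customer_totals AS
    SELECT customer_id,
           COUNT(*) AS order_count,
           ROUND(SUM(CAST(TRIM(amount) AS REAL)), 2) AS total_amount
    FROM raw_orders
    WHERE TRIM(amount) GLOB '[0-9]*'   -- skip records whose amount does not start with a digit
    GROUP BY customer_id
    """
)

totals = conn.execute(
    "SELECT customer_id, order_count, total_amount "
    "FROM curated_customer_totals ORDER BY customer_id"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

The malformed record is filtered out only in the curated table; `raw_orders` still holds every extracted row, so the transformation can be revised and rerun later without going back to the source.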
What This Cheat Sheet Covers
This topic spans 24 focused tables and 189 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Core Concepts
| Concept | Example | Description |
|---|---|---|
| Extract | `SELECT * FROM source_system WHERE updated_at > '2026-01-01'` | • Retrieves raw data from sources (databases, APIs, SaaS apps, files) without transformation • uses connectors to pull data incrementally or in full. |
| Load | `COPY INTO raw.orders FROM 's3://bucket/' FILE_FORMAT = (TYPE = 'JSON')` | • Ingests extracted data directly into the warehouse as-is • no schema enforcement or cleansing at this stage • preserves source fidelity. |
| Transform | `CREATE TABLE curated.orders AS SELECT order_id, customer_id, SUM(amount) AS total_amount FROM raw.orders GROUP BY 1, 2` | • Converts raw data into analytics-ready models inside the warehouse using SQL • cleans, joins, aggregates, and applies business logic. |
| Cloud data warehouse | Snowflake, BigQuery, Redshift | • Modern columnar storage platforms with separated compute and storage • provide massive scalability and parallel processing for ELT workloads. |
| Raw layer | `raw.source_name.table` | • Initial landing zone for unprocessed source data • often schema-on-read • serves as a durable archive for reprocessing and lineage tracking. |
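The incremental extract and the raw layer's role as a reprocessing archive can be sketched together. This is an illustrative sketch under stated assumptions: two in-memory SQLite databases stand in for the source system and the warehouse, and the helper names (`extract_incremental`, `load_raw`, `rebuild_curated`), the table names, and the sample data are all hypothetical.

```python
import sqlite3

def extract_incremental(source, watermark):
    """Return source rows changed after the watermark (ISO dates compare correctly as text)."""
    return source.execute(
        "SELECT order_id, customer_id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()

def load_raw(warehouse, rows, watermark):
    """Append rows to the raw table unchanged and return the advanced watermark."""
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)
    return max((row[3] for row in rows), default=watermark)

def rebuild_curated(warehouse):
    """Rebuild the curated table from raw, using only the latest version of each order."""
    warehouse.executescript(
        """
        DROP TABLE IF EXISTS curated_orders;
        CREATE TABLE curated_orders AS
        SELECT r.customer_id, SUM(r.amount) AS total_amount
        FROM raw_orders AS r
        WHERE r.updated_at = (
            SELECT MAX(updated_at) FROM raw_orders WHERE order_id = r.order_id
        )
        GROUP BY r.customer_id;
        """
    )

# Hypothetical source system with two orders.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id, customer_id, amount, updated_at)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "c-1", 10.0, "2026-01-01"), (2, "c-2", 20.0, "2026-01-02")],
)

# Stand-in warehouse with an append-only raw table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_orders (order_id, customer_id, amount, updated_at)")

# First run: the watermark starts before any data, so this is effectively a full pull.
first_batch = extract_incremental(source, "1970-01-01")
watermark = load_raw(warehouse, first_batch, "1970-01-01")

# The source changes: order 1 is updated and order 3 is created.
source.execute("UPDATE orders SET amount = 15.0, updated_at = '2026-01-03' WHERE order_id = 1")
source.execute("INSERT INTO orders VALUES (3, 'c-2', 5.0, '2026-01-04')")

# Second run: only rows newer than the stored watermark are pulled.
second_batch = extract_incremental(source, watermark)
watermark = load_raw(warehouse, second_batch, watermark)

rebuild_curated(warehouse)
curated = warehouse.execute(
    "SELECT customer_id, total_amount FROM curated_orders ORDER BY customer_id"
).fetchall()
raw_count = warehouse.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Because the raw table keeps both versions of order 1, the curated table can be dropped and rebuilt at any time with different logic. A real pipeline would typically persist the watermark between runs and handle ties or late-arriving updates, which this sketch ignores.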