Data Warehousing Cheat Sheet

Updated 2026-04-29

Data warehousing is the process of collecting, organizing, and storing large volumes of data from multiple sources into a centralized repository optimized for analytical processing and business intelligence. Unlike transactional databases (OLTP) designed for real-time operations, data warehouses are purpose-built for historical analysis, complex queries, and reporting, enabling organizations to make data-driven decisions. The core design principle involves dimensional modeling—organizing data into fact tables (measurements) and dimension tables (descriptive context)—typically implemented through star or snowflake schemas. Understanding schema design, slowly changing dimensions, and ETL/ELT patterns is essential for building scalable, performant data warehouses; in 2026, modern architectures like lakehouses, medallion layers, and open table formats have extended the classical warehouse paradigm significantly.

What This Cheat Sheet Covers

This topic spans 19 focused tables and 94 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Schema Design PatternsTable 2: Fact Table TypesTable 3: Dimension TypesTable 4: Slowly Changing Dimensions (SCD)Table 5: Measures and AdditivityTable 6: Key Types and StrategiesTable 7: ETL vs ELT ApproachesTable 8: Data Warehouse Architecture LayersTable 9: Data Modeling MethodologiesTable 10: Granularity and AggregationTable 11: Query Optimization TechniquesTable 12: Analytical OperationsTable 13: OLAP vs OLTP SystemsTable 14: Bridge Tables and Complex RelationshipsTable 15: Data Quality and LineageTable 16: Advanced Dimension PatternsTable 17: Metadata TypesTable 18: Modern Data Architecture ParadigmsTable 19: Open Lakehouse Table Formats

Table 1: Schema Design Patterns

The shape you give your tables is the first big decision in any warehouse, and it sets the trade-off between query speed and data redundancy. Star and snowflake are the classic dimensional layouts; Data Vault and 3NF favor auditability and normalization—knowing which fits your team and workload is half the battle.

Pattern	Example	Description
Star Schema	Fact table (Sales) linked directly to denormalized dimensions (Product, Customer, Time)	• Central fact table surrounded by dimension tables • fast query performance due to fewer joins • most widely adopted pattern.
Snowflake Schema	Normalized dimension tables with sub-dimensions: Product → Category → Subcategory	• Dimensions normalized into multiple related tables • reduces data redundancy but increases join complexity and query time.
Galaxy Schema (Fact Constellation)	Multiple fact tables (Sales, Inventory) sharing common dimensions (Time, Product, Location)	• Multiple fact tables share conformed dimensions • supports cross-process analysis like comparing sales and inventory trends.

Table 1: Schema Design Patterns

Pattern	Example	Description
Star Schema	Fact table (Sales) linked directly to denormalized dimensions (Product, Customer, Time)	• Central fact table surrounded by dimension tables • fast query performance due to fewer joins • most widely adopted pattern.
Snowflake Schema	Normalized dimension tables with sub-dimensions: Product → Category → Subcategory	• Dimensions normalized into multiple related tables • reduces data redundancy but increases join complexity and query time.
Galaxy Schema (Fact Constellation)	Multiple fact tables (Sales, Inventory) sharing common dimensions (Time, Product, Location)	• Multiple fact tables share conformed dimensions • supports cross-process analysis like comparing sales and inventory trends.