The Medallion Architecture is a layered data design pattern that organizes lakehouse data into Bronze (raw ingestion), Silver (cleaned and conformed), and Gold (business-ready aggregates) layers, improving data quality and structure incrementally at each step. The pattern originated in Databricks best practices and has become a de facto standard for modern data lakehouses on platforms such as Microsoft Fabric, Snowflake, and AWS, where it helps teams build auditable, scalable data pipelines with a clear separation of concerns. The key mental model: each layer is a progressive refinement contract. Bronze preserves raw truth, Silver enforces cleanliness and standardization, and Gold optimizes for consumption, so data quality improvements are explicit and reversible rather than buried in opaque transformation logic.
What This Cheat Sheet Covers
This topic spans 20 focused tables and 125 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Architecture Layers
| Layer | Example | Description |
|---|---|---|
| Bronze | `bronze.sales_raw`<br>`ingestion_time: 2026-04-12` | • Landing zone for unprocessed data exactly as received from sources • append-only, immutable, schema-on-read • preserves complete audit trail and enables reprocessability. |
| Silver | `silver.sales_cleaned`<br>`WHERE is_valid = true` | • Refined and validated data with deduplication, type casting, null handling, and standardization • enforces schema-on-write • serves as enterprise-wide cleaned source. |
| Gold | `gold.sales_by_region_daily`<br>`SUM(revenue) GROUP BY region, date` | • Aggregated, denormalized data optimized for analytics and BI • star schema models, KPIs, feature tables • read-optimized with fewer joins • consumption-ready for dashboards and ML. |
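
The three layers can be sketched in plain Python to make the refinement contract concrete. This is a minimal illustration rather than a lakehouse implementation: the sample rows, the column names (`order_id`, `region`, `revenue`, `sale_date`), and the helper functions `to_silver` and `to_gold` are all assumptions made for the example. In practice each layer would be a table managed by the platform's engine.

```python
from collections import defaultdict
from datetime import date

# Bronze: raw records kept exactly as received (all strings, duplicates and bad rows included).
# The rows and column names are hypothetical sample data.
bronze_sales_raw = [
    {"order_id": "1", "region": " east ", "revenue": "100.50", "sale_date": "2026-04-12"},
    {"order_id": "1", "region": " east ", "revenue": "100.50", "sale_date": "2026-04-12"},  # duplicate
    {"order_id": "2", "region": "West", "revenue": "n/a", "sale_date": "2026-04-12"},  # unparseable revenue
    {"order_id": "3", "region": "EAST", "revenue": "49.50", "sale_date": "2026-04-12"},
    {"order_id": "4", "region": "west", "revenue": "20", "sale_date": "2026-04-12"},
]


def to_silver(bronze_rows):
    """Deduplicate on order_id, cast types, standardize region, and flag validity.

    Builds new rows and leaves the Bronze input untouched.
    """
    seen_order_ids = set()
    silver_rows = []
    for row in bronze_rows:
        if row["order_id"] in seen_order_ids:
            continue  # keep only the first occurrence of each order
        seen_order_ids.add(row["order_id"])
        try:
            revenue = float(row["revenue"])
            sale_date = date.fromisoformat(row["sale_date"])
            is_valid = True
        except ValueError:
            # Keep the row but mark it invalid so the rejection stays visible.
            revenue, sale_date, is_valid = None, None, False
        silver_rows.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "revenue": revenue,
            "sale_date": sale_date,
            "is_valid": is_valid,
        })
    return silver_rows


def to_gold(silver_rows):
    """Sum revenue per (region, sale_date) over valid Silver rows only."""
    totals = defaultdict(float)
    for row in silver_rows:
        if row["is_valid"]:
            totals[(row["region"], row["sale_date"])] += row["revenue"]
    return dict(totals)


silver_sales_cleaned = to_silver(bronze_sales_raw)
gold_sales_by_region_daily = to_gold(silver_sales_cleaned)
```

Because `bronze_sales_raw` is never modified, the Silver and Gold results can be rebuilt at any time by rerunning the two functions, which is the reprocessability property the Bronze row describes.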