Medallion Architecture Cheat Sheet

Updated 2026-05-28

The Medallion Architecture is a layered data design pattern, typically Bronze → Silver → Gold, that structures a data lakehouse by progressively refining raw data into trusted, analytics-ready assets. First popularized by Databricks with Delta Lake, the pattern is now format-agnostic and runs on Apache Iceberg, Apache Hudi, and Apache Paimon, so it applies across clouds and vendors. The key mental model is separation of concerns by transformation stage: Bronze preserves raw fidelity, Silver enforces integrity and conformity, and Gold serves specific business consumers. A useful test for the Silver/Gold boundary is whether a transformation requires domain knowledge: if it does, it belongs in Gold, not Silver.

What This Cheat Sheet Covers

This topic spans 23 focused tables and 193 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.

Table 1: Core Architecture Layers
Table 2: Bronze Layer Ingestion Patterns
Table 3: Bronze Layer Design Principles
Table 4: Silver Layer Transformation Standards
Table 5: Silver Layer Quality Enforcement
Table 6: Gold Layer Business Aggregations
Table 7: Data Vault and Dimensional Modeling Integration
Table 8: Semantic Layer
Table 9: Schema Evolution Handling
Table 10: Open Table Formats
Table 11: Performance Optimization Techniques
Table 12: Data Quality Layers Assignment
Table 13: Multi-Hop Architecture Patterns
Table 14: Tooling and Frameworks
Table 15: Platform-Specific Implementations
Table 16: Access Control and Governance
Table 17: Common Anti-Patterns and Mistakes
Table 18: Optimization Best Practices
Table 19: Naming Conventions
Table 20: Testing and Validation Patterns
Table 21: Cost Optimization Strategies
Table 22: Monitoring and Observability
Table 23: Streaming vs Batch Ingestion Decision

Table 1: Core Architecture Layers

Understanding what work belongs in each layer is the foundation of a healthy medallion design. Each layer has a distinct contract: Bronze = truth, Silver = quality, Gold = value. Misplacing logic — especially pushing domain knowledge into Silver — is the single most common failure mode.

Layer	Example	Description
Bronze Layer	Raw JSON, CSV, Parquet ingested as-is	• Immutable landing zone for raw data • preserves original fidelity including errors, duplicates, and schema drift • append-only • never transformed
Silver Layer	Deduplicated, type-cast, null-checked records	• Conformed, validated, integrated data across sources • technical transformations only • no business logic requiring domain knowledge
Gold Layer	`fact_orders`, `dim_customer`, `report_revenue_by_region`	• Business-ready aggregations modeled for specific use cases • star schema, wide tables, or OBT patterns • optimized for BI and ML
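
The layer contracts in Table 1 can be illustrated with a minimal, engine-free Python sketch. Everything here is an assumption made for illustration: the sample records, the `_ingested_at` metadata field, the function names, and the choice to drop (rather than quarantine) rows that fail a null check. A real lakehouse would run these steps on a table format such as Delta Lake or Iceberg, not on in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw events, including a duplicate and a missing value.
RAW_EVENTS = [
    {"order_id": "1", "amount": "19.99", "region": "EU"},
    {"order_id": "1", "amount": "19.99", "region": "EU"},  # duplicate
    {"order_id": "2", "amount": "5.00", "region": "US"},
    {"order_id": "3", "amount": None, "region": "US"},      # missing amount
]


def to_bronze(raw_events, ingested_at):
    """Bronze: keep every record as received, errors and duplicates included.

    Only ingestion metadata is attached (an assumed convention); the
    payload itself is not modified.
    """
    return [{"payload": dict(e), "_ingested_at": ingested_at} for e in raw_events]


def to_silver(bronze_rows):
    """Silver: technical cleanup only (null check, dedupe, type-cast)."""
    seen_ids = set()
    silver = []
    for row in bronze_rows:
        payload = row["payload"]
        if payload["amount"] is None:
            continue  # a real pipeline might quarantine this row instead
        if payload["order_id"] in seen_ids:
            continue  # drop exact repeats of an already-seen order
        seen_ids.add(payload["order_id"])
        silver.append({
            "order_id": int(payload["order_id"]),
            "amount": float(payload["amount"]),
            "region": payload["region"],
        })
    return silver


def to_gold(silver_rows):
    """Gold: a business-facing aggregate, here revenue by region."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["region"]] += row["amount"]
    return dict(revenue)


bronze = to_bronze(RAW_EVENTS, ingested_at="2026-05-28T00:00:00Z")
silver = to_silver(bronze)
report_revenue_by_region = to_gold(silver)
```

Note where the domain knowledge sits: `to_silver` needs no understanding of what an order means to the business, while `to_gold` encodes a decision about how revenue is reported, which is why that logic stays out of Silver.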
