Data Vault is a data modeling methodology designed for building scalable, flexible, and auditable enterprise data warehouses. Created by Dan Linstedt in the 1990s and formalized as Data Vault 2.0 in 2013, the methodology separates business keys, relationships, and descriptive attributes into three distinct table types (Hubs, Links, and Satellites). This separation enables parallel loading and incremental development, and it limits the impact of source system changes. Data Vault 2.1 extends the methodology with enhanced support for semi-structured data, ontologies and taxonomies, and alignment with modern architectures like Data Mesh and Data Lakehouse. Unlike traditional dimensional modeling, Data Vault prioritizes adaptability, compliance, and auditability, which makes it well suited to environments requiring strict lineage tracking, regulatory compliance, and continuous integration of new data sources.
What This Cheat Sheet Covers
This topic spans 18 focused tables and 128 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Entity Types
| Entity | Example | Description |
|---|---|---|
| Hub | HUB_CUSTOMER: customer_hk (PK), customer_id (BK), load_date, record_source | Stores unique business keys for core business concepts (e.g., Customer, Product, Order); contains no descriptive attributes, only identifiers and metadata. |
| Link | LINK_ORDER_CUSTOMER: order_customer_hk (PK), customer_hk (FK), order_hk (FK), load_date, record_source | Captures relationships between Hubs; represents associations or transactions (many-to-many by default); hash key derived from related Hub business keys. |
| Satellite | SAT_CUSTOMER_DETAILS: customer_hk (FK), load_date (PK), first_name, last_name, email, hashdiff | Stores descriptive attributes and full history for Hubs or Links; every change creates a new record; includes load timestamp and hashdiff for change detection. |
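
The three entity types can be illustrated with a minimal Python sketch. It is not a reference implementation: the MD5 hash, the `||` delimiter, the trim and upper-case normalization, and the helper names `hash_key`, `hashdiff`, and `load_customer` are all assumptions made for illustration, and in-memory dictionaries stand in for the HUB_CUSTOMER and SAT_CUSTOMER_DETAILS tables.

```python
import hashlib

DELIM = "||"  # assumed separator between concatenated key parts
SAT_COLUMNS = ("first_name", "last_name", "email")  # fixed order for the hashdiff


def hash_key(*business_keys):
    """Hash key for a Hub (one business key) or a Link (several Hub keys)."""
    # Trim and upper-case so " c-001 " and "C-001" map to the same key.
    parts = [str(k).strip().upper() for k in business_keys]
    return hashlib.md5(DELIM.join(parts).encode("utf-8")).hexdigest()


def hashdiff(attrs):
    """Hash of the descriptive attributes, used to detect changes."""
    parts = ["" if attrs.get(c) is None else str(attrs[c]).strip()
             for c in SAT_COLUMNS]
    return hashlib.md5(DELIM.join(parts).encode("utf-8")).hexdigest()


def load_customer(hub, sat, customer_id, attrs, load_date, record_source):
    """Insert into the Hub once; append to the Satellite only on change."""
    hk = hash_key(customer_id)
    # Hub: the business key is stored once, with load metadata only.
    hub.setdefault(hk, {"customer_id": customer_id, "load_date": load_date,
                        "record_source": record_source})
    # Satellite: compare the new hashdiff with the latest stored row.
    history = sat.setdefault(hk, [])
    hd = hashdiff(attrs)
    if history and history[-1]["hashdiff"] == hd:
        return False  # nothing changed, no new Satellite row
    history.append({"load_date": load_date, "hashdiff": hd, **attrs})
    return True


hub_customer, sat_customer_details = {}, {}
row = {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}

load_customer(hub_customer, sat_customer_details, "C-001", row, "2024-01-01", "crm")
load_customer(hub_customer, sat_customer_details, "C-001", row, "2024-01-02", "crm")  # unchanged
load_customer(hub_customer, sat_customer_details, "C-001",
              {**row, "email": "ada@new.example.com"}, "2024-01-03", "crm")

# A Link hash key is derived from the business keys of the Hubs it connects.
order_customer_hk = hash_key("O-1001", "C-001")

print(len(hub_customer), len(sat_customer_details[hash_key("C-001")]))  # prints "1 2"
```

The second load leaves the Satellite untouched because the hashdiff matches the latest row, while the email change in the third load appends a new historical record. Real implementations often choose a different hash function or normalization rule; what matters is that every loader applies the same rule so that keys computed in parallel agree.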