Data Vault Cheat Sheet

Updated 2026-04-21


Data Vault is a data modeling methodology designed for building scalable, flexible, and auditable enterprise data warehouses. Created by Dan Linstedt in the 1990s and formalized as Data Vault 2.0 in 2013, the methodology separates business keys, relationships, and descriptive attributes into distinct table types—Hubs, Links, and Satellites—enabling parallel loading, incremental development, and minimal impact from source system changes. Data Vault 2.1 extends the methodology with enhanced support for semi-structured data, ontologies and taxonomies, and alignment with modern architectures like Data Mesh and Data Lakehouse. Unlike traditional dimensional modeling, Data Vault prioritizes adaptability, compliance, and auditability, making it ideal for environments requiring strict lineage tracking, regulatory compliance, and continuous integration of new data sources.

What This Cheat Sheet Covers

This topic spans 18 focused tables and 128 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Entity Types
Table 2: Specialized Link Types
Table 3: Satellite Variations
Table 4: Business Vault Structures
Table 5: Hash Key Patterns
Table 6: Metadata and Audit Columns
Table 7: Loading Patterns
Table 8: Architecture Layers
Table 9: Naming Conventions
Table 10: Data Vault vs. Other Approaches
Table 11: Benefits and Use Cases
Table 12: Performance Optimization
Table 13: Common Challenges
Table 14: Implementation Tools
Table 15: Best Practices
Table 16: Advanced Patterns
Table 17: Testing and Validation
Table 18: Cloud Platform Considerations

Table 1: Core Entity Types

The whole methodology rests on three building blocks—Hubs hold business keys, Links capture the relationships between them, and Satellites carry the descriptive detail and its history. Keeping these concerns in separate tables is what gives Data Vault its parallel loading and resilience to source changes. The Link Satellite and Reference Hub here are natural extensions of that same idea.

Entity	Example	Description
Hub	`HUB_CUSTOMER` `customer_hk (PK)` `customer_id (BK)` `load_date` `record_source`	• Stores unique business keys for core business concepts (e.g., Customer, Product, Order) • contains no descriptive attributes, only identifiers and metadata.
Link	`LINK_ORDER_CUSTOMER` `order_customer_hk (PK)` `customer_hk (FK)` `order_hk (FK)` `load_date` `record_source`	• Captures relationships between Hubs • represents associations or transactions (many-to-many by default) • hash key derived from related Hub business keys.
Satellite	`SAT_CUSTOMER_DETAILS` `customer_hk (PK, FK)` `load_date (PK)` `first_name` `last_name` `email` `hashdiff`	• Stores descriptive attributes and full history for Hubs or Links • primary key is the parent hash key plus the load timestamp • every detected change inserts a new record; existing records are not updated • hashdiff is used for change detection.
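
To make the roles in the table concrete, here is a minimal Python sketch of how a Hub hash key, a Link hash key, and a Satellite hashdiff might be derived, and how the hashdiff drives insert-only change detection. It is an illustration under stated assumptions, not a prescribed implementation: SHA-256, the `||` delimiter, the trim and upper-case normalization of business keys, and the helper names `hash_key`, `hashdiff`, and `load_satellite` are all hypothetical choices made for this example. A plain Python list stands in for the `SAT_CUSTOMER_DETAILS` table.

```python
import hashlib
from datetime import datetime, timezone

DELIMITER = "||"  # assumed separator between concatenated values
SAT_COLUMNS = ("first_name", "last_name", "email")  # fixed order for the hashdiff


def _digest(parts):
    """Join already-normalized strings and return a hex SHA-256 digest."""
    return hashlib.sha256(DELIMITER.join(parts).encode("utf-8")).hexdigest()


def hash_key(*business_keys):
    """Hub or Link hash key: trim and upper-case each business key, then hash.

    For a Link, pass the business keys of the related Hubs in a fixed order.
    """
    return _digest(str(k).strip().upper() for k in business_keys)


def hashdiff(attributes):
    """Hash the descriptive attributes; None is treated as an empty string."""
    return _digest(
        "" if attributes.get(c) is None else str(attributes[c]).strip()
        for c in SAT_COLUMNS
    )


def load_satellite(sat_rows, customer_hk, attributes, load_date):
    """Append a row only when the hashdiff differs from the latest row for this key.

    Returns True if a row was inserted, False if the record was unchanged.
    """
    new_diff = hashdiff(attributes)
    history = [r for r in sat_rows if r["customer_hk"] == customer_hk]
    latest = max(history, key=lambda r: r["load_date"], default=None)
    if latest is not None and latest["hashdiff"] == new_diff:
        return False
    sat_rows.append({
        "customer_hk": customer_hk,
        "load_date": load_date,
        **{c: attributes.get(c) for c in SAT_COLUMNS},
        "hashdiff": new_diff,
    })
    return True


# Usage: one customer, one order-customer relationship, one satellite load.
sat_customer_details = []
customer_hk = hash_key("C-1001")
order_customer_hk = hash_key("O-42", "C-1001")  # Link key from both Hub business keys
row = {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}
load_satellite(sat_customer_details, customer_hk, row,
               datetime(2026, 1, 1, tzinfo=timezone.utc))
```

Because the hash keys depend only on business keys, Hubs, Links, and Satellites can each compute their keys without looking up the others, which is what allows them to be loaded in parallel. Loading the same `row` again would add nothing, while a changed `email` would append a second row with a new `load_date` and `hashdiff`.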
