Delta Lake Cheat Sheet

Updated 2026-04-21

Delta Lake is an open-source storage framework that brings ACID transactions, scalable metadata handling, and time travel to cloud data lakes. Built on top of Parquet, it provides a transactional layer through an append-only commit log (_delta_log) that records every change, enabling reliable concurrent writes and schema evolution without sacrificing performance. Originally developed by Databricks and now a Linux Foundation project, Delta Lake bridges the reliability gap between data warehouses and data lakes, making it the foundation for modern lakehouse architectures on AWS S3, Azure ADLS, and Google Cloud Storage. As of Delta Lake 4.2 (April 2026), the project supports catalog-managed tables, the Variant data type for semi-structured data, and universal format interoperability with Iceberg and Hudi.

What This Cheat Sheet Covers

This cheat sheet spans 17 focused tables and 129 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Concepts
Table 2: Read and Write Operations
Table 3: Table Manipulation (DML)
Table 4: Schema Management
Table 5: Time Travel and Versioning
Table 6: Optimization
Table 7: Change Data Capture
Table 8: Concurrency and Isolation
Table 9: Partitioning Strategies
Table 10: Table Cloning
Table 11: Constraints
Table 12: Advanced Features
Table 13: Storage Configuration
Table 14: Interoperability
Table 15: Table Properties
Table 16: SQL DDL Commands
Table 17: Delta Lake vs Alternatives

Table 1: Core Concepts

Almost everything Delta does that a plain Parquet directory cannot do traces back to the transaction log: the append-only _delta_log that records every commit and makes ACID guarantees, time travel, and concurrent writes possible. Understanding checkpoints, protocol versions, and table features here pays off across every other table, because they all describe machinery built on top of that log.
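To make the log-replay idea concrete, here is a minimal sketch that rebuilds the set of live data files by reading the JSON commits in version order, optionally stopping at an earlier version to mimic time travel. It is illustrative only: `live_files` and `write_commit` are hypothetical helpers, the toy commits carry just a `path` field (real `add` and `remove` actions hold more metadata), and checkpoints and deletion vectors are ignored.

```python
import json
import re
import tempfile
from pathlib import Path

# Commit files are 20-digit zero-padded versions, e.g. 00000000000000000000.json
COMMIT_NAME = re.compile(r"\d{20}\.json")


def live_files(table_path, as_of_version=None):
    """Replay JSON commits in version order; return the data files still live."""
    log_dir = Path(table_path) / "_delta_log"
    # Zero padding means a lexical sort is also a version sort.
    commits = sorted(p for p in log_dir.iterdir() if COMMIT_NAME.fullmatch(p.name))
    files = set()
    for commit in commits:
        if as_of_version is not None and int(commit.stem) > as_of_version:
            break  # time travel: ignore commits newer than the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # each line holds one action
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


def write_commit(log_dir, version, actions):
    """Hypothetical helper: write a toy commit file for the demo below."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions))


# Build a three-commit toy log in a temporary directory and replay it.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / "_delta_log"
    log_dir.mkdir()
    write_commit(log_dir, 0, [{"add": {"path": "part-a.parquet"}}])
    write_commit(log_dir, 1, [{"add": {"path": "part-b.parquet"}}])
    write_commit(log_dir, 2, [{"remove": {"path": "part-a.parquet"}},
                              {"add": {"path": "part-c.parquet"}}])
    latest = live_files(tmp)
    at_v1 = live_files(tmp, as_of_version=1)

print(sorted(latest))
print(sorted(at_v1))
```

Nothing in the log is ever rewritten, so reading an older version only means stopping the replay early. That is why time travel falls out of the log design rather than needing a separate mechanism.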

Concept	Example	Description
Transaction log	`_delta_log/00000000000000000000.json`	• Append-only JSON log that records every table change • each commit creates a new numbered log file, enabling ACID guarantees and time travel
ACID transactions	Multiple writers commit simultaneously	• Atomicity, Consistency, Isolation, Durability guarantees via optimistic concurrency control • failed transactions roll back without affecting committed data
Parquet data files	`part-00000-<uuid>.snappy.parquet`	• Columnar storage format containing the actual data • Delta adds a metadata layer on top for transactions and versioning
Checkpoint	`_delta_log/00000000000000000010.checkpoint.parquet`	• Parquet snapshot of table state written every 10 commits (default) • accelerates metadata reads by avoiding replay of thousands of JSON log entries
Table protocol version	`minReaderVersion=3, minWriterVersion=7`	• Protocol defines minimum client capabilities required to read/write a table • higher versions unlock features like deletion vectors and column mapping
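The file names and protocol versions in the table can be sketched in a few lines. The snippet below formats commit and checkpoint names from a version number and models a simplified read-compatibility check. `Protocol` and `can_read` are hypothetical names used for illustration, not part of any Delta Lake client API, and the check assumes a reader needs both a sufficient reader version and support for every listed reader feature.

```python
from dataclasses import dataclass


def commit_file_name(version: int) -> str:
    """Commit files use the version zero-padded to 20 digits."""
    return f"{version:020d}.json"


def checkpoint_file_name(version: int) -> str:
    """Single-file Parquet checkpoint name for a given version."""
    return f"{version:020d}.checkpoint.parquet"


@dataclass(frozen=True)
class Protocol:
    """Hypothetical, simplified view of a table's protocol requirements."""
    min_reader_version: int
    min_writer_version: int
    reader_features: frozenset = frozenset()


def can_read(protocol, client_reader_version, client_features=frozenset()):
    """Assumed rule: version must be high enough and all reader features supported."""
    if client_reader_version < protocol.min_reader_version:
        return False
    return protocol.reader_features <= frozenset(client_features)


# A table at reader version 3 / writer version 7 that requires deletion vectors.
table = Protocol(3, 7, frozenset({"deletionVectors"}))
old_client_ok = can_read(table, client_reader_version=1)
new_client_ok = can_read(table, 3, {"deletionVectors", "columnMapping"})

print(commit_file_name(0))
print(checkpoint_file_name(10))
print(old_client_ok, new_client_ok)
```

A real client performs the equivalent check on the writer side as well, and refuses to touch the table rather than risk misreading it.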
