Delta Lake Cheat Sheet

Updated 2026-04-20

Delta Lake is an open-source storage framework that brings ACID transactions, scalable metadata handling, and time travel to cloud data lakes. Built on top of Parquet, it provides a transactional layer through an append-only commit log (_delta_log) that records every change, enabling reliable concurrent writes and schema evolution without sacrificing performance. Originally developed by Databricks and now a Linux Foundation project, Delta Lake has reached version 4.2.0 (on Apache Spark 4.1.0) and serves as the foundation for modern lakehouse architectures across AWS S3, Azure ADLS, and Google Cloud Storage.

What This Cheat Sheet Covers

This cheat sheet spans 17 focused tables and 129 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Concepts
Table 2: Read and Write Operations
Table 3: Table Manipulation
Table 4: Schema Management
Table 5: Time Travel and Versioning
Table 6: Optimization
Table 7: Change Data Capture
Table 8: Concurrency and Isolation
Table 9: Table Cloning
Table 10: Constraints
Table 11: Advanced Features
Table 12: Partitioning Strategies
Table 13: Storage Configuration
Table 14: Interoperability
Table 15: Table Properties
Table 16: SQL DDL Commands
Table 17: Delta Lake vs Alternatives

Table 1: Core Concepts

Everything else in Delta Lake rests on the handful of building blocks here — the transaction log that records every change, the Parquet files holding the actual data, and the protocol versions and table features that decide which clients can read or write a table. Once you see how the log, checkpoints, and the lakehouse model fit together, the rest of the cheat sheet reads far more naturally.

Concept	Example	Description
Transaction log	`_delta_log/00000000000000000000.json`	• Append-only JSON log that records every table change • each commit creates a new log file numbered sequentially, enabling ACID guarantees and time travel
ACID transactions	Multiple writers commit simultaneously	• Atomicity, Consistency, Isolation, Durability via optimistic concurrency control • failed transactions roll back without affecting committed data
Parquet data files	`part-00000-<uuid>.snappy.parquet`	• Columnar storage format containing actual data • Delta adds metadata layer on top for transactions and versioning
Checkpoint	`_delta_log/00000000000000000010.checkpoint.parquet`	• Parquet snapshot of table state written every 10 commits (default) • accelerates metadata reads by avoiding replay of thousands of JSON log entries • v2 checkpoints available via `delta.checkpointPolicy = 'v2'`
Table protocol version	`minReaderVersion=3, minWriterVersion=7`	• Protocol defines the minimum client capabilities required to read/write a table • higher versions unlock more features • reader version 3 and writer version 7 switch to the table features model, where each feature is opted into individually
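
To make the log mechanics in the table concrete, here is a minimal sketch in plain Python (no Spark or Delta library involved). It shows how commit and checkpoint file names follow from a version number, and how replaying `add` and `remove` actions in commit order yields the set of live data files at any version. The helper names (`commit_file_name`, `checkpoint_file_name`, `replay`) and the sample file paths are hypothetical, and the actions are trimmed to the `path` field only; real log entries carry more fields, and real readers start from the latest checkpoint rather than from version 0.

```python
import json


def commit_file_name(version: int) -> str:
    """Name of the JSON commit file: the version zero-padded to 20 digits."""
    return f"{version:020d}.json"


def checkpoint_file_name(version: int) -> str:
    """Name of a single-file Parquet checkpoint for the given version."""
    return f"{version:020d}.checkpoint.parquet"


def replay(commits: list[str]) -> set[str]:
    """Replay newline-delimited JSON commits in order.

    Returns the paths of data files that are live after the last commit.
    Only 'add' and 'remove' actions change the file set; other actions
    (protocol, metaData, commitInfo) are ignored in this sketch.
    """
    live: set[str] = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Hypothetical commits for versions 0, 1, and 2 (one JSON action per line).
commits = [
    '{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7}}\n'
    '{"add": {"path": "part-00000-a.snappy.parquet"}}',
    '{"add": {"path": "part-00000-b.snappy.parquet"}}',
    '{"remove": {"path": "part-00000-a.snappy.parquet"}}\n'
    '{"add": {"path": "part-00000-c.snappy.parquet"}}',
]

latest = replay(commits)          # table state at version 2
version_1 = replay(commits[:2])   # "time travel": state as of version 1

print(commit_file_name(0))
print(checkpoint_file_name(10))
print(sorted(latest))
print(sorted(version_1))
```

Because the log is append-only, reading an older version only means replaying fewer commits, which is why time travel comes almost for free. A checkpoint stores the result of such a replay so readers can skip the earlier JSON files.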
