Delta Live Tables (DLT) — now called Lakeflow Spark Declarative Pipelines (SDP) — is Databricks' framework for building reliable batch and streaming ETL pipelines using a declarative approach in SQL or Python. Instead of writing imperative orchestration code, you declare what each table should contain and let the framework handle execution order, dependency resolution, retries, and infrastructure. The key mental model to carry into every DLT table is that a pipeline is a directed acyclic graph of datasets: changing any node's definition only requires updating that node and letting DLT recompute what's affected — you never manage the execution sequence yourself.
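The DAG mental model can be sketched in plain Python. This is a toy illustration only, not how the framework is implemented: the dependency map is hand-written, `customers_bronze` is a hypothetical dataset added for contrast, and `execution_order` and `affected_by` are illustrative helper names rather than DLT APIs.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each dataset maps to the set of datasets it reads from.
dependencies = {
    "orders_bronze": set(),
    "orders_silver": {"orders_bronze"},
    "orders_agg": {"orders_silver"},
    "customers_bronze": set(),  # unrelated branch, included for contrast
}


def execution_order(deps):
    """Return one valid run order: every dataset comes after its inputs."""
    return list(TopologicalSorter(deps).static_order())


def affected_by(changed, deps):
    """Return the changed dataset plus everything downstream of it."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for name, upstream in deps.items():
            # A dataset is affected if any of its inputs is affected.
            if name not in affected and upstream & affected:
                affected.add(name)
                grew = True
    return affected


order = execution_order(dependencies)
to_refresh = affected_by("orders_silver", dependencies)

print(order)
print(sorted(to_refresh))
```

In this sketch, editing `orders_silver` marks only `orders_silver` and `orders_agg` for recomputation. `orders_bronze` and `customers_bronze` are untouched, and the run order falls out of the declared dependencies rather than being written by hand.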
What This Cheat Sheet Covers
This cheat sheet spans 14 focused tables and 100 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Core Dataset Types
Streaming tables and materialized views are the two primary building blocks of every DLT pipeline; understanding which to use — and why — is the first decision any pipeline author makes.
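As a rough way to build intuition for the difference, the two dataset types can be modeled in plain Python. This is a conceptual sketch under loud assumptions: `ToyStreamingTable` and `ToyMaterializedView` are made-up classes, not DLT APIs, the checkpoint is reduced to a list offset, and the toy view always recomputes in full, whereas a real materialized view may refresh incrementally for eligible queries.

```python
from collections import Counter


class ToyStreamingTable:
    """Append-only target that remembers how far into the source it has read."""

    def __init__(self):
        self.rows = []
        self.checkpoint = 0  # offset of the next unread source row

    def update(self, source):
        """Consume only the rows that arrived since the last update."""
        new_rows = source[self.checkpoint:]
        self.rows.extend(new_rows)
        self.checkpoint = len(source)
        return len(new_rows)


class ToyMaterializedView:
    """Stored query result, rebuilt from its input on every refresh."""

    def __init__(self, query):
        self.query = query
        self.result = None

    def refresh(self, source_rows):
        """Recompute the full result and store it."""
        self.result = self.query(source_rows)
        return self.result


def total_by_date(rows):
    """Illustrative aggregation: sum of amounts per date."""
    totals = Counter()
    for row in rows:
        totals[row["date"]] += row["amount"]
    return dict(totals)


raw_orders = [
    {"date": "2024-01-01", "amount": 10},
    {"date": "2024-01-01", "amount": 5},
]

bronze = ToyStreamingTable()
first_run = bronze.update(raw_orders)   # reads both existing rows

raw_orders.append({"date": "2024-01-02", "amount": 7})
second_run = bronze.update(raw_orders)  # reads only the newly appended row

agg = ToyMaterializedView(total_by_date)
totals = agg.refresh(bronze.rows)
print(first_run, second_run, totals)
```

The streaming side does work proportional to the new data, while the view's stored result is always consistent with its defining query over the current input.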
| Type | Example | Description |
|---|---|---|
| Streaming table | `@dp.table()`<br>`def orders_bronze():`<br>`    return spark.readStream.table("raw_orders")` | • Processes append-only, incremental data<br>• Maintains a streaming checkpoint so only new rows are consumed each run |
| Materialized view | `@dp.materialized_view()`<br>`def orders_agg():`<br>`    return spark.read.table("orders_silver").groupBy("date").agg(...)` | • Pre-computes and stores the result of a query<br>• Supports incremental refresh for eligible SQL operations and falls back to full recompute otherwise |