dlt is an open-source Python library for building ELT data pipelines that extract data from various sources and load it into data warehouses, lakes, or databases. It operates without requiring a separate backend server, running anywhere Python runs—from local notebooks to production orchestrators like Airflow or cloud functions. dlt emphasizes a Python-first, code-as-configuration approach where pipelines are defined using decorators and native Python objects, with automatic schema inference, incremental loading, and state management built in. The key insight: dlt treats data pipelines as portable Python code rather than managed infrastructure, letting data engineers version, test, and deploy pipelines like any other application code while still handling complex concerns like nested JSON normalization, schema evolution, and idempotent incremental syncs automatically.
What This Cheat Sheet Covers
This topic spans 15 focused tables and 118 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Pipeline Definition and Execution
| Concept | Example | Description |
|---|---|---|
| `dlt.pipeline()` | `pipeline = dlt.pipeline(pipeline_name='chess', destination='duckdb', dataset_name='player_data')` | Creates a pipeline object that defines destination, dataset name, and pipeline identity; acts as the execution context for all loads. |
| `pipeline.run()` | `info = pipeline.run(data, table_name='player', write_disposition='append')` | Executes the pipeline to extract, normalize, and load data; returns a `LoadInfo` object with run metadata and metrics. |
| `@dlt.source` | `@dlt.source`<br>`def my_source(): return my_resource()` | Decorator that marks a function as a dlt source: a logical grouping of related resources with shared configuration and state. |
| `@dlt.resource` | `@dlt.resource`<br>`def users(): yield {"id": 1, "name": "Alice"}` | Decorator that marks a data-producing function as a resource; resources are the fundamental units of data extraction in dlt. |
| `@dlt.transformer` | `@dlt.transformer`<br>`def enrich(item): item['full_name'] = item['first'] + ' ' + item['last']; return item` | Decorator for transforming data in-flight during extraction; operates on each item yielded by a resource before loading. |