dlt (data load tool) Cheat Sheet

Updated 2026-05-15


dlt is an open-source Python library for building ELT data pipelines that extract data from various sources and load it into data warehouses, lakes, or databases. It operates without requiring a separate backend server, running anywhere Python runs—from local notebooks to production orchestrators like Airflow or cloud functions. dlt emphasizes a Python-first, code-as-configuration approach where pipelines are defined using decorators and native Python objects, with automatic schema inference, incremental loading, and state management built in. The key insight: dlt treats data pipelines as portable Python code rather than managed infrastructure, letting data engineers version, test, and deploy pipelines like any other application code while still handling complex concerns like nested JSON normalization, schema evolution, and idempotent incremental syncs automatically.

What This Cheat Sheet Covers

This topic spans 15 focused tables and 118 indexed concepts. The outline below lists every table, from foundational concepts to advanced details.

Table 1: Pipeline Definition and Execution
Table 2: Incremental Loading
Table 3: Data Sources and Extraction
Table 4: Destination Connectors
Table 5: Schema Management
Table 6: State Management and Persistence
Table 7: Secrets and Configuration Management
Table 8: Advanced Resource Patterns
Table 9: Deployment and Orchestration
Table 10: Monitoring and Debugging
Table 11: Performance and Optimization
Table 12: Python Ecosystem Integration
Table 13: Comparison with Other Tools
Table 14: dlt Hub Ecosystem
Table 15: Common Patterns and Best Practices

Table 1: Pipeline Definition and Execution

The handful of building blocks you'll touch in every dlt project. A pipeline ties a source to a destination, the @dlt.source/@dlt.resource decorators turn ordinary Python functions into data producers, and run() drives the whole extract-normalize-load cycle. write_disposition decides whether new rows append, replace, or merge, and primary_key tells a merge which existing rows to update instead of duplicating them.

Concept	Example	Description
`dlt.pipeline()`	`pipeline = dlt.pipeline(` `pipeline_name='chess',` `destination='duckdb',` `dataset_name='player_data'` `)`	• Creates a pipeline object that defines destination, dataset name, and pipeline identity • acts as the execution context for all loads
`pipeline.run()`	`info = pipeline.run(` `data,` `table_name='player',` `write_disposition='append'` `)`	• Executes the pipeline to extract, normalize, and load data • returns a `LoadInfo` object with run metadata and metrics
`@dlt.source`	`@dlt.source` `def my_source():` `return my_resource()`	• Decorator that marks a function as a dlt source • a source is a logical grouping of related resources with shared configuration and state
`@dlt.resource`	`@dlt.resource` `def users():` `yield {"id": 1, "name": "Alice"}`	• Decorator that marks a data-producing function as a resource • resources are the fundamental units of data extraction in dlt
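`@dlt.transformer`	`@dlt.transformer` `def enrich(item):` `item['full_name'] = item['first'] + ' ' + item['last']` `yield item`	• Decorator for a dependent resource that consumes data yielded by a parent resource, bound with the pipe operator (`parent_resource | enrich`) or with `data_from=` • receives each item, or each page when the parent yields lists, and yields transformed or enriched rows • for a simple per-item edit, `resource.add_map()` is a lighter alternative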
