dlt (data load tool) Cheat Sheet

Updated 2026-05-15

dlt is an open-source Python library for building ELT data pipelines that extract data from various sources and load it into data warehouses, lakes, or databases. It requires no separate backend server and runs anywhere Python runs, from local notebooks to production orchestrators like Airflow or cloud functions. dlt takes a Python-first, code-as-configuration approach: pipelines are defined using decorators and native Python objects, with automatic schema inference, incremental loading, and state management built in. The key insight is that dlt treats data pipelines as portable Python code rather than managed infrastructure, so data engineers can version, test, and deploy pipelines like any other application code while dlt handles complex concerns such as nested JSON normalization, schema evolution, and idempotent incremental syncs automatically.

What This Cheat Sheet Covers

This cheat sheet spans 15 focused tables and 118 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Pipeline Definition and Execution
Table 2: Incremental Loading
Table 3: Data Sources and Extraction
Table 4: Destination Connectors
Table 5: Schema Management
Table 6: State Management and Persistence
Table 7: Secrets and Configuration Management
Table 8: Advanced Resource Patterns
Table 9: Deployment and Orchestration
Table 10: Monitoring and Debugging
Table 11: Performance and Optimization
Table 12: Python Ecosystem Integration
Table 13: Comparison with Other Tools
Table 14: dlt Hub Ecosystem
Table 15: Common Patterns and Best Practices

Table 1: Pipeline Definition and Execution

Concept	Example	Description
`dlt.pipeline()`	`pipeline = dlt.pipeline(` `pipeline_name='chess',` `destination='duckdb',` `dataset_name='player_data'` `)`	Creates a pipeline object that defines destination, dataset name, and pipeline identity; acts as the execution context for all loads.
`pipeline.run()`	`info = pipeline.run(` `data,` `table_name='player',` `write_disposition='append'` `)`	Executes the pipeline to extract, normalize, and load data; returns a `LoadInfo` object with run metadata and metrics.
`@dlt.source`	`@dlt.source` `def my_source():` `return my_resource()`	Decorator that marks a function as a dlt source: a logical grouping of related resources with shared configuration and state.
`@dlt.resource`	`@dlt.resource` `def users():` `yield {"id": 1, "name": "Alice"}`	Decorator that marks a data-producing function as a resource; resources are the fundamental units of data extraction in dlt.
`@dlt.transformer`	`@dlt.transformer` `def enrich(item):` `item['full_name'] = item['first'] + ' ' + item['last']` `return item`	Decorator for transforming data in-flight during extraction; the transformer receives the items yielded by another resource (bound with `data_from` or the `|` operator) and modifies or enriches them before loading.
