Dagster Cheat Sheet

Updated 2026-04-12

Dagster is a data orchestration platform built around software-defined assets: a declarative approach in which the data assets a pipeline produces (tables, files, models) are modeled as first-class objects, rather than the tasks that produce them. Originally developed to address limitations in task-centric orchestrators like Airflow, it provides data-aware orchestration with built-in observability, type-checking, and testing capabilities. Treating assets as the primary abstraction enables automatic lineage tracking, easier debugging, and a more intuitive mental model for data engineers. The framework supports both asset-based and op-based (task-based) workflows, though assets are recommended for most use cases because they provide lineage and observability out of the box.

What This Cheat Sheet Covers

This topic spans 16 focused tables and 116 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Asset Definitions
Table 2: Asset Dependencies and Graph Structure
Table 3: Partitions and Backfills
Table 4: Jobs, Schedules, and Sensors
Table 5: Resources and IO Managers
Table 6: Testing Patterns
Table 7: Ops, Jobs, and Graphs (Task-Based Primitives)
Table 8: Declarative Automation and Freshness
Table 9: Configuration and Run Context
Table 10: Dagster Cloud and Deployment
Table 11: Advanced Asset Patterns
Table 12: Integrations and Ecosystem
Table 13: Debugging and Observability
Table 14: Error Handling and Retries
Table 15: CLI and Local Development
Table 16: GraphQL API and Extensions

Table 1: Core Asset Definitions

Everything in Dagster starts with the asset—a function decorated to declare the data object it produces. These rows walk through the building blocks you'll reach for daily: the @asset decorator itself, ways to declare dependencies and inputs/outputs, and the metadata, owners, groups, and descriptions that make an asset graph readable and accountable instead of just runnable.

Concept	Example	Description
@asset decorator	`@dg.asset` `def customers():` `return pd.read_csv("data.csv")`	• Defines a software-defined asset — a Python function that computes and persists data • the asset key is derived from the function name.
Asset dependencies	`@dg.asset(deps=[raw_customers])` `def clean_customers():`	• Declares upstream dependencies using `deps` — Dagster ensures parent assets run first • use when upstream asset isn't used as function input.
AssetIn	`@dg.asset(ins={"data": dg.AssetIn("source")})` `def process(data):`	Explicitly configures how an upstream asset is loaded as a function input; passed through the `ins` argument, it allows custom partition mappings, metadata, or key overrides.
AssetOut	`@dg.multi_asset(outs={"a": dg.AssetOut(), "b": dg.AssetOut()})` `def multi(): yield dg.Output(val_a, "a"); yield dg.Output(val_b, "b")`	Declares one output of a `@multi_asset` function (plain `@asset` does not accept `outs`); each output is tracked as a separate asset with distinct metadata.
Asset materialization	`dg.materialize([customers, orders])`	The act of executing an asset's function and persisting results to storage — can be triggered via UI, CLI, schedules, or sensors.
External assets	`s3_data = dg.AssetSpec("s3_data")`	Models assets produced outside Dagster (e.g., by Airflow or manual processes); an `AssetSpec` with no compute function allows lineage tracking without assuming orchestration control.
