Apache Airflow Cheat Sheet

Updated 2026-05-28

Apache Airflow is a Python-based platform for programmatically authoring, scheduling, and monitoring workflows as directed acyclic graphs (DAGs). Originally developed at Airbnb and open-sourced in 2015, Airflow has become the de facto standard for data pipeline orchestration, now used by over 80,000 organizations with 30+ million monthly downloads. Airflow 3.0 (released April 2025) introduced the most significant changes in the platform's history: a fully rewritten React-based UI, built-in DAG versioning, Data Assets (renamed from Datasets), an Edge Executor for distributed/remote execution, a client-server Task Execution Interface, and Deadline Alerts replacing the removed SLA feature. The platform's core model—tasks as discrete units of work in a DAG, with explicit dependencies—scales from simple ETL pipelines to complex multi-team ML/AI platforms orchestrating thousands of workflows.

What This Cheat Sheet Covers

This cheat sheet spans 28 tables and 249 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: DAG Configuration Parameters
Table 2: DAG Versioning
Table 3: Scheduling Patterns and Presets
Table 4: Core Operators
Table 5: Sensors
Table 6: Hooks and Connections
Table 7: Task Dependencies and Relationships
Table 8: XCom (Cross-Communication)
Table 9: Executor Types
Table 10: Trigger Rules
Table 11: Dynamic Task Mapping
Table 12: TaskFlow API and Decorators
Table 13: Assets and Data-Aware Scheduling
Table 14: DAG Bundles
Table 15: Task Groups
Table 16: Deferrable Operators and Triggers
Table 17: CLI Commands
Table 18: Monitoring and Alerting
Table 19: Deadline Alerts
Table 20: Error Handling and Retries
Table 21: Templating and Macros
Table 22: Branching and Conditional Logic
Table 23: Performance Tuning
Table 24: Security and Authentication
Table 25: Logging Configuration
Table 26: Callbacks and Notifications
Table 27: Variables and Configuration
Table 28: Backfilling and Reprocessing

Table 1: DAG Configuration Parameters

Every Airflow DAG is defined by a set of parameters that control its identity, schedule, retry behavior, and operational characteristics. Understanding which parameters belong at the DAG level (apply to all tasks by default) versus at the task level is fundamental to authoring predictable pipelines.
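To make the DAG-level versus task-level distinction concrete, here is a minimal pure-Python sketch of the precedence rule: DAG-level defaults apply to every task, and a value given on the task itself wins. The helper `effective_task_args` and the task names are hypothetical and exist only for illustration; this models the behavior and is not Airflow's own implementation.

```python
from datetime import timedelta

# DAG-level defaults, in the shape Airflow's `default_args` dictionary takes.
default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}


def effective_task_args(dag_defaults, **task_kwargs):
    """Hypothetical helper: merge DAG defaults with task-level overrides.

    Keys passed at the task level replace the DAG-level value; every other
    key falls back to the default. The input dictionary is not modified.
    """
    return {**dag_defaults, **task_kwargs}


# A task that sets nothing inherits both defaults.
extract_args = effective_task_args(default_args)

# A task that sets `retries` overrides only that key.
load_args = effective_task_args(default_args, retries=5)

print(extract_args["retries"])  # 2
print(load_args["retries"])     # 5
print(load_args["retry_delay"] == timedelta(minutes=5))  # True
```

Parameters such as `dag_id`, `schedule`, and `max_active_runs` describe the DAG as a whole and have no per-task counterpart, so they never go through this kind of merge.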

Parameter	Example	Description
dag_id	`dag_id='daily_etl_pipeline'`	• Unique identifier for the DAG • must be unique across all DAGs in the same Airflow instance.
schedule	`schedule='@daily'` `schedule='0 6 * * *'`	• Defines when the DAG runs • accepts cron expressions, timedelta, timetable objects, Asset lists, or `None` for manual-only.
start_date	`start_date=pendulum.datetime(2026, 1, 1, tz="UTC")`	• First logical date from which DAG runs can be scheduled • should be timezone-aware and a static past date (never a moving value such as `datetime.now()`).
catchup	`catchup=False`	• If `True`, schedules all missed runs between `start_date` and now • defaults to `False` in Airflow 3 (it was `True` in Airflow 2), so set it explicitly to avoid surprises.
default_args	`default_args={'retries': 2,` `'retry_delay': timedelta(minutes=5)}`	Dictionary of default parameters applied to all tasks in the DAG unless overridden at the task level.
max_active_runs	`max_active_runs=3`	• Maximum number of concurrent DAG runs allowed • prevents resource exhaustion when a DAG is scheduled frequently.
tags	`tags=['production', 'finance', 'etl']`	• List of string labels for categorizing and filtering DAGs in the UI • useful for organizing large deployments.
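The interaction between `start_date`, `schedule`, and `catchup` is the part of this table most likely to surprise, so the following sketch works through it with plain `datetime` arithmetic. It is a deliberately simplified model, not Airflow's scheduler: the function `runs_to_schedule` is hypothetical, it assumes a fixed interval with the first tick at `start_date`, and the exact first and last run in a real deployment depend on the timetable in use.

```python
from datetime import datetime, timedelta, timezone


def runs_to_schedule(start_date, now, interval, catchup):
    """Hypothetical model of which schedule ticks get a DAG run.

    One tick every `interval`, beginning at `start_date`, for every tick
    that is not in the future. With catchup=True all of them are returned;
    with catchup=False only the most recent one is.
    """
    if start_date.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    ticks = []
    tick = start_date
    while tick <= now:
        ticks.append(tick)
        tick += interval
    # Slicing with [-1:] keeps the result a list and is safe when empty.
    return ticks if catchup else ticks[-1:]


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
now = datetime(2026, 1, 4, 12, 0, tzinfo=timezone.utc)
daily = timedelta(days=1)

with_catchup = runs_to_schedule(start, now, daily, catchup=True)
without_catchup = runs_to_schedule(start, now, daily, catchup=False)

print(len(with_catchup))          # 4
print(without_catchup[0].date())  # 2026-01-04
```

Under this model, a DAG enabled three and a half days after its `start_date` is asked for four runs with `catchup=True` and one with `catchup=False`. With a `start_date` months in the past, the first case becomes hundreds of runs, which is where `max_active_runs` limits how many execute at once.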
