Apache Airflow Cheat Sheet


Updated 2026-04-12

Apache Airflow is a Python-based platform for programmatically authoring, scheduling, and monitoring workflows as directed acyclic graphs (DAGs). Originally developed at Airbnb and open-sourced in 2015, it has become a de facto standard for orchestrating batch data pipelines and machine learning workflows; it can trigger and coordinate streaming jobs, but it is not itself a stream processor. Airflow's core strength is its workflows-as-code approach: pipelines are defined in Python, which enables version control, testing, and dynamic generation. Tasks are discrete units of work arranged in a DAG, with dependencies declared explicitly to guarantee execution order. This model scales from simple ETL pipelines to multi-team data platforms orchestrating thousands of workflows.
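
To make the DAG model concrete, here is a minimal sketch of dependency-ordered execution using only the Python standard library. It is not Airflow's scheduler and uses no Airflow API; the task names and the run_order helper are hypothetical, chosen only to illustrate why the graph must be acyclic.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies):
    """Return one valid execution order for a {task: upstream_tasks} mapping."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

execution_order = run_order(pipeline)
print(execution_order)

# A cycle has no valid execution order, which is why workflows must be acyclic.
try:
    run_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print("cycle detected:", cycle_detected)
```

Every task appears after all of its upstream tasks, and the cyclic graph is rejected outright. Airflow enforces the same constraint when it parses a DAG file.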

What This Cheat Sheet Covers

This topic spans 25 focused tables and 212 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.

Table 1: DAG Configuration Parameters
Table 2: Scheduling Patterns and Presets
Table 3: Core Operators
Table 4: Sensors
Table 5: Hooks and Connections
Table 6: Task Dependencies and Relationships
Table 7: XCom (Cross-Communication)
Table 8: Executor Types
Table 9: Trigger Rules
Table 10: Dynamic Task Mapping
Table 11: TaskFlow API and Decorators
Table 12: Datasets and Data-Aware Scheduling
Table 13: Task Groups
Table 14: Deferrable Operators and Triggers
Table 15: CLI Commands
Table 16: Monitoring and Alerting
Table 17: Error Handling and Retries
Table 18: Templating and Macros
Table 19: Branching and Conditional Logic
Table 20: Performance Tuning
Table 21: Security and Authentication
Table 22: Logging Configuration
Table 23: Callbacks and Notifications
Table 24: Variables and Configuration
Table 25: Backfilling and Reprocessing

Table 1: DAG Configuration Parameters

Parameter | Example | Description
dag_id | dag_id='daily_etl_pipeline' | Unique identifier for the DAG; must be unique across all DAGs in the same Airflow instance.
schedule | schedule='@daily'; schedule='0 6 * * *' | Defines when the DAG runs; accepts cron expressions, timedelta objects, presets (@hourly, @daily, @weekly, @monthly), timetables, or None for manual-only runs.
start_date | start_date=datetime(2026, 1, 1) | First logical date from which DAG runs can be scheduled; should be a static past date, ideally timezone-aware (a naive datetime is interpreted in Airflow's configured default timezone).
catchup | catchup=False | If True, the scheduler creates a run for every missed interval between start_date and the current date; if False, it creates a run only for the most recent interval. Setting it to False avoids an unintended backfill on first deploy.
max_active_runs | max_active_runs=3 | Maximum number of concurrent runs of this DAG; prevents resource exhaustion when a DAG is scheduled frequently or runs slowly.
default_args | default_args={'retries': 2, 'retry_delay': timedelta(minutes=5)} | Dictionary of default parameters applied to all tasks in the DAG unless overridden at the task level.
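
The parameters in Table 1 can be combined into a single DAG definition. The following is a minimal sketch, assuming Airflow 2.4 or later (where the schedule argument is available). The dag_kwargs name is only an illustrative variable, and the import is guarded so the snippet still runs where Airflow is not installed.

```python
from datetime import datetime, timedelta, timezone

# Values mirror the examples in Table 1; start_date is made timezone-aware.
dag_kwargs = dict(
    dag_id="daily_etl_pipeline",
    schedule="0 6 * * *",  # every day at 06:00
    start_date=datetime(2026, 1, 1, tzinfo=timezone.utc),
    catchup=False,  # do not create runs for intervals missed before deploy
    max_active_runs=3,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    },
)

try:
    from airflow import DAG  # assumption: Airflow 2.4+ is installed
except ImportError:
    DAG = None

# Build the DAG only when Airflow is available; tasks would be added to it next.
dag = DAG(**dag_kwargs) if DAG is not None else None
print(dag_kwargs["dag_id"], "->", "created" if dag is not None else "Airflow not installed")
```

In a real DAG file the guard is unnecessary: the usual pattern is to call DAG(...) directly, typically as a context manager, and declare tasks inside it. Values in default_args apply to every task unless a task sets its own.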
