Azure Data Factory (ADF) is an enterprise-grade, fully managed cloud data integration service used for orchestrating complex ETL (Extract, Transform, Load) and ELT workflows at scale. It provides a serverless execution environment with over 90 built-in connectors, enabling data engineers to construct automated pipelines that ingest data from diverse sources and route it through transformation engines like Mapping Data Flows (Spark-based) and external compute services. Mastering ADF requires a deep understanding of dynamic pipeline parameterization via the Expression Language, selecting the correct Integration Runtime topology for secure data movement, and applying proper CI/CD practices with Git integration and ARM template deployments. While Microsoft now positions Fabric Data Factory as the next-generation platform with new feature investment, ADF remains fully supported and widely deployed in production environments.
What This Cheat Sheet Covers
This cheat sheet spans 18 focused tables and 160 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Architecture Components
| Component | Example | Description |
|---|---|---|
| Pipeline | `{"name": "CopyDataPipeline", "properties": {"activities": [...]}}` | Logical grouping of activities that together perform a specific automated task. |
| Activity | `{"name": "Extract_Sales", "type": "Copy", "inputs": [...]}` | A single processing step within a pipeline; handles data movement, transformation, or control flow. |
| Linked Service | `{"name": "LS_SQLDB", "type": "AzureSqlDatabase", "typeProperties": {"connectionString": "..."}}` | Connection definition that securely stores authentication details for external data stores or compute targets. |
| Dataset | `{"type": "AzureBlob", "linkedServiceName": {"referenceName": "LS_Blob"}}` | Named view referencing the structure and location of data consumed or produced by activities. |
| Integration Runtime | `{"type": "SelfHosted", "description": "On-prem IR"}` | Compute infrastructure that provides data movement and activity dispatch, bridging ADF and data stores. |
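
The components in Table 1 refer to one another by name: an activity lists datasets in its inputs and outputs, and each dataset names a linked service. The Python sketch below models that chain with plain dictionaries shaped like the JSON fragments in the table and checks that every reference resolves. It is an illustration only, not an ADF SDK call: the dataset names `DS_SalesBlob` and `DS_SalesSql`, the `AzureSqlTable` dataset type, and the helper `find_broken_references` are assumptions introduced for this example.

```python
# Illustrative sketch: plain dicts shaped like ADF JSON definitions.
# DS_SalesBlob, DS_SalesSql, and find_broken_references are hypothetical
# names introduced for this example; they are not part of any ADF SDK.

linked_services = {
    "LS_Blob": {},   # connection details elided
    "LS_SQLDB": {},  # connection details elided
}

datasets = {
    "DS_SalesBlob": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "LS_Blob",
            "type": "LinkedServiceReference",
        },
    },
    "DS_SalesSql": {
        "type": "AzureSqlTable",
        "linkedServiceName": {
            "referenceName": "LS_SQLDB",
            "type": "LinkedServiceReference",
        },
    },
}

pipeline = {
    "name": "CopyDataPipeline",
    "properties": {
        "activities": [
            {
                "name": "Extract_Sales",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "DS_SalesBlob", "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "DS_SalesSql", "type": "DatasetReference"}
                ],
            }
        ]
    },
}


def find_broken_references(pipeline, datasets, linked_services):
    """Return a message for each dataset or linked service name that does not resolve."""
    problems = []
    for activity in pipeline["properties"]["activities"]:
        refs = activity.get("inputs", []) + activity.get("outputs", [])
        for ref in refs:
            ds_name = ref["referenceName"]
            dataset = datasets.get(ds_name)
            if dataset is None:
                problems.append(f"{activity['name']}: unknown dataset {ds_name}")
                continue
            ls_name = dataset["linkedServiceName"]["referenceName"]
            if ls_name not in linked_services:
                problems.append(f"{ds_name}: unknown linked service {ls_name}")
    return problems


if __name__ == "__main__":
    print(find_broken_references(pipeline, datasets, linked_services))
```

A check like this can catch a renamed or missing linked service before deployment. The Integration Runtime is left out of the sketch because the table's fragments do not show how a linked service points to one.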