Azure Data Factory Cheat Sheet

Updated 2026-04-21


Azure Data Factory (ADF) is an enterprise-grade, fully managed cloud data integration service used for orchestrating complex ETL (Extract, Transform, Load) and ELT workflows at scale. It provides a serverless execution environment with over 90 built-in connectors, enabling data engineers to construct automated pipelines that ingest data from diverse sources and route it through transformation engines like Mapping Data Flows (Spark-based) and external compute services. Mastering ADF requires a deep understanding of dynamic pipeline parameterization via the Expression Language, selecting the correct Integration Runtime topology for secure data movement, and applying proper CI/CD practices with Git integration and ARM template deployments. While Microsoft now positions Fabric Data Factory as the next-generation platform with new feature investment, ADF remains fully supported and widely deployed in production environments.

What This Cheat Sheet Covers

This topic spans 18 focused tables and 160 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Architecture Components
Table 2: Integration Runtimes (IR)
Table 3: Pipeline Activities — Control Flow
Table 4: Pipeline Activities — Orchestration and Utilities
Table 5: Pipeline Activities — Data Movement and Transformation
Table 6: Pipeline Triggers
Table 7: Data Types and Formats
Table 8: Copy Activity Performance Settings
Table 9: Expression Language — System Variables
Table 10: Expression Language — Common Functions
Table 11: Mapping Data Flow — Row and Schema Transformations
Table 12: Mapping Data Flow — Multiple Stream Transformations
Table 13: Mapping Data Flow — Formatters and Reusability
Table 14: Security and Authentication
Table 15: CI/CD and Source Control
Table 16: Monitoring and Management
Table 17: Deployment and Automation — Azure CLI
Table 18: Deployment and Automation — PowerShell

Table 1: Core Architecture Components

Everything you build in ADF is assembled from this handful of building blocks. A pipeline groups activities, datasets describe the data those activities read and write, linked services hold the connection information for data stores and compute services, and an integration runtime supplies the compute that moves the data and dispatches activities. Once you understand how these pieces fit together, the rest of the service falls into place.

Component	Example	Description
Pipeline	`{"name": "CopyDataPipeline",` `"properties": {"activities": [...]}}`	Logical grouping of activities that together perform a specific automated task.
Activity	`{"name": "Extract_Sales",` `"type": "Copy", "inputs": [...]}`	A single processing step within a pipeline — handles data movement, transformation, or control flow.
Linked Service	`{"name": "LS_SQLDB", "properties": {"type": "AzureSqlDatabase",` `"typeProperties": {"connectionString": "..."}}}`	Connection definition that stores the connection and authentication details ADF needs to reach an external data store or compute target.
Dataset	`{"type": "AzureBlob",` `"linkedServiceName": {"referenceName": "LS_Blob"}}`	Named view referencing the structure and location of data consumed or produced by activities.
Integration Runtime (IR)	`{"type": "SelfHosted",` `"description": "On-prem IR"}`	Compute infrastructure ADF uses for data movement, data flow execution, activity dispatch, and SSIS package execution, bridging ADF and data stores across network environments.
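To make the chain in the table concrete, here is a minimal Python sketch. It assumes a small factory in which a pipeline's Copy activity references two datasets, each dataset references a linked service, and one linked service routes through a self-hosted IR via `connectVia`. The resource layout (`name` plus `properties`) and the `*Reference` objects follow ADF's JSON shape. However, the names `LS_OnPremSql`, `DS_OnPremSales`, `DS_SalesBlob`, and `SelfHostedIR` are hypothetical, all `typeProperties` (including the Copy activity's source and sink settings) are trimmed placeholders, and `dangling_references` is an illustrative helper, not part of any ADF SDK.

```python
import copy

# Maps each ADF reference type to the collection of resources it points into.
REF_KINDS = {
    "LinkedServiceReference": "linkedServices",
    "DatasetReference": "datasets",
    "IntegrationRuntimeReference": "integrationRuntimes",
    "PipelineReference": "pipelines",
}

# Hypothetical factory: names are illustrative and typeProperties are trimmed placeholders.
FACTORY = {
    "integrationRuntimes": [
        {"name": "SelfHostedIR",
         "properties": {"type": "SelfHosted", "description": "On-prem IR"}},
    ],
    "linkedServices": [
        {"name": "LS_OnPremSql",
         "properties": {
             "type": "SqlServer",
             "typeProperties": {"connectionString": "..."},
             # connectVia routes this connection through the self-hosted IR.
             "connectVia": {"referenceName": "SelfHostedIR",
                            "type": "IntegrationRuntimeReference"}}},
        {"name": "LS_Blob",
         "properties": {"type": "AzureBlobStorage",
                        "typeProperties": {"connectionString": "..."}}},
    ],
    "datasets": [
        {"name": "DS_OnPremSales",
         "properties": {"type": "SqlServerTable",
                        "linkedServiceName": {"referenceName": "LS_OnPremSql",
                                              "type": "LinkedServiceReference"}}},
        {"name": "DS_SalesBlob",
         "properties": {"type": "AzureBlob",
                        "linkedServiceName": {"referenceName": "LS_Blob",
                                              "type": "LinkedServiceReference"}}},
    ],
    "pipelines": [
        {"name": "CopyDataPipeline",
         "properties": {"activities": [
             {"name": "Extract_Sales", "type": "Copy",
              "inputs": [{"referenceName": "DS_OnPremSales", "type": "DatasetReference"}],
              "outputs": [{"referenceName": "DS_SalesBlob", "type": "DatasetReference"}]},
         ]}},
    ],
}


def find_references(node):
    """Recursively yield (reference type, referenceName) for every reference object in node."""
    if isinstance(node, dict):
        if "referenceName" in node and node.get("type") in REF_KINDS:
            yield node["type"], node["referenceName"]
        for value in node.values():
            yield from find_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_references(item)


def dangling_references(factory):
    """Return (resource, reference type, missing name) for each reference that resolves to nothing."""
    defined = {kind: {res["name"] for res in factory.get(kind, [])}
               for kind in REF_KINDS.values()}
    missing = []
    for resources in factory.values():
        for resource in resources:
            for ref_type, name in find_references(resource):
                if name not in defined[REF_KINDS[ref_type]]:
                    missing.append((resource["name"], ref_type, name))
    return missing


if __name__ == "__main__":
    print(dangling_references(FACTORY))  # → []
    broken = copy.deepcopy(FACTORY)
    # Point the Copy activity's output at a dataset that was never defined.
    broken["pipelines"][0]["properties"]["activities"][0]["outputs"][0]["referenceName"] = "DS_Missing"
    print(dangling_references(broken))  # → [('CopyDataPipeline', 'DatasetReference', 'DS_Missing')]
```

The check walks every nested object instead of hard-coding where references appear. This means it would also pick up references inside nested structures such as the inner activities of a ForEach. ADF's own validation remains the authoritative check.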
