Azure Synapse Analytics Cheat Sheet

Updated 2026-04-12

Azure Synapse Analytics is Microsoft's unified analytics platform, combining enterprise data warehousing and big data analytics in a single integrated service. Built on a massively parallel processing (MPP) architecture, it enables organizations to ingest, prepare, manage, and analyze large volumes of data from diverse sources using SQL, Spark, and data integration pipelines. The service operates across three primary compute engines: dedicated SQL pools for structured data warehousing, serverless SQL pools for ad-hoc querying without infrastructure provisioning, and Apache Spark pools for big data processing. A critical concept to understand: a dedicated SQL pool spreads each table's data across 60 distributions, and minimizing data movement between those distributions is the single most important performance optimization. Every design decision around distribution keys, table joins, and query patterns should aim to keep related data co-located.
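
To make the co-location idea concrete, here is a minimal Python sketch that simulates hash distribution across 60 buckets. It is an illustration under stated assumptions, not how Synapse works internally: the SHA-256 hash, the `distribution_for` and `rows_needing_movement` helpers, and the sample `customers` and `sales` rows are all hypothetical. It counts how many sales rows sit in a different distribution from the customer row they join to, first when both tables are distributed on `CustomerId`, then when sales is distributed on `OrderId`.

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # fixed count for dedicated SQL pools, per the text above


def distribution_for(key) -> int:
    """Map a distribution-key value to a bucket in [0, 60).

    Assumption: SHA-256 stands in for Synapse's internal hash function.
    What matters here is only that equal keys always land in the same bucket.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def rows_needing_movement(sales, sales_key, customers) -> int:
    """Count sales rows stored in a different distribution from their customer.

    Customers are always distributed on CustomerId; sales are distributed
    on whichever column `sales_key` names. The join is on CustomerId.
    """
    customer_home = {
        c["CustomerId"]: distribution_for(c["CustomerId"]) for c in customers
    }
    return sum(
        1
        for s in sales
        if distribution_for(s[sales_key]) != customer_home[s["CustomerId"]]
    )


# Hypothetical sample data: 20 customers, 200 orders spread evenly across them.
customers = [{"CustomerId": cid} for cid in range(1, 21)]
sales = [{"OrderId": 1000 + i, "CustomerId": (i % 20) + 1} for i in range(200)]

colocated = rows_needing_movement(sales, "CustomerId", customers)
scattered = rows_needing_movement(sales, "OrderId", customers)

print(f"Sales distributed on CustomerId: {colocated} rows would need to move")
print(f"Sales distributed on OrderId:    {scattered} rows would need to move")
```

When both tables share the join column as their distribution key, every matching pair hashes to the same bucket, so the first count is zero by construction. Distributing sales on an unrelated column leaves most rows away from their join partner, which is the data movement the paragraph above warns about.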

What This Cheat Sheet Covers

This topic spans 14 focused tables and 76 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Compute Resources
Table 2: Table Distribution Strategies
Table 3: Indexing Strategies
Table 4: Performance Optimization Features
Table 5: Data Loading and External Access
Table 6: Security and Access Control
Table 7: Monitoring and Management
Table 8: Data Formats and File Handling
Table 9: Integration and Connectivity
Table 10: CI/CD and DevOps
Table 11: SQL Pool Architecture and Features
Table 12: Spark Pool Capabilities
Table 13: Lake Database
Table 14: Advanced Scenarios

Table 1: Core Compute Resources

Synapse separates storage from compute, and the building blocks here are the compute engines you pick between for each workload. Dedicated SQL pools give you a provisioned MPP warehouse, serverless SQL lets you query the data lake on demand and pay per query, and Spark pools handle big-data processing — getting the right engine for the job is the first decision in any Synapse design.

Resource	Example	Description
Dedicated SQL Pool	`CREATE TABLE Sales (...)` `WITH (DISTRIBUTION = HASH(CustomerId))`	• Provisioned data warehouse with scalable Data Warehouse Units (DWU) • uses MPP architecture with 60 distributions for parallel query processing • requires active management (pause/resume) to control costs.
Serverless SQL Pool	`SELECT * FROM OPENROWSET(` `BULK 'data/*.parquet',` `FORMAT = 'PARQUET') AS data`	• On-demand query service that reads data directly from Azure Storage without loading • pay-per-query based on data processed • ideal for ad-hoc exploration and data lake queries • automatically provisioned with every workspace.
Apache Spark Pool	`spark = SparkSession.builder` `.appName("DataProc").getOrCreate()` `df = spark.read.parquet("path")`	• Managed Spark clusters for big data processing using Python, Scala, .NET, or SQL • supports Spark 3.4 and 3.5 runtimes • auto-pause capability after inactivity • integrates with Delta Lake for ACID transactions.
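
As a small illustration of the serverless row above, the following Python sketch assembles an `OPENROWSET` query string like the one in the Example column. The `openrowset_query` helper, its parameters, and the set of accepted formats are assumptions made for this example; it only builds text and does not connect to a workspace.

```python
def openrowset_query(bulk_path: str, file_format: str = "PARQUET",
                     alias: str = "data") -> str:
    """Build a serverless SQL pool query that reads files in place.

    Hypothetical helper: the format whitelist below is an assumption about
    what this sketch accepts, not an exhaustive list of Synapse options.
    """
    allowed_formats = {"PARQUET", "CSV", "DELTA"}
    fmt = file_format.upper()
    if fmt not in allowed_formats:
        raise ValueError(f"Unsupported format for this sketch: {file_format!r}")
    if not alias.isidentifier():
        raise ValueError(f"Alias must be a simple identifier: {alias!r}")

    # Double any single quotes so the path stays inside the SQL string literal.
    escaped_path = bulk_path.replace("'", "''")
    return (
        "SELECT * FROM OPENROWSET(\n"
        f"    BULK '{escaped_path}',\n"
        f"    FORMAT = '{fmt}'\n"
        f") AS {alias}"
    )


# Same path and format as the table's serverless example.
query = openrowset_query("data/*.parquet")
print(query)
```

The resulting string could be submitted through whatever SQL client is already in use; because the serverless pool bills by data processed, narrowing the wildcard path is one way to keep a query cheap.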
