Azure Synapse Analytics is Microsoft's unified analytics platform that combines enterprise data warehousing and big data analytics in a single integrated service. Built on a massively parallel processing (MPP) architecture, it lets organizations ingest, prepare, manage, and analyze large volumes of data from diverse sources using SQL, Spark, and data integration pipelines. The service offers three primary compute engines: dedicated SQL pools for structured data warehousing, serverless SQL pools for ad-hoc querying without infrastructure provisioning, and Apache Spark pools for big data processing. A critical concept to understand: a dedicated SQL pool spreads each table's data across 60 distributions, and minimizing data movement between those distributions is typically the single most important performance optimization. Every design decision around distribution keys, table joins, and query patterns should aim to keep related data co-located.
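To make the co-location point concrete, here is a minimal simulation sketch in plain Python. It is not how Synapse is implemented: the hash function (`sha256`), the helper names `distribution_for` and `colocated_fraction`, and the toy `Sales` and `Customers` rows are all assumptions chosen for illustration. Only the count of 60 distributions comes from the text above. The sketch measures what fraction of `Sales` rows already sit in the same distribution as the `Customers` row they join to.

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # fixed distribution count in a dedicated SQL pool


def distribution_for(key) -> int:
    """Map a distribution-key value to a distribution index.

    Illustrative only: sha256 stands in for Synapse's internal hash,
    which is an assumption of this sketch.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def colocated_fraction(sales, customers, sales_key) -> float:
    """Fraction of sales rows stored in the same distribution as the
    customer row they join to, when Sales is hash-distributed on
    `sales_key` and Customers on CustomerId."""
    customer_home = {
        c["CustomerId"]: distribution_for(c["CustomerId"]) for c in customers
    }
    local = sum(
        1
        for s in sales
        if distribution_for(s[sales_key]) == customer_home[s["CustomerId"]]
    )
    return local / len(sales)


# Hypothetical toy data: 100 customers, 1000 sales rows.
customers = [{"CustomerId": cid} for cid in range(1, 101)]
sales = [
    {"OrderId": 1000 + i, "CustomerId": (i % 100) + 1} for i in range(1000)
]

same_key = colocated_fraction(sales, customers, "CustomerId")
other_key = colocated_fraction(sales, customers, "OrderId")
print(f"Sales distributed on CustomerId: {same_key:.0%} of join pairs co-located")
print(f"Sales distributed on OrderId:    {other_key:.0%} of join pairs co-located")
```

When both tables are hashed on the join column, every matching pair lands in the same distribution, so the join needs no data movement. Hashing `Sales` on an unrelated column leaves only a small share of pairs co-located by chance, and the rest would have to be shuffled at query time.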
What This Cheat Sheet Covers
This topic spans 14 focused tables and 76 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Compute Resources
| Resource | Example | Description |
|---|---|---|
| Dedicated SQL pool | `CREATE TABLE Sales (CustomerId INT) WITH (DISTRIBUTION = HASH(CustomerId))` | • Provisioned data warehouse with scalable Data Warehouse Units (DWU) • Uses MPP architecture with 60 distributions for parallel query processing • Requires active management (pause/resume) to control costs. |
| Serverless SQL pool | `SELECT * FROM OPENROWSET(BULK 'data/*.parquet', FORMAT = 'PARQUET') AS data` | • On-demand query service that reads data directly from Azure Storage without loading • Pay-per-query based on data processed • Ideal for ad-hoc exploration and data lake queries • Automatically provisioned with every workspace. |
| Apache Spark pool | `from pyspark.sql import SparkSession`<br>`spark = SparkSession.builder.appName("DataProc").getOrCreate()`<br>`df = spark.read.parquet("path")` | • Managed Spark clusters for big data processing using Python, Scala, .NET, or SQL • Supports Spark 3.4 and 3.5 runtimes • Auto-pause capability after inactivity • Integrates with Delta Lake for ACID transactions. |
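
The pay-per-query model of the serverless SQL pool comes down to simple arithmetic: the charge scales with the volume of data a query processes. The sketch below shows that calculation only. The function name `estimate_query_cost` and the `price_per_tb` parameter are hypothetical, and the rate used in the example call is a placeholder rather than a quoted Azure price, so check current pricing for your region before relying on any figure.

```python
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_processed: int, price_per_tb: float) -> float:
    """Estimate the charge for one serverless query.

    bytes_processed: data volume the query processed, in bytes.
    price_per_tb: rate per terabyte processed (assumption: supplied by
    the caller from current pricing, not hard-coded here).
    """
    if bytes_processed < 0 or price_per_tb < 0:
        raise ValueError("bytes_processed and price_per_tb must be non-negative")
    return (bytes_processed / BYTES_PER_TB) * price_per_tb


# Example with a placeholder rate of 5.0 currency units per TB.
half_tb = 512 * 1024 ** 3
print(estimate_query_cost(half_tb, price_per_tb=5.0))
```

Because the charge follows data processed, anything that reduces the bytes a query reads lowers the cost of that query.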