Google Cloud Professional Data Engineer Cheat Sheet

The Google Cloud Professional Data Engineer certification validates your ability to design, build, operationalize, secure, and monitor data processing systems on Google Cloud, with a strong emphasis on choosing the right managed service for each workload. The exam runs two hours and has roughly 50 to 60 multiple-choice and multiple-select questions across five areas: designing data processing systems (~22%), ingesting and processing the data (~25%), storing the data (~20%), preparing and using data for analysis (~15%), and maintaining and automating data workloads (~18%). Most questions are scenario-based. The winning answer is usually the Google-recommended, fully managed, serverless option that meets the stated requirement at the lowest cost and operational overhead, not the most powerful or the most hands-on one. Learn the decision boundaries between the storage and processing services (BigQuery vs. Bigtable vs. Spanner vs. Cloud SQL, Dataflow vs. Dataproc, Pub/Sub vs. Kafka), because matching the right service to the access pattern is what the exam tests most.

What This Cheat Sheet Covers

This topic spans 20 focused tables and 222 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Designing for Security and Compliance
Table 2: Designing for Reliability and Fidelity
Table 3: Designing for Flexibility and Portability
Table 4: Designing Data Migrations
Table 5: Planning the Data Pipelines
Table 6: Building Pipelines: Choosing the Processing Services
Table 7: Building Pipelines: Transformations (Batch, Streaming, Enrichment)
Table 8: Deploying and Operationalizing the Pipelines
Table 9: Selecting Storage Systems
Table 10: Planning for a Data Warehouse
Table 11: Using a Data Lake
Table 12: Designing for a Data Platform
Table 13: Preparing Data for Visualization
Table 14: Preparing Data for AI and ML
Table 15: Sharing Data
Table 16: Optimizing Resources
Table 17: Designing Automation and Repeatability
Table 18: Organizing Workloads Based on Business Requirements
Table 19: Monitoring and Troubleshooting Processes
Table 20: Maintaining Awareness of Failures and Mitigating Impact

Table 1: Designing for Security and Compliance

Section 1: Designing data processing systems, task 1.1. Covers how a data engineer designs identity and access, encryption and key management, privacy controls, data residency, regulatory compliance, governed resource architecture, and separated environments on Google Cloud.

Concept	Example	Description
Principle of Least Privilege	Grant `roles/bigquery.dataViewer` on one dataset, not project Editor	Give each principal only the permissions a task needs, at the lowest resource in the hierarchy. • Google's core IAM reflex; broad grants are the default exam trap
Predefined Roles	`roles/bigquery.dataEditor`, `roles/storage.objectViewer`	Granular roles maintained by Google for a specific service. Recommended default over basic or custom roles.
Basic (Primitive) Roles	`roles/owner`, `roles/editor`, `roles/viewer`	Highly permissive legacy roles spanning thousands of permissions across all services. • Avoid in production; use only in test environments when no alternative fits
Service Accounts	A pipeline runs as `etl-sa@project.iam.gserviceaccount.com`	A non-human identity for apps and workloads. Grant it least-privilege roles; never share one across dev and prod.
Default Encryption at Rest	No setup needed; data is AES-256 encrypted automatically	All data at rest is encrypted by Google-owned and Google-managed keys with no action required. • You cannot view, rotate, or audit these keys
Customer-Managed Encryption Keys (CMEK)	Create a key in Cloud KMS, set a BigQuery table or bucket to use it	You own and control keys in Cloud KMS while Google runs the encryption. • Control rotation, location, access, and destruction (crypto-shredding)
Customer-Supplied Encryption Keys (CSEK)	Pass a Base64 AES-256 key with each Cloud Storage request	You supply the raw key on every operation; Google stores only a hash, never the key. • Not to be confused with CMEK, where the key lives in Cloud KMS
Sensitive Data Protection (Cloud DLP)	Inspect a BigQuery table for `EMAIL_ADDRESS` and `CREDIT_CARD_NUMBER` infoTypes	Discovers, classifies, and de-identifies sensitive data inside and outside Google Cloud. The go-to service for a PII strategy.
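The least-privilege, basic-role, and service-account rows can be made concrete with a small policy check. The sketch below models IAM policies as plain Python dicts that mirror the IAM policy JSON shape (`bindings`, each with a `role` and `members`). It assumes each dict is the policy on one dataset. The helper names `find_basic_role_grants` and `shared_service_accounts` are hypothetical and are not part of any Google library. The example flags basic roles and detects a service account reused across dev and prod.

```python
# Hypothetical least-privilege lint over IAM policies expressed as plain dicts
# shaped like IAM policy JSON: {"bindings": [{"role": ..., "members": [...]}]}.

BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def find_basic_role_grants(policy: dict) -> list:
    """Return (role, member) pairs that use a highly permissive basic role."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        if role in BASIC_ROLES:
            for member in binding.get("members", []):
                findings.append((role, member))
    return findings


def shared_service_accounts(dev_policy: dict, prod_policy: dict) -> set:
    """Return service accounts that appear in both the dev and prod policies."""

    def service_accounts(policy: dict) -> set:
        return {
            member
            for binding in policy.get("bindings", [])
            for member in binding.get("members", [])
            if member.startswith("serviceAccount:")
        }

    return service_accounts(dev_policy) & service_accounts(prod_policy)


# Least privilege: a predefined, read-only BigQuery role for the pipeline identity.
prod_policy = {
    "bindings": [
        {
            "role": "roles/bigquery.dataViewer",
            "members": ["serviceAccount:etl-sa@project.iam.gserviceaccount.com"],
        }
    ]
}

# Anti-pattern: a basic role, plus the prod service account reused in dev.
dev_policy = {
    "bindings": [
        {
            "role": "roles/editor",
            "members": [
                "serviceAccount:etl-sa@project.iam.gserviceaccount.com",
                "user:analyst@example.com",
            ],
        }
    ]
}

print(find_basic_role_grants(prod_policy))  # []
print(find_basic_role_grants(dev_policy))
print(shared_service_accounts(dev_policy, prod_policy))
```

In a real project you would read the policy with `gcloud` or the IAM API rather than hand-writing it. The check itself stays the same: any hit on a basic role, or any identity shared between environments, is the exam's "too broad" answer.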
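The CSEK row says you pass a Base64 AES-256 key with each Cloud Storage request and Google keeps only a hash. The sketch below builds the three Cloud Storage request headers used for customer-supplied keys (`x-goog-encryption-algorithm`, `x-goog-encryption-key`, `x-goog-encryption-key-sha256`) using only the Python standard library. The function name `make_csek_headers` is a hypothetical helper, and the random key is for illustration only.

```python
import base64
import hashlib
import os


def make_csek_headers(raw_key: bytes) -> dict:
    """Build the Cloud Storage request headers that carry a customer-supplied key.

    The same headers must accompany every request that reads or writes the
    object, because Google does not keep the key itself.
    """
    if len(raw_key) != 32:
        raise ValueError("a CSEK must be a 256-bit (32-byte) AES key")
    return {
        "x-goog-encryption-algorithm": "AES256",
        # The raw key, Base64-encoded, sent with the request.
        "x-goog-encryption-key": base64.b64encode(raw_key).decode("ascii"),
        # Base64-encoded SHA-256 digest of the raw key; a hash like this is what
        # Cloud Storage retains to check that later requests use the same key.
        "x-goog-encryption-key-sha256": base64.b64encode(
            hashlib.sha256(raw_key).digest()
        ).decode("ascii"),
    }


# Illustration only: a random key. In practice the key comes from your own
# key-management system; if you lose it, the object cannot be decrypted.
key = os.urandom(32)
headers = make_csek_headers(key)
print(sorted(headers))
```

With the `google-cloud-storage` client library, you normally pass the raw bytes as `bucket.blob(name, encryption_key=key)` and let the library set these headers. The contrast with CMEK is the point: here the key travels with every request instead of living in Cloud KMS.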
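For the CMEK row ("create a key in Cloud KMS, set a BigQuery table or bucket to use it"), the sketch below builds the full Cloud KMS key resource name. It also shows how that name is attached to a new BigQuery table (`bigquery.EncryptionConfiguration`) and set as a bucket's default key (`Bucket.default_kms_key_name`). It assumes the `google-cloud-bigquery` and `google-cloud-storage` packages and valid credentials. Those two helpers are only defined, not called, so the example runs offline. All project, key ring, key, and table names are hypothetical. It also assumes the key's location matches the dataset's location, which BigQuery requires.

```python
def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    """Return the full Cloud KMS resource name that BigQuery and Cloud Storage accept."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def create_cmek_table(table_id: str, key_name: str):
    """Create a BigQuery table protected by a customer-managed key.

    Requires google-cloud-bigquery and credentials. The BigQuery service account
    must hold roles/cloudkms.cryptoKeyEncrypterDecrypter on the key.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    table = bigquery.Table(table_id, schema=[bigquery.SchemaField("event_id", "STRING")])
    table.encryption_configuration = bigquery.EncryptionConfiguration(kms_key_name=key_name)
    return client.create_table(table)


def set_bucket_default_cmek(bucket_name: str, key_name: str):
    """Make a customer-managed key the default for new objects in a bucket.

    Requires google-cloud-storage; the Cloud Storage service agent needs the
    same Encrypter/Decrypter role on the key.
    """
    from google.cloud import storage

    bucket = storage.Client().get_bucket(bucket_name)
    bucket.default_kms_key_name = key_name
    bucket.patch()
    return bucket


# Hypothetical names; "us" matches a dataset in the US multi-region.
name = kms_key_name("my-project", "us", "etl-ring", "events-key")
print(name)
```

Because the key lives in Cloud KMS, rotation, IAM on the key, and key destruction (crypto-shredding) are all handled there, separately from the data services that use it.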
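The Sensitive Data Protection row's example, inspecting a BigQuery table for `EMAIL_ADDRESS` and `CREDIT_CARD_NUMBER`, maps to a DLP inspection job. The sketch below builds the `create_dlp_job` request as a plain dict with a BigQuery `storage_config` and an `inspect_config` listing built-in infoTypes. Submission is wrapped in a function that assumes the `google-cloud-dlp` package (`dlp_v2.DlpServiceClient`) and credentials. It is defined but not called, and the project, dataset, and table names are hypothetical. No actions are configured here, such as saving findings to a table, so a real job would report only its result summary.

```python
def build_bq_inspect_job(
    project_id: str,
    dataset_id: str,
    table_id: str,
    info_types=("EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"),
    location: str = "global",
) -> dict:
    """Build a create_dlp_job request that scans one BigQuery table for infoTypes."""
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "inspect_job": {
            # What to scan: a single BigQuery table.
            "storage_config": {
                "big_query_options": {
                    "table_reference": {
                        "project_id": project_id,
                        "dataset_id": dataset_id,
                        "table_id": table_id,
                    }
                }
            },
            # What to look for: built-in infoType detectors, referenced by name.
            "inspect_config": {
                "info_types": [{"name": name} for name in info_types],
            },
        },
    }


def submit_inspect_job(request: dict):
    """Submit the job; requires the google-cloud-dlp package and credentials."""
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    return client.create_dlp_job(request=request)


request = build_bq_inspect_job("my-project", "sales", "customers")
print(request["inspect_job"]["inspect_config"])
# {'info_types': [{'name': 'EMAIL_ADDRESS'}, {'name': 'CREDIT_CARD_NUMBER'}]}
```

Keeping the request as data makes it easy to reuse the same infoType list across tables. De-identification uses a separate configuration and is not shown here.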