The Google Cloud Professional Data Engineer certification validates your ability to design, build, operationalize, secure, and monitor data processing systems on Google Cloud, with a strong emphasis on choosing the right managed service for each workload. The exam runs two hours with roughly 50 to 60 multiple-choice and multiple-select questions across five areas: designing data processing systems (~22%), ingesting and processing the data (~25%), storing the data (~20%), preparing and using data for analysis (~15%), and maintaining and automating data workloads (~18%). Most questions are scenario-based, so the winning answer is usually the Google-recommended, fully managed, serverless option that meets the stated requirement at the lowest cost and operational overhead, rather than the most powerful or most hands-on one. Learn the decision boundaries between the storage and processing services (BigQuery vs Bigtable vs Spanner vs Cloud SQL, Dataflow vs Dataproc, Pub/Sub vs Kafka), because matching the right service to the access pattern is what the exam tests most.
What This Cheat Sheet Covers
This topic spans 20 focused tables and 222 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Designing for Security and Compliance
Section 1: Designing data processing systems, task 1.1. Covers how a data engineer designs identity and access, encryption and key management, privacy controls, data residency, regulatory compliance, governed resource architecture, and separated environments on Google Cloud.
| Concept | Example | Description |
|---|---|---|
| Principle of least privilege | Grant roles/bigquery.dataViewer on one dataset, not project Editor | Give each principal only the permissions a task needs, at the lowest resource in the hierarchy. • Google's core IAM reflex; broad grants are the default exam trap |
| Predefined roles | roles/bigquery.dataEditor, roles/storage.objectViewer | Granular roles maintained by Google for a specific service. Recommended default over basic or custom roles. |
| Basic roles | roles/owner, roles/editor, roles/viewer | Highly permissive legacy roles spanning thousands of permissions across all services. • Avoid in production; use only in test environments when no alternative fits |
| Service account | A pipeline runs as etl-sa | A non-human identity for apps and workloads. Grant it least-privilege roles; never share one across dev and prod. |
| Google default encryption | No setup needed; data is AES-256 encrypted automatically | All data at rest is encrypted by Google-owned and Google-managed keys with no action required. • You cannot view, rotate, or audit these keys |
| Customer-managed encryption keys (CMEK) | Create a key in Cloud KMS, set a BigQuery table or bucket to use it | You own and control keys in Cloud KMS while Google runs the encryption. • Control rotation, location, access, and destruction (crypto-shredding) |
| Customer-supplied encryption keys (CSEK) | Pass a Base64 AES-256 key with each Cloud Storage request | You supply the raw key on every operation; Google stores only a hash, never the key. • Not to be confused with CMEK, where the key lives in Cloud KMS |
| Sensitive Data Protection | Inspect a BigQuery table for EMAIL_ADDRESS and CREDIT_CARD_NUMBER infoTypes | Discovers, classifies, and de-identifies sensitive data inside and outside Google Cloud. The PII strategy service. |
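
The least-privilege and role guidance above can be turned into a quick check. The sketch below is one possible way to flag grants of roles/owner, roles/editor, or roles/viewer in an IAM policy. It assumes the policy has already been exported as a dictionary in the standard IAM policy JSON shape (a `bindings` list of `role` and `members`). The function name `find_basic_role_grants` and the project and account names in the sample policy are hypothetical, chosen only for illustration.

```python
# Basic roles named in the table; treated here as too broad for production.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def find_basic_role_grants(policy):
    """Return (role, member) pairs where a basic role is granted.

    `policy` is assumed to follow the IAM policy JSON shape:
    {"bindings": [{"role": "...", "members": ["..."]}]}.
    """
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role")
        if role in BASIC_ROLES:
            for member in binding.get("members", []):
                findings.append((role, member))
    return findings


# Hypothetical policy for illustration; project and account names are made up.
example_policy = {
    "bindings": [
        {
            "role": "roles/editor",
            "members": [
                "serviceAccount:etl-sa@example-project.iam.gserviceaccount.com"
            ],
        },
        {
            "role": "roles/bigquery.dataViewer",
            "members": ["user:analyst@example.com"],
        },
    ]
}

for role, member in find_basic_role_grants(example_policy):
    print(f"Basic role {role} granted to {member}")
```

In this sample only the etl-sa binding is reported. The predefined roles/bigquery.dataViewer grant passes, which mirrors the exam reflex of preferring a narrow predefined role over a basic one.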
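
To make the customer-supplied key row concrete, the following sketch shows roughly what "pass a Base64 AES-256 key with each request" involves: Base64-encoding a 32-byte key together with its SHA-256 hash and sending both as request headers. It uses only the Python standard library and builds the headers without calling Cloud Storage. The helper name `build_csek_headers` is hypothetical, and in practice a client library would normally assemble these headers for you.

```python
import base64
import hashlib
import os


def build_csek_headers(raw_key: bytes) -> dict:
    """Build the encryption headers for a customer-supplied key.

    The key must be exactly 32 bytes (AES-256). The SHA-256 hash is computed
    over the raw key bytes, then Base64-encoded, as is the key itself.
    """
    if len(raw_key) != 32:
        raise ValueError("AES-256 requires a 32-byte key")
    key_b64 = base64.b64encode(raw_key).decode("ascii")
    hash_b64 = base64.b64encode(hashlib.sha256(raw_key).digest()).decode("ascii")
    return {
        "x-goog-encryption-algorithm": "AES256",
        "x-goog-encryption-key": key_b64,
        "x-goog-encryption-key-sha256": hash_b64,
    }


# Illustration only: a random key generated locally. You would have to store
# this key yourself, because Google keeps only its hash.
key = os.urandom(32)
headers = build_csek_headers(key)
print(sorted(headers))
```

The same headers have to accompany every later read or rewrite of the object. That is the operational cost separating this option from CMEK, where the key stays in Cloud KMS.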