Google BigQuery for Data Engineering Cheat Sheet

Google BigQuery is a fully managed, serverless data warehouse built for large-scale analytics. For data engineers, it offers a rich set of features spanning pricing models, storage optimization, ingestion pipelines, security controls, transformation frameworks, and operational tooling. This cheat sheet covers every major capability you need to design, build, and operate production BigQuery workloads.

What This Cheat Sheet Covers

This topic spans 16 focused tables and 101 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Pricing Models — On-Demand vs Capacity-Based
Table 2: BigQuery Editions Feature Comparison
Table 3: Partitioned Tables
Table 4: Clustered Tables
Table 5: External Tables, BigLake, and Federated Queries
Table 6: Materialized Views
Table 7: Data Ingestion — Storage Write API vs Legacy Streaming
Table 8: Change Data Capture (CDC)
Table 9: Continuous Queries
Table 10: Scheduled Queries and BigQuery Studio Notebooks
Table 11: User-Defined Functions (UDFs)
Table 12: Dataform for SQL Transformations
Table 13: Row-Level and Column-Level Security
Table 14: Authorized Views and Datasets
Table 15: INFORMATION_SCHEMA and Metadata
Table 16: Time Travel, Table Snapshots, and Storage Optimization

Table 1: Pricing Models — On-Demand vs Capacity-Based

BigQuery offers two fundamentally different billing approaches. On-demand charges per byte scanned and suits sporadic workloads, while capacity-based pricing purchases dedicated slots and suits high-throughput, predictable workloads. Choosing the right model has a large impact on total cost.

Model	Example	Description
On-Demand Pricing	`-- $6.25 per TiB scanned` `-- 2,000 concurrent slots per project` `SELECT * FROM large_table`	• Charges $6.25 per TiB of bytes processed • no upfront commitment • default model for new projects
BigQuery Editions (Capacity)	`-- Standard: $0.04/slot-hr` `-- Enterprise: $0.06/slot-hr` `-- Enterprise Plus: $0.10/slot-hr`	• Slot-hour billing replaces per-byte charges • choose Standard, Enterprise, or Enterprise Plus based on feature needs and SLO requirements
Standard Edition	`-- Max 1,600 slots per reservation` `-- Autoscale only (no baseline)` `-- $0.04/slot-hr`	• Entry-level capacity edition • supports autoscaling but not baseline slot reservations • no cross-region disaster recovery
Enterprise Edition	`-- Baseline + autoscale slots` `-- BigQuery Omni supported` `-- $0.06/slot-hr`	• Adds baseline slot reservations, multi-region replication, and BigQuery Omni over Standard • 99.99% SLO (Standard offers 99.9%)
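
The trade-off between the two models comes down to simple arithmetic, sketched below in Python. The rates are the list prices quoted in Table 1; actual prices vary by region and change over time. The function names and the example workload (100 slots running for 730 hours) are hypothetical, chosen only for illustration, and the sketch ignores free-tier allowances, commitments, and storage charges.

```python
# Rates copied from Table 1 (USD). Treat them as assumptions:
# real prices depend on region and may change.
ON_DEMAND_USD_PER_TIB = 6.25
EDITION_USD_PER_SLOT_HOUR = {
    "standard": 0.04,
    "enterprise": 0.06,
    "enterprise_plus": 0.10,
}


def on_demand_cost(tib_scanned: float) -> float:
    """Cost of scanning `tib_scanned` TiB under on-demand pricing."""
    return tib_scanned * ON_DEMAND_USD_PER_TIB


def capacity_cost(edition: str, slots: int, hours: float) -> float:
    """Cost of running `slots` slots for `hours` hours in an edition."""
    return EDITION_USD_PER_SLOT_HOUR[edition] * slots * hours


def break_even_tib(edition: str, slots: int, hours: float) -> float:
    """TiB scanned at which on-demand cost equals the capacity cost."""
    return capacity_cost(edition, slots, hours) / ON_DEMAND_USD_PER_TIB


if __name__ == "__main__":
    # Hypothetical workload: 100 Enterprise slots for a 730-hour month.
    monthly = capacity_cost("enterprise", slots=100, hours=730)
    threshold = break_even_tib("enterprise", slots=100, hours=730)
    print(f"Capacity cost: ${monthly:,.2f}")
    print(f"Break-even scan volume: {threshold:,.1f} TiB")
```

Under these assumptions, a project that scans less than the break-even volume in the period is cheaper on on-demand pricing, and one that scans more is cheaper on capacity pricing. Autoscaled slots are billed only while in use, so a real estimate would use observed slot-hours rather than a fixed slot count.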
