Great Expectations (GX) is an open-source Python framework for data quality. Teams use it to validate, document, and profile data through declarative Expectations: assertions about data that can be tested automatically. It supports Pandas, Spark, and SQL backends and integrates with tools such as Airflow, dbt, and Databricks. Three features distinguish the framework: auto-generated Data Docs (human-readable validation reports), reusable Expectation Suites, and a Checkpoint-based execution model that runs validation and then triggers post-validation actions. A key mental model: Expectations are unit tests for data (specific, versioned, and executable), designed to catch quality issues before they propagate downstream.
What This Cheat Sheet Covers
This topic spans 12 focused tables and 89 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Expectation Categories
| Type | Example | Description |
|---|---|---|
| Column map | `expect_column_values_to_not_be_null(column="age")` | Evaluates a row-by-row condition on a single column; succeeds when the fraction of passing rows meets the `mostly` threshold (e.g., `mostly=0.95` requires 95% non-null). |
| Column aggregate | `expect_column_mean_to_be_between(column="price", min_value=10, max_value=100)` | Computes a single aggregate metric (mean, standard deviation, distinct count) for a column and validates it against min/max bounds. |
| Table | `expect_table_row_count_to_be_between(min_value=1000, max_value=50000)` | Validates dataset-level properties such as row count, column presence, or column order; operates on the entire table. |
| Column pair | `expect_column_pair_values_a_to_be_greater_than_b(column_A="end_date", column_B="start_date")` | Compares two columns row by row; checks relationships such as greater-than, equality, or set membership across paired values. |
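
To make the difference between these four kinds of check concrete, the following is a minimal plain-Python sketch of what each one evaluates. It deliberately does not use the GX library: the helper functions, the sample rows, and the thresholds are all hypothetical and only mirror the semantics described in the table (row-by-row check with a `mostly` threshold, single aggregate, table-level property, and paired-column comparison).

```python
from statistics import mean

# Hypothetical sample data: each dict is one row of a small table.
rows = [
    {"age": 34, "price": 20.0, "start_date": "2024-01-01", "end_date": "2024-01-05"},
    {"age": None, "price": 40.0, "start_date": "2024-02-01", "end_date": "2024-02-03"},
    {"age": 51, "price": 60.0, "start_date": "2024-03-01", "end_date": "2024-03-02"},
    {"age": 27, "price": 80.0, "start_date": "2024-04-01", "end_date": "2024-04-09"},
]


def values_not_null(rows, column, mostly=1.0):
    """Row-by-row check: passes if the fraction of non-null rows is >= mostly."""
    if not rows:
        return True  # assumption: an empty table passes vacuously
    passing = sum(1 for row in rows if row[column] is not None)
    return passing / len(rows) >= mostly


def mean_between(rows, column, min_value, max_value):
    """Aggregate check: one metric for the whole column, compared to bounds."""
    values = [row[column] for row in rows if row[column] is not None]
    return min_value <= mean(values) <= max_value


def row_count_between(rows, min_value, max_value):
    """Table-level check: looks at the dataset as a whole, not at any column."""
    return min_value <= len(rows) <= max_value


def pair_a_greater_than_b(rows, column_a, column_b):
    """Paired-column check: compares two columns within each row."""
    # ISO-8601 date strings sort chronologically, so string comparison works here.
    return all(row[column_a] > row[column_b] for row in rows)


results = {
    "not_null_strict": values_not_null(rows, "age"),
    "not_null_mostly": values_not_null(rows, "age", mostly=0.75),
    "mean_price": mean_between(rows, "price", 10, 100),
    "row_count": row_count_between(rows, 1, 10),
    "end_after_start": pair_a_greater_than_b(rows, "end_date", "start_date"),
}
```

With this sample data, the strict null check fails because one of four `age` values is missing, while the same check with `mostly=0.75` passes. That is the practical role of `mostly`: tolerating a bounded fraction of bad rows instead of failing on the first one.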