dbt (data build tool) is an open-source analytics engineering framework that transforms raw data in warehouses using SQL-first workflows with software engineering best practices. It enables teams to build modular, tested, and documented data pipelines directly in platforms like Snowflake, BigQuery, Databricks, and Redshift, treating transformations as code with version control, CI/CD, and automated testing. The tool's power lies in combining SQL transformations with Jinja templating, enabling dynamic, reusable logic while maintaining complete lineage tracking from raw sources to final models and BI exposures. One key mental model: dbt doesn't move data—it builds SELECT statements that your warehouse executes, making it a transformation-only tool in the modern ELT (Extract-Load-Transform) paradigm.
What This Cheat Sheet Covers
This topic spans 18 focused tables and 126 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Project Structure and Model Layers
| Component | Example | Description |
|---|---|---|
name: analyticsprofile: prodmodels: analytics: materialized: table | • Root configuration file defining project name, profile target, version, model paths, and default configs • required in every dbt project. | |
analytics: target: dev outputs: dev: type: snowflake database: DEV_DB | • Connection credentials stored in ~/.dbt/ defining how dbt connects to your warehouse• specifies database, schema, and authentication. | |
SELECT *FROM {{ source('erp', 'orders') }}WHERE _fivetran_deleted = FALSE | • First transformation layer that cleans, renames, and casts raw source data • one staging model per source table • prefix with stg_. | |
SELECT order_id, SUM(line_total) AS order_totalFROM {{ ref('stg_order_lines') }}GROUP BY 1 | • Purpose-built models that break complex logic into modular steps • not exposed to end users • typically ephemeral • prefix with int_. |