DuckDB is an in-process columnar SQL analytics database designed for fast analytical queries on local data without server overhead. Unlike traditional databases, DuckDB executes directly within your application (like SQLite but optimized for OLAP workloads), making it ideal for data science workflows, ETL pipelines, and interactive analytics on laptops or in production. A key differentiator is zero-copy integration with Parquet, CSV, Arrow, and Pandas — DuckDB can query files and data structures directly without importing them first, enabling sub-second analytics on GB-scale datasets using familiar SQL. Version 1.5 ("Variegata") introduces the VARIANT type for semi-structured data, a built-in GEOMETRY type, a redesigned CLI with syntax highlighting and a pager, an experimental PEG parser, and the read_duckdb() table function, reflecting DuckDB's expanding role beyond traditional analytics.
What This Cheat Sheet Covers
This topic spans 29 focused tables and 286 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core SQL Operations
DuckDB's SQL dialect is a superset of standard SQL with PostgreSQL-inspired extensions. These foundational constructs cover the full query lifecycle — projection, filtering, grouping, sorting, and set operations — with DuckDB-specific additions like FROM-first syntax, GROUP BY ALL, and UNION BY NAME.
| Concept | Example | Description |
|---|---|---|
SELECT name, ageFROM users; | • Retrieves specified columns • DuckDB supports projection pushdown for Parquet/CSV. | |
FROM usersSELECT name | "Friendly SQL" allows FROM before SELECT, or omitting SELECT entirely (defaults to SELECT *). | |
WHERE age >= 18AND status = 'active' | • Filters rows before aggregation • supports predicate pushdown to file formats. | |
SELECT dept, year,SUM(sales)GROUP BY ALL | Infers grouping columns automatically from the SELECT list, eliminating repetition. | |
GROUP BY country,ROLLUP(year) | • Aggregates rows by key • supports ROLLUP, CUBE, GROUPING SETS, ALL. | |
HAVING COUNT(*) > 10 | • Filters aggregated results after GROUP BY• operates on aggregate functions. | |
ORDER BY ALL DESC | • Sorts by all output columns • ORDER BY ALL ensures deterministic results without naming each column | |
LIMIT 10% orLIMIT 100 OFFSET 50 | • Paginates results; LIMIT 10% returns a percentage of rows• large offsets are slow — prefer keyset pagination. | |
SELECT DISTINCTcategory | • Returns unique rows • aggregation pushdown to Parquet metadata when possible. |