DuckDB is an in-process columnar SQL analytics database designed for fast analytical queries on local data without server overhead. Unlike traditional databases, DuckDB executes directly within your application (like SQLite but optimized for OLAP workloads), making it ideal for data science workflows, ETL pipelines, and interactive analytics on laptops or in production. A key differentiator is zero-copy integration with Parquet, CSV, Arrow, and Pandas—DuckDB can query files and data structures directly without importing them first, enabling sub-second analytics on GB-scale datasets using familiar SQL. The latest version (1.5.0) introduces the VARIANT type for semi-structured data and moves GEOMETRY to core, reflecting DuckDB's expanding role beyond traditional analytics into geospatial, JSON-heavy, and streaming workflows.
What This Cheat Sheet Covers
This topic spans 23 focused tables and 189 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Core SQL Operations
| Concept | Example | Description |
|---|---|---|
SELECT name, ageFROM users; | • Retrieves specified columns • DuckDB supports projection pushdown for Parquet/CSV. | |
FROM usersSELECT name | "Friendly SQL" allows omitting SELECT clause or reversing clause order. | |
WHERE age >= 18AND status = 'active' | • Filters rows before aggregation • supports predicate pushdown to file formats. | |
GROUP BY country,ROLLUP(year) | • Aggregates rows by key • supports ROLLUP, CUBE, GROUPING SETS, ALL. | |
HAVING COUNT(*) > 10 | • Filters aggregated results after GROUP BY• operates on aggregate functions. | |
ORDER BY ALL DESC | • Sorts results • ORDER BY ALL sorts by all columns, uses BINARY collation default. | |
LIMIT 100 OFFSET 50 | • Paginates results • large offsets can be slow, prefer keyset pagination for performance. |