DuckDB is an in-process analytical database designed for OLAP workloads with no server to manage: think SQLite for analytics. It runs directly inside your Python or R process or from the CLI, offering columnar storage, vectorized execution, and direct (often zero-copy) integration with pandas, Polars, and Apache Arrow. Unlike client-server databases, DuckDB executes queries inside the host process, working in memory and spilling to disk automatically for larger-than-RAM datasets. This enables fast aggregations, window functions, and complex analytical queries on CSV, Parquet, JSON, and cloud storage (S3/GCS) without a separate ETL step.
What This Cheat Sheet Covers
This cheat sheet spans 26 focused tables and 143 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Database Connection & Initialization
| Method | Example | Description |
|---|---|---|
| In-memory connection | `con = duckdb.connect()` | Creates an in-memory database connection; data is lost when the process ends. |
| Persistent connection | `con = duckdb.connect('db.duckdb')` | Opens or creates a persistent database file on disk; data survives process restarts. |
| Read-only connection | `con = duckdb.connect('db.duckdb', read_only=True)` | Opens the database in read-only mode; multiple read-only connections are allowed across processes. |