DuckDB is an in-process analytical (OLAP) database that requires no server to manage; think of it as SQLite for analytics. It runs directly inside your Python, R, or CLI environment and offers columnar storage, vectorized execution, and zero-copy integration with pandas, Polars, and Apache Arrow. Queries execute in memory, with automatic spill-to-disk for larger-than-RAM datasets. This lets you run fast aggregations, window functions, and complex analytical queries directly on CSV, Parquet, and JSON files, and on cloud storage (S3/GCS, via the httpfs extension), without a separate ETL step.
What This Cheat Sheet Covers
This cheat sheet is organized into 26 focused tables covering 143 indexed concepts. The outline below walks through it table by table, from foundational concepts to advanced details.
Table 1: Database Connection & Initialization
Every DuckDB session starts with a connection, and the choice you make here decides whether your data lives only in RAM or persists to a file on disk. These methods cover the spectrum — throwaway in-memory analysis, a durable database file, read-only sharing across processes, and even attaching external MySQL, Postgres, or SQLite databases so you can query them all in one place.
| Method | Example | Description |
|---|---|---|
| duckdb.connect() | con = duckdb.connect() | • Creates a connection to an in-memory database • data is lost when the connection closes or the process ends |
| duckdb.connect(path) | con = duckdb.connect('db.duckdb') | • Opens a persistent database file on disk, creating it if it does not exist • data survives process restarts |
| duckdb.connect(path, read_only=True) | con = duckdb.connect('db.duckdb', read_only=True) | • Opens the database in read-only mode • multiple processes can hold read-only connections to the same file at once |