DuckDB for Analytical Data Science Cheat Sheet

Updated 2026-05-15

DuckDB is an in-process analytical database designed for OLAP workloads without server management — think SQLite for analytics. It runs directly within your Python, R, or CLI environment, offering columnar storage, vectorized execution, and zero-copy integration with pandas, Polars, and Apache Arrow. Unlike traditional databases, DuckDB executes queries in-memory with automatic spill-to-disk for larger-than-RAM datasets, enabling fast aggregations, window functions, and complex analytical queries on CSV, Parquet, JSON, and cloud storage (S3/GCS) without ETL.

What This Cheat Sheet Covers

This topic spans 26 focused tables and 143 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Database Connection & Initialization
Table 2: Reading Data Files
Table 3: Python API Relation Methods
Table 4: Window Functions
Table 5: Aggregate Functions
Table 6: Data Types (Nested)
Table 7: Table Functions
Table 8: User-Defined Functions (UDFs)
Table 9: Prepared Statements
Table 10: Extensions
Table 11: Cloud Storage Integration
Table 12: JOIN Types
Table 13: Transaction Management
Table 14: Query Optimization & Analysis
Table 15: Import/Export Operations
Table 16: Date & Time Functions
Table 17: String & Pattern Matching
Table 18: Performance Configuration
Table 19: Common Table Expressions (CTEs)
Table 20: Advanced SQL Features
Table 21: Sampling
Table 22: Conditional Expressions
Table 23: Sorting & Ordering
Table 24: JSON Functions
Table 25: Spatial Functions (Extension)
Table 26: Comparison with Other Databases

Table 1: Database Connection & Initialization

Every DuckDB session starts with a connection, and the choice you make here decides whether your data lives only in RAM or persists to a file on disk. These methods cover the spectrum — throwaway in-memory analysis, a durable database file, read-only sharing across processes, and even attaching external MySQL, Postgres, or SQLite databases so you can query them all in one place.

Method	Example	Description
duckdb.connect()	`con = duckdb.connect()`	• Creates an in-memory database connection • data lost when process ends
duckdb.connect() persistent	`con = duckdb.connect('db.duckdb')`	• Opens or creates a persistent database file on disk • survives process restarts
duckdb.connect() read-only	`con = duckdb.connect('db.duckdb', read_only=True)`	• Opens database in read-only mode • multiple read-only connections allowed across processes
