Polars is a DataFrame library written in Rust and designed for performance. It features an expression-based API, lazy evaluation with automatic query optimization, parallel execution, and integration with the Apache Arrow ecosystem. This cheat sheet covers basic operations through advanced optimization techniques, including streaming for large datasets and interoperability with pandas and Arrow.
What This Cheat Sheet Covers
This topic spans 23 focused tables and 254 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.
- Fundamentals
- Expressions and Contexts
- Lazy vs Eager Execution
- Query Optimization
- Data Selection and Filtering
- Joins
- Aggregations and Group By
- Window Functions
- File I/O and Scanning
- Schema Handling
- Streaming Mode
- Performance Tuning
- Interoperability
- String Operations
- Datetime Operations
- List Operations
- Null Handling
- Advanced Aggregations
- Pivoting and Reshaping
- Advanced Operations
- Statistical Functions
- Column Selectors
- Practical Examples
Fundamentals
| Example | Description |
|---|---|
| `import polars as pl` | Standard import convention for Polars library |
| `df = pl.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})` | Create DataFrame from dictionary, lists, or other data structures |
| `df = pl.read_csv("data.csv")` | Read CSV file into eager DataFrame with full data loaded into memory |
| `df = pl.read_parquet("data.parquet")` | Read Parquet file with columnar compression for efficient storage |
| `df.write_csv("output.csv")` | Export DataFrame to CSV format |
| `df.write_parquet("output.parquet")` | Export DataFrame to Parquet with compression |
| `lf = pl.scan_csv("data.csv")` | Create lazy evaluation plan without loading data, enables query optimization |
| `df = lf.collect()` | Execute lazy query plan and materialize results into DataFrame |