Xarray is Python's foundational library for working with labeled, multi-dimensional arrays, bringing pandas-like indexing and groupby operations to N-dimensional scientific datasets. Built on NumPy and integrating with Dask for parallel computing, Xarray provides a metadata-rich data model in which dimensions have names, coordinates provide labels, and attributes store metadata. This makes climate, geospatial, and observational data workflows considerably simpler than working with raw NumPy arrays. Lazy loading and Dask-backed chunked computation make it possible to work with terabyte-scale datasets without loading them fully into memory, automatic alignment and CF-convention support reduce manual bookkeeping, and native NetCDF and Zarr I/O ensures compatibility with the scientific data ecosystem. Understanding the distinction between dimensions (axis names) and coordinates (labels along axes) is key to using Xarray well: dimensions define structure, while coordinates enable label-based indexing.
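The dimension/coordinate distinction is easier to see in a small example. The sketch below is illustrative only: the array name `temps`, the `lat`/`lon` dimension names, and all values are made up for demonstration and do not come from a real dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 temperature grid; names and values are illustrative only.
temps = xr.DataArray(
    np.array([[10.0, 11.0, 12.0],
              [20.0, 21.0, 22.0]]),
    dims=["lat", "lon"],                 # dimensions: names for the two axes
    coords={"lat": [40.0, 50.0],         # coordinates: labels along each axis
            "lon": [0.0, 10.0, 20.0]},
    attrs={"units": "degC"},             # attributes: free-form metadata
)

# Positional indexing only needs the dimension names.
by_position = temps.isel(lat=1, lon=2)

# Label-based indexing needs the coordinates.
by_label = temps.sel(lat=50.0, lon=20.0)

# Reductions refer to axes by name instead of by integer position.
lat_mean = temps.mean(dim="lat")

print(by_position.item(), by_label.item())
print(lat_mean.values)
```

Both selections address the same element: `isel` counts positions along a named dimension, while `sel` looks up the coordinate labels attached to it.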
What This Cheat Sheet Covers
This topic spans 25 focused tables and 163 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Data Structures
| Type | Example | Description |
|---|---|---|
| DataArray | `import xarray as xr`<br>`import numpy as np`<br>`data = xr.DataArray(np.random.rand(3, 4), dims=['x', 'y'], coords={'x': [0, 1, 2], 'y': [10, 20, 30, 40]})` | • Single multi-dimensional variable with labeled dimensions, coordinates, and attributes<br>• Analogous to a pandas Series for N-D data |
| Dataset | `ds = xr.Dataset({'temp': data, 'pressure': data * 2})`<br>`ds.attrs['description'] = 'Weather data'` | • Dictionary-like container of multiple DataArrays sharing dimensions<br>• Analogous to a pandas DataFrame for N-D data with heterogeneous variables |
| DataTree | `from xarray import DataTree`<br>`tree = DataTree(name='root', dataset=ds)`<br>`tree['child'] = DataTree(dataset=ds)` | • Hierarchical tree structure of Datasets for nested groups (e.g., HDF5/Zarr groups)<br>• Enables multi-model workflows and hierarchical scientific data |
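As a rough illustration of how the DataArray and Dataset entries in Table 1 behave in practice, the following sketch builds the same two structures with deterministic values (`np.arange` instead of random data, an assumption made so that results are reproducible) and shows selection across variables and label-based alignment. The `shifted` array is a hypothetical helper introduced only for this example.

```python
import numpy as np
import xarray as xr

# Same shape and coordinates as the Table 1 example, with fixed values.
data = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": [10, 20, 30, 40]},
)

# A Dataset groups variables that share dimensions.
ds = xr.Dataset({"temp": data, "pressure": data * 2})
ds.attrs["description"] = "Weather data"

# One label-based selection applies to every variable in the Dataset.
row = ds.sel(x=1)
temp_value = ds["temp"].sel(x=1, y=30).item()
pressure_value = row["pressure"].sel(y=30).item()

# Arithmetic aligns operands on coordinate labels; by default only the
# labels present in both operands are kept (an inner join).
shifted = data.assign_coords(x=[1, 2, 3])  # hypothetical relabeled copy
total = data + shifted

print(temp_value, pressure_value)
print(total["x"].values)
```

Because alignment happens on labels rather than positions, `total` keeps only the `x` labels present in both operands, which is what lets arrays on partially overlapping grids be combined without manual index bookkeeping.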