Xarray is a Python library for working with labeled, multi-dimensional arrays; it brings pandas-like indexing and groupby operations to N-dimensional scientific datasets. Built on NumPy and integrating with Dask for parallel computing, Xarray provides a metadata-rich data model in which dimensions have names, coordinates provide labels, and attributes store metadata, which makes climate, geospatial, and observational data workflows simpler to express than with raw NumPy arrays. Lazy evaluation, automatic alignment, and CF-convention support make it possible to work with terabyte-scale datasets without loading them fully into memory, while native NetCDF and Zarr I/O keeps Xarray compatible with the scientific data ecosystem. As of v2026.4.0, attributes are preserved by default through all operations, and xr.ufuncs has been removed in favor of calling NumPy ufuncs directly (e.g., np.sin(data)). The distinction between dimensions (axis names) and coordinates (labels along axes) is central to using Xarray well: dimensions define structure, while coordinates enable label-based indexing.
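The difference between dimensions and coordinates is easiest to see side by side. The following is a minimal sketch, not taken from the original cheat sheet: the array `temps`, its values, and the `lat`/`lon` names are illustrative assumptions. It selects the same element once by position (using dimension names) and once by label (using coordinate values), then applies a NumPy ufunc directly to the array.

```python
import numpy as np
import xarray as xr

# Illustrative data (assumed for this sketch): a 2 x 3 array with named
# dimensions and coordinate labels along each dimension.
temps = xr.DataArray(
    np.array([[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]),
    dims=["lat", "lon"],
    coords={"lat": [40.0, 50.0], "lon": [0.0, 10.0, 20.0]},
    attrs={"units": "degC"},
)

# Dimensions define structure: index by integer position along named axes.
by_position = temps.isel(lat=0, lon=2)

# Coordinates enable label-based indexing: index by coordinate value.
by_label = temps.sel(lat=40.0, lon=20.0)

# NumPy ufuncs can be applied to a DataArray directly; the result is
# still a DataArray with the same dimensions and coordinates.
sines = np.sin(temps)
```

Both selections return the same element; `isel` needs only the dimension names, whereas `sel` relies on the coordinate labels attached to those dimensions.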
What This Cheat Sheet Covers
This topic spans 26 focused tables and 187 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Data Structures
Xarray's data model is built on four nested layers: the raw array backend, the Variable (unlabeled array with dims), the DataArray (a named, labeled variable), and the Dataset (a dict of aligned DataArrays). The DataTree adds hierarchical grouping on top. Knowing which layer you need prevents most type errors.
| Type | Example | Description |
|---|---|---|
| DataArray | `import xarray as xr`<br>`import numpy as np`<br>`data = xr.DataArray(np.random.rand(3, 4), dims=['x', 'y'], coords={'x': [0, 1, 2], 'y': [10, 20, 30, 40]})` | Single multi-dimensional variable with labeled dimensions, coordinates, and attributes; analogous to a pandas Series for N-D data. |
| Dataset | `ds = xr.Dataset({'temp': data, 'pressure': data * 2})`<br>`ds.attrs['description'] = 'Weather data'` | Dictionary-like container of multiple DataArrays sharing dimensions; analogous to a pandas DataFrame for N-D data with heterogeneous variables. |
| DataTree | `from xarray import DataTree`<br>`tree = DataTree(dataset=ds)`<br>`tree['child'] = DataTree(dataset=ds)` | Hierarchical tree structure of Datasets for nested groups (e.g., HDF5/Zarr groups); enables multi-model workflows and hierarchical scientific data. |
| Coordinates | `import pandas as pd`<br>`data.coords['time'] = ('x', pd.date_range('2020-01-01', periods=3))` | Labels for dimensions that enable fast label-based indexing and automatic alignment; can be dimension coordinates (1-D, same name as dim) or non-dimension coordinates (multi-D or auxiliary). The example attaches `time` along the existing `x` dimension, since a DataArray cannot gain a new dimension through a coordinate assignment. |
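The layers described above can be inspected directly from a single object. The sketch below is illustrative rather than part of the original table: the values and the names `var`, `raw`, and `time` are assumptions made for the example. It builds a DataArray, looks at the Variable and raw array underneath it, attaches a non-dimension coordinate, and combines two arrays into a Dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Deterministic illustrative values (assumed for this sketch).
values = np.arange(12.0).reshape(3, 4)

# DataArray: one named variable with labeled dimensions and coordinates.
data = xr.DataArray(
    values,
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": [10, 20, 30, 40]},
    name="temp",
)

# Lower layers: the Variable carries dims and data but no coordinates,
# and .values exposes the underlying NumPy array.
var = data.variable
raw = data.values

# Non-dimension coordinate: 'time' labels positions along the existing
# 'x' dimension without becoming a dimension itself.
data = data.assign_coords(time=("x", pd.date_range("2020-01-01", periods=3)))

# Dataset: a dict-like container of DataArrays that share dimensions.
ds = xr.Dataset({"temp": data, "pressure": data * 2})
```

In this sketch `x` and `y` are dimension coordinates and are backed by indexes, so they support `sel`; `time` is carried along as an auxiliary label only.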