Xarray Cheat Sheet

Updated 2026-05-28

Xarray is Python's foundational library for working with labeled, multi-dimensional arrays. It brings pandas-like indexing and groupby operations to N-dimensional scientific datasets. Built on NumPy and integrated with Dask for parallel computing, Xarray provides a metadata-rich data model in which dimensions have names, coordinates provide labels, and attributes store metadata. This makes climate, geospatial, and observational data workflows much simpler than working with raw NumPy arrays. Lazy loading, backed by Dask for out-of-core computation, lets you work with terabyte-scale datasets without loading them into memory. Automatic alignment matches data by coordinate label rather than by position, CF-convention support decodes standard metadata such as time units, and native NetCDF and Zarr I/O keeps Xarray compatible with the wider scientific data ecosystem.

As of v2026.4.0, attributes are preserved by default through all operations, and xr.ufuncs has been removed in favor of calling NumPy ufuncs directly (e.g., np.sin(data)). Understanding the distinction between dimensions (axis names) and coordinates (labels along axes) is key to unlocking Xarray's power: dimensions define structure, and coordinates enable label-based indexing.
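A minimal sketch of the dimension/coordinate distinction, using a small synthetic array (the names `temp`, `lat`, `lon` and the values are illustrative only). `isel` indexes by position along a named dimension, `sel` indexes by coordinate label, and a NumPy ufunc applied to a DataArray returns a DataArray with its dims and coords intact.

```python
import numpy as np
import xarray as xr

# Dimensions name the axes; coordinates label positions along them.
temp = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=["lat", "lon"],
    coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]},
    attrs={"units": "degC"},
)

# Positional indexing: pick index 0 along the dimension named "lat".
first_row = temp.isel(lat=0)

# Label-based indexing: pick by coordinate value, not by position.
point = temp.sel(lat=20.0, lon=110.0)

# A plain NumPy ufunc returns a DataArray, keeping dims and coords.
wave = np.sin(temp)

print(first_row.values)                # [1. 2. 3.]
print(float(point))                    # 5.0
print(type(wave).__name__, wave.dims)  # DataArray ('lat', 'lon')
```

Naming dimensions is what lets `isel(lat=0)` replace a bare `arr[0, :]`, so code no longer depends on remembering axis order.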

What This Cheat Sheet Covers

This topic spans 26 focused tables and 187 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core Data Structures
Table 2: Creating Data Structures
Table 3: Loading Data from Files
Table 4: Dimension and Coordinate Operations
Table 5: Label-Based Indexing and Selection
Table 6: Positional and Advanced Indexing
Table 7: Masking and Conditional Selection
Table 8: Missing Data Handling
Table 9: Arithmetic and Mathematical Operations
Table 10: Aggregation and Reduction Methods
Table 11: GroupBy Operations
Table 12: Resampling and Rolling Windows
Table 13: Weighted Operations
Table 14: Broadcasting, Alignment, and Combining
Table 15: Reshaping and Reorganizing
Table 16: I/O Operations — Writing Data
Table 17: Format-Specific I/O
Table 18: Parallel Computing with Dask
Table 19: Custom Functions with apply_ufunc
Table 20: Plotting
Table 21: Time Series Operations
Table 22: Attributes and Metadata
Table 23: Conversion Methods
Table 24: Custom Accessors and Extensions
Table 25: Advanced and Optimization Techniques
Table 26: DataTree Operations

Table 1: Core Data Structures

Xarray's data model is built on four nested layers: the raw array backend, the Variable (unlabeled array with dims), the DataArray (a named, labeled variable), and the Dataset (a dict of aligned DataArrays). The DataTree adds hierarchical grouping on top. Knowing which layer you need prevents most type errors.

Type	Example	Description
DataArray	`import xarray as xr` `import numpy as np` `data = xr.DataArray(np.random.rand(3, 4), dims=['x', 'y'], coords={'x': [0, 1, 2], 'y': [10, 20, 30, 40]})`	• Single multi-dimensional variable with labeled dimensions, coordinates, and attributes • analogous to a pandas Series for N-D data.
Dataset	`ds = xr.Dataset({'temp': data, 'pressure': data * 2})` `ds.attrs['description'] = 'Weather data'`	• Dictionary-like container of multiple DataArrays sharing dimensions • analogous to a pandas DataFrame for N-D data with heterogeneous variables.
DataTree	`from xarray import DataTree` `tree = DataTree(dataset=ds)` `tree['child'] = DataTree(dataset=ds)`	• Hierarchical tree structure of Datasets for nested groups (e.g., HDF5/Zarr groups) • enables multi-model workflows and hierarchical scientific data.
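Coordinates	`import pandas as pd` `data.coords['time'] = ('x', pd.date_range('2020-01-01', periods=3))`	• Labels for dimensions that enable fast label-based indexing and automatic alignment • can be dimension coordinates (1-D, same name as dim) or non-dimension coordinates (multi-D or auxiliary) • a coordinate whose name is not an existing dim must be attached along an existing dim with a (dim, values) tuple.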
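A short sketch of how the two kinds of coordinates behave, using a toy 3×4 array (the `time` and `other` names are illustrative). Dimension coordinates get an index and drive `sel` and alignment; an auxiliary coordinate such as `time` is attached along an existing dimension and is carried along without an index. Arithmetic aligns on labels, and under xarray's default arithmetic join (inner) only shared `x` labels survive.

```python
import numpy as np
import pandas as pd
import xarray as xr

data = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=["x", "y"],
    coords={"x": [0, 1, 2], "y": [10, 20, 30, 40]},
)

# Non-dimension (auxiliary) coordinate laid along the existing dim "x".
data.coords["time"] = ("x", pd.date_range("2020-01-01", periods=3))

# Only dimension coordinates are indexed.
print(sorted(data.indexes))   # ['x', 'y']

# Label-based selection on a dimension coordinate.
print(data.sel(y=30).values)  # [ 2.  6. 10.]

# Automatic alignment: labels are matched, not positions.
# Default arithmetic join is "inner", so only x = 1 and 2 remain.
other = xr.DataArray([100.0, 200.0, 300.0], dims=["x"], coords={"x": [1, 2, 3]})
total = data + other
print(total["x"].values)      # [1 2]
```

Because `other` has no row at `x = 0` and `data` has none at `x = 3`, the result silently drops both; checking the result's coordinates after combining is a cheap guard against unexpected shrinkage.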
