Raster data analysis processes gridded geospatial data that represents continuous surfaces or discrete values across space; typical inputs are satellite imagery, digital elevation models, and land cover classifications. Rasterio provides a Pythonic interface built on GDAL (Geospatial Data Abstraction Library), the industry-standard C/C++ library for reading, writing, and transforming raster and vector geospatial formats. rioxarray extends Xarray with rasterio capabilities for labeled multi-dimensional raster workflows. Efficient raster processing depends on understanding windowed I/O, affine transformations, virtual datasets, and cloud-optimized formats. In GDAL 3.11+, the unified gdal CLI modernizes the toolchain with composable pipelines and consistent subcommands.
What This Cheat Sheet Covers
This topic spans 36 focused tables and 321 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Opening and Reading Datasets
Reading data efficiently sets the stage for all analysis. Rasterio's context manager pattern ensures files close cleanly, while the 1-indexed band convention follows GDAL's long-standing design. In Rasterio 1.5+, a thread_safe parameter and a custom opener keyword enable thread-safe and filesystem-agnostic access for cloud workflows.
| Example | Description |
|---|---|
| `with rasterio.open('file.tif') as src: data = src.read(1)` | Opens a raster using the context manager pattern. `src` is a DatasetReader with metadata and pixel access. |
| `band1 = src.read(1)` | Reads one band by 1-indexed band number into a 2D NumPy array. GDAL convention indexes from 1, not 0. |
| `bands = src.read([1, 2, 3])` | Reads specific bands into a 3D array with shape (bands, rows, cols). |
| `all_data = src.read()` | Reads the entire dataset into a 3D array. Omitting the band index returns all bands. |
| `data = src.read(1, masked=True)` | Returns a NumPy masked array in which nodata pixels are masked. Integrates with NumPy masked operations. |
| `data = src.read(1, out_shape=(512, 512))` | Reads and resamples on the fly to the specified dimensions. Useful for downsampling during read. |