Pandas is the dominant open-source Python library for data manipulation and analysis, built on top of NumPy and optionally accelerated by PyArrow. It provides two primary data structures, Series (1-dimensional) and DataFrame (2-dimensional), designed for efficient handling of structured data. Version 3.0 (January 2026, current 3.0.2) made Copy-on-Write the default behavior and introduced a dedicated str dtype for text data and the new pd.col() expression syntax for cleaner column references; these changes target performance, memory efficiency, and code readability. The library reads from dozens of file formats, cleans messy data, and transforms datasets for analysis, which makes it the go-to tool for data scientists working with tabular data in Python.
What This Cheat Sheet Covers
This topic spans 27 focused tables and 196 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Core Data Structures
Everything in Pandas is built on the DataFrame and the Series: a labeled table and a labeled column. The objects here cover how data is held and indexed, including the specialized index types and the new Pandas 3.0 str dtype, which treats text as a first-class type rather than storing it under the generic object dtype.
| Structure | Example | Description |
|---|---|---|
| DataFrame | `df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})` | 2-dimensional labeled data structure with columns of potentially different types; the primary Pandas object. |
| Series | `s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])` | 1-dimensional labeled array holding any data type; functions like a column in a DataFrame. |
| Index | `idx = pd.Index(['x', 'y', 'z'])` | Immutable array implementing an ordered, sliceable set used for axis labels. |
| RangeIndex | `idx = pd.RangeIndex(start=0, stop=100, step=1)` | Default index type storing only start/stop/step instead of the full array; optimized in Pandas 3.0 for arithmetic and set operations. |
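
The four structures in the table above can be exercised together in a short script. This is a minimal sketch that reuses the table's own constructor calls. The helper variable names (`shape`, `value_at_b`, `index_is_immutable`, and so on) are illustrative assumptions, not part of the Pandas API.

```python
import pandas as pd

# DataFrame: 2-dimensional; selecting one column yields a Series
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
shape = df.shape  # (number of rows, number of columns)
column_is_series = isinstance(df["A"], pd.Series)

# Series: 1-dimensional; values are addressed by their labels
s = pd.Series([1, 2, 3], index=["a", "b", "c"])
value_at_b = s["b"]

# Index: immutable, so assigning to an element raises TypeError
idx = pd.Index(["x", "y", "z"])
try:
    idx[0] = "q"
    index_is_immutable = False
except TypeError:
    index_is_immutable = True

# RangeIndex: fully described by start/stop/step
rng = pd.RangeIndex(start=0, stop=100, step=1)
range_params = (rng.start, rng.stop, rng.step)

# A DataFrame built without an explicit index gets a RangeIndex
default_index_is_range = isinstance(df.index, pd.RangeIndex)
```

Because an Index cannot be modified in place, relabeling an axis means assigning a new Index object (for example through `df.index = ...`) rather than editing individual labels.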