Pandas is the dominant open-source Python library for data manipulation and analysis, built on top of NumPy and optionally accelerated by PyArrow. It provides two primary data structures, Series (1-dimensional) and DataFrame (2-dimensional), designed for efficient handling of structured data. Version 3.0 (January 2026, current 3.0.2) introduced Copy-on-Write as the default behavior, a dedicated str dtype for text data, and the new pd.col() expression syntax for cleaner column references, all of which improve performance, memory efficiency, and code readability. The library excels at reading from dozens of file formats, cleaning messy data, and transforming datasets for analysis, making it the go-to tool for data scientists working with tabular data in Python.
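As a rough illustration of the Copy-on-Write behavior mentioned above, the sketch below modifies a column taken from a DataFrame and checks that the parent is left untouched. The version check is an assumption added for this example: on pandas releases before 3.0 (1.5 or later) it switches on the `mode.copy_on_write` option, while on 3.0 and later it relies on the default. The variable names are illustrative only.

```python
import pandas as pd

# Assumption for this sketch: before 3.0, Copy-on-Write is opt-in,
# so enable it explicitly; from 3.0 onward it is the default.
if int(pd.__version__.split(".")[0]) < 3:
    pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"A": [1, 2, 3]})

# Selecting a column returns a Series derived from df
col = df["A"]

# Writing to the derived Series triggers a copy instead of
# silently changing the parent DataFrame
col.iloc[0] = 100

parent_value = df.loc[0, "A"]   # still the original value
child_value = col.iloc[0]       # the modified value
```

Under Copy-on-Write, any object derived from another behaves as an independent copy once it is written to, so a change to the parent has to be made on the parent itself (for example `df.loc[0, "A"] = 100`).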
What This Cheat Sheet Covers
This topic spans 27 focused tables and 196 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Data Structures
| Structure | Example | Description |
|---|---|---|
| DataFrame | `df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})` | 2-dimensional labeled data structure with columns of potentially different types; the primary Pandas object. |
| Series | `s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])` | 1-dimensional labeled array holding any data type; functions like a column in a DataFrame. |
| Index | `idx = pd.Index(['x', 'y', 'z'])` | Immutable array implementing an ordered, sliceable set used for axis labels. |
| RangeIndex | `idx = pd.RangeIndex(start=0, stop=100, step=1)` | Default index type storing only start/stop/step instead of a full array; optimized in Pandas 3.0 for arithmetic and set operations. |
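
The four structures in Table 1 can be seen working together in a short sketch. The column values and variable names below are made up for illustration; only standard pandas constructors and attributes are used.

```python
import pandas as pd

# DataFrame: 2-dimensional, each column may have its own dtype
df = pd.DataFrame({'A': [1, 2], 'B': [3.5, 4.5]})

# Selecting a single column yields a Series
col_a = df['A']

# Series: 1-dimensional values addressed by label
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
value_b = s['b']

# Index: immutable, supports set-like operations
idx = pd.Index(['x', 'y', 'z'])
common = idx.intersection(pd.Index(['y', 'z', 'w']))

# Assigning into an Index is rejected because it is immutable
try:
    idx[0] = 'q'
    index_is_immutable = False
except TypeError:
    index_is_immutable = True

# RangeIndex: stores only start/stop/step; it is also the default
# row index when none is supplied
r = pd.RangeIndex(start=0, stop=100, step=1)
default_index = df.index
```

Because `df` was built without an explicit index, `default_index` is a `RangeIndex`, which avoids materializing one label per row.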