Pandas Cheat Sheet

Updated 2026-04-21


Pandas is the dominant open-source Python library for data manipulation and analysis, built on top of NumPy and optionally accelerated by PyArrow. It provides two primary data structures, Series (1-dimensional) and DataFrame (2-dimensional), designed for efficient handling of structured data. Version 3.0 (January 2026, current 3.0.2) made Copy-on-Write the default behavior, added a dedicated str dtype for text data, and introduced the pd.col() expression syntax for cleaner column references. Together these changes target performance, memory efficiency, and code readability. The library excels at reading from dozens of file formats, cleaning messy data, and transforming datasets for analysis, which makes it the go-to tool for data scientists working with tabular data in Python.

What This Cheat Sheet Covers

This cheat sheet spans 27 focused tables and 196 indexed concepts. The outline below lists every table, from foundational concepts through advanced features.

Table 1: Core Data Structures
Table 2: Reading Data from Files
Table 3: Writing Data to Files
Table 4: Viewing and Inspecting Data
Table 5: Selecting Data by Label
Table 6: Selecting Data by Position
Table 7: Filtering and Selection
Table 8: Handling Missing Data
Table 9: Adding, Removing, and Modifying Columns
Table 10: Sorting Data
Table 11: Data Type Conversion
Table 12: Aggregation and GroupBy
Table 13: Statistical Operations
Table 14: Merging and Joining DataFrames
Table 15: Reshaping Data
Table 16: String Operations
Table 17: DateTime Operations
Table 18: Window Functions
Table 19: Duplicate Handling
Table 20: Binning and Discretization
Table 21: Apply and Mapping Functions
Table 22: Expression Syntax (pd.col)
Table 23: Index Operations
Table 24: Copy-on-Write Behavior
Table 25: Performance and Optimization
Table 26: Styling and Visualization
Table 27: Advanced Features

Table 1: Core Data Structures

Everything in Pandas is built on the DataFrame and the Series — a labeled table and a labeled column. The objects here cover how data is held and indexed, including the specialized index types and the new Pandas 3.0 str dtype that finally treats text as a first-class citizen rather than a generic object.

Structure	Example	Description
DataFrame	`df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})`	• 2-dimensional labeled data structure with columns of potentially different types • the primary Pandas object.
Series	`s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])`	• 1-dimensional labeled array holding any data type • functions like a column in a DataFrame.
Index	`idx = pd.Index(['x', 'y', 'z'])`	Immutable array implementing an ordered, sliceable set used for axis labels.
RangeIndex	`idx = pd.RangeIndex(start=0, stop=100, step=1)`	• Default index type storing only start/stop/step instead of full array • optimized in Pandas 3.0 for arithmetic and set operations.
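
The four structures in the table can be tried together in a short session. The sketch below is illustrative only: the variable names are made up for this example, and it sticks to behavior that should be the same in pandas 2.x and 3.x.

```python
import pandas as pd

# DataFrame: two labeled columns; the row index defaults to a RangeIndex.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
default_index = df.index

# Series: one labeled column; selecting a DataFrame column returns a Series.
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
column_a = df['A']
value_b = s['b']  # label-based lookup

# Index: labels are immutable, so item assignment raises TypeError.
idx = pd.Index(['x', 'y', 'z'])
try:
    idx[0] = 'w'
    index_is_immutable = False
except TypeError:
    index_is_immutable = True

# RangeIndex: keeps only start/stop/step but behaves like the full sequence.
r = pd.RangeIndex(start=0, stop=100, step=1)
range_len = len(r)
sliced = r[10:20]  # slicing yields another RangeIndex, not a materialized array

print(type(default_index).__name__, value_b, index_is_immutable, range_len)
```

Because an Index cannot be modified in place, relabeling is done by assigning a whole new Index (for example `df.index = [...]`) or by calling methods such as `rename` and `set_index`.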
