Data Analysis with Python Cheat Sheet

Updated 2026-04-29


Data analysis with Python centers on Pandas for tabular data manipulation and NumPy for numerical computing. Pandas provides the DataFrame, a 2D labeled data structure that supports SQL-like operations, while NumPy delivers vectorized array computations that are often orders of magnitude faster than equivalent pure-Python loops. Pandas 3.0 made Copy-on-Write the default and introduced a dedicated str dtype and pd.col() expressions, making data manipulation more predictable and performant. Together with NumPy's modern random Generator API (default_rng), these libraries form the foundation of Python's data science ecosystem, handling everything from cleaning messy datasets to time series analysis and statistical aggregation.
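As a rough illustration of the ideas above, the following sketch pairs a seeded `default_rng` Generator with a vectorized NumPy expression and a SQL-like Pandas aggregation. The data, the column names (`region`, `revenue`), and the seed are hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd

# Seeded Generator so the sketch is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(42)

# Hypothetical data: 1,000 prices and integer quantities (1 through 9).
prices = rng.uniform(1.0, 100.0, size=1_000)
quantities = rng.integers(1, 10, size=1_000)

# Vectorized: one array expression instead of a Python-level loop.
revenue = prices * quantities

# Pure-Python equivalent, shown only for comparison.
revenue_loop = [p * q for p, q in zip(prices.tolist(), quantities.tolist())]

# DataFrame with a SQL-like GROUP BY aggregation.
df = pd.DataFrame({
    "region": rng.choice(["north", "south"], size=1_000),
    "revenue": revenue,
})
totals = df.groupby("region")["revenue"].sum()
```

Both `revenue` and `revenue_loop` hold the same values; the vectorized form simply pushes the loop into compiled NumPy code, which is where the speedup comes from.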

What This Cheat Sheet Covers

This topic spans 23 focused tables and 189 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Reading and Writing Data
Table 2: DataFrame Creation and Inspection
Table 3: Indexing and Selection
Table 4: NumPy Array Creation
Table 5: NumPy Indexing and Slicing
Table 6: Data Cleaning and Missing Values
Table 7: Data Transformation and Manipulation
Table 8: NumPy Array Manipulation
Table 9: Aggregation and Grouping
Table 10: Data Binning and Encoding
Table 11: Combining DataFrames
Table 12: Sorting and Ranking
Table 13: String Operations
Table 14: Time Series and Datetime
Table 15: NumPy Mathematical Operations
Table 16: NumPy Statistical Functions
Table 17: NumPy Broadcasting and Vectorization
Table 18: NumPy Linear Algebra
Table 19: NumPy Random Sampling
Table 20: Data Type Operations
Table 21: Advanced Indexing Techniques
Table 22: Performance Optimization
Table 23: Visualization Integration

Table 1: Reading and Writing Data

Method	Example	Description
read_csv()	`df = pd.read_csv('data.csv')`	• Reads a comma-separated values file into a DataFrame • infers column dtypes automatically unless `dtype` is given.
read_excel()	`df = pd.read_excel('data.xlsx', sheet_name='Sheet1')`	• Reads Excel files (.xlsx, .xls) • can target specific sheets (`sheet_name`) and cell ranges (`usecols`, `skiprows`, `nrows`).
to_csv()	`df.to_csv('output.csv', index=False)`	• Exports DataFrame to CSV • `index=False` omits the row index column.
to_excel()	`df.to_excel('output.xlsx', sheet_name='Data')`	• Exports to Excel file • requires `openpyxl` or `xlsxwriter` engine.
read_json()	`df = pd.read_json('data.json', orient='records')`	• Parses JSON into DataFrame • handles various JSON structures via `orient` parameter.
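read_sql()	`df = pd.read_sql('SELECT * FROM my_table', conn)`	Runs a SQL query and loads the result directly into a DataFrame using a connection object (`my_table` is a placeholder name; the bare word `table` is a reserved SQL keyword).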
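The readers and writers in Table 1 pair up naturally, so a round trip is a quick way to see them in action. The sketch below is a minimal example under stated assumptions: the sample data, the column names, and the `sales_table` table name are hypothetical, files go to a temporary directory, and an in-memory SQLite database stands in for a real connection. Excel is left out because it needs an extra engine such as `openpyxl`.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "sales": [120, 85, 240]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "output.csv"
    json_path = Path(tmp) / "data.json"

    # CSV round trip: index=False keeps the row index out of the file.
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # JSON round trip: orient='records' writes a list of row objects.
    df.to_json(json_path, orient="records")
    from_json = pd.read_json(json_path, orient="records")

# SQL round trip against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("sales_table", conn, index=False)
from_sql = pd.read_sql("SELECT * FROM sales_table", conn)
conn.close()
```

Each of `from_csv`, `from_json`, and `from_sql` should carry the same three rows and two columns as `df`; exact dtypes can differ between formats and Pandas versions, so compare values rather than relying on strict equality.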
