Data analysis with Python centers on Pandas for tabular data manipulation and NumPy for numerical computing. Pandas provides DataFrames (2D labeled data structures) that enable SQL-like operations, while NumPy delivers vectorized array computations that are often orders of magnitude faster than equivalent pure-Python loops. Pandas 3.0 introduced Copy-on-Write by default, a dedicated str dtype, and pd.col() expressions, making data manipulation more predictable and performant. Together with NumPy's modern random Generator API (default_rng), they form the foundation of Python's data science ecosystem, handling everything from cleaning messy datasets to time series analysis and statistical aggregation.
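As a rough illustration of the ideas above, the sketch below draws random data with `default_rng`, compares a vectorized NumPy operation with its pure-Python equivalent, and runs a SQL-like aggregation on a DataFrame. The seed, column names, and group labels are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd

# Modern Generator API; the seed value is an arbitrary choice for reproducibility.
rng = np.random.default_rng(seed=42)
values = rng.normal(loc=0.0, scale=1.0, size=1_000)

# Vectorized: one array expression, evaluated in compiled code.
squared_vec = values ** 2

# Pure-Python equivalent: same result, computed one element at a time.
squared_loop = [v * v for v in values]

# SQL-like aggregation (roughly GROUP BY "group") on a DataFrame.
# The "group" and "value" column names are hypothetical.
df = pd.DataFrame({
    "group": rng.choice(["a", "b", "c"], size=values.size),
    "value": values,
})
summary = df.groupby("group")["value"].agg(["count", "mean"])
print(summary)
```

Both approaches give the same numbers. The vectorized form is usually the one to prefer for arrays of any meaningful size.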
What This Cheat Sheet Covers
This topic spans 23 focused tables and 189 indexed concepts. Below is a table-by-table outline, from foundational concepts through advanced details.
Table 1: Reading and Writing Data
| Method | Example | Description |
|---|---|---|
| read_csv | `df = pd.read_csv('data.csv')` | • Reads a comma-separated values file into a DataFrame • infers column dtypes automatically. |
| read_excel | `df = pd.read_excel('data.xlsx', sheet_name='Sheet1')` | • Reads Excel files (.xlsx, .xls) • sheet_name selects the sheet; usecols, skiprows, and nrows restrict the range read. |
| to_csv | `df.to_csv('output.csv', index=False)` | • Exports a DataFrame to CSV • index=False omits the index column. |
| to_excel | `df.to_excel('output.xlsx', sheet_name='Data')` | • Exports to an Excel file • requires the openpyxl or xlsxwriter engine. |
| read_json | `df = pd.read_json('data.json', orient='records')` | • Parses JSON into a DataFrame • the orient parameter tells Pandas which JSON layout to expect. |
| read_sql | `df = pd.read_sql('SELECT * FROM table', conn)` | • Runs a SQL query and loads the result into a DataFrame • conn is an open database connection. |
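
The readers and writers in this table can be tried without touching the filesystem. The following is a minimal sketch that round-trips a small DataFrame through CSV, JSON, and SQL, using in-memory buffers and an in-memory SQLite database. The sample data and the `cities` table name are made up for illustration, and the Excel functions are left out because they need an extra engine such as openpyxl.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data for the round trips below.
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Lima", "Pune"]})

# CSV: with no path given, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: orient="records" produces a list of {column: value} objects.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")

# SQL: an in-memory SQLite database stands in for a real server.
conn = sqlite3.connect(":memory:")
try:
    df.to_sql("cities", conn, index=False)
    from_sql = pd.read_sql("SELECT * FROM cities", conn)
finally:
    conn.close()

print(from_csv)
print(from_json)
print(from_sql)
```

With real files, the `io.StringIO(...)` arguments would be replaced by paths such as `'data.csv'`, as in the table.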