Data Analysis Cheat Sheet

Updated 2026-04-29

Data analysis is the systematic process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decision-making. It encompasses exploratory data analysis (EDA), data cleaning, wrangling, and feature engineering—the critical preparatory steps that transform raw data into analysis-ready datasets. While often associated with statistics and machine learning, data analysis fundamentally serves as the bridge between raw observations and actionable insights. A key insight: spending adequate time on quality data preparation typically has a larger impact on model performance than algorithm selection itself—well-prepared data enables simpler models to outperform complex ones trained on poor data. Modern workflows additionally use SHAP-based feature importance, automated drift detection, and schema validation tools to ensure pipelines remain robust and production-ready.

What This Cheat Sheet Covers

This topic spans 15 focused tables and 152 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Exploratory Data Analysis (EDA) Techniques
Table 2: Data Cleaning Techniques
Table 3: Outlier Detection and Treatment
Table 4: Data Transformation and Scaling
Table 5: Feature Encoding for Categorical Variables
Table 6: Feature Engineering Techniques
Table 7: Dimensionality Reduction and Feature Selection
Table 8: Data Aggregation and Grouping
Table 9: Data Reshaping and Merging
Table 10: Statistical Analysis and Hypothesis Testing
Table 11: Data Wrangling Operations
Table 12: Data Quality and Validation
Table 13: Data Visualization for Analysis
Table 14: Sampling and Balancing Techniques
Table 15: Automated EDA and Monitoring Tools

Table 1: Exploratory Data Analysis (EDA) Techniques

Before you clean or model anything, you look — and these are the moves that turn a raw DataFrame into an understood one. From quick describe() summaries to Q-Q plots, correlation heatmaps, and one-line automated profiling reports, each technique answers a different question about shape, spread, missingness, and how variables relate. Working from single variables up to multivariate views is the natural order to read them in.
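As a minimal sketch of that first look, the snippet below checks shape, spread, and missingness on a small toy DataFrame. The data and column names (`age`, `income`, `category`) are hypothetical and exist only for illustration; the pandas calls themselves are standard.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "age": [23, 35, 41, np.nan, 52, 29],
    "income": [32000, 54000, 61000, 48000, np.nan, 39000],
    "category": ["a", "b", "a", "c", "a", "b"],
})

# Shape: number of rows and columns.
n_rows, n_cols = df.shape

# Spread: count, mean, std, min, quartiles, and max for numeric columns only.
numeric_summary = df.describe()

# Missingness: number of null values per column.
missing_counts = df.isna().sum()

# Frequencies: share of each unique value in a categorical column.
category_share = df["category"].value_counts(normalize=True)

print(n_rows, n_cols)
print(numeric_summary)
print(missing_counts)
print(category_share)
```

Note that `describe()` skips the non-numeric `category` column by default, which is why a separate `value_counts()` call is useful for categorical features.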

| Technique | Example | Description |
| --- | --- | --- |
| summary statistics | `df.describe()`, `df.info()` | `describe()` computes count, mean, std, min, quartiles, and max for numerical columns. `info()` shows column data types and non-null counts. |
| value counts | `df['category'].value_counts(normalize=True)` | Counts the frequency of each unique value in a column. `normalize=True` returns proportions instead of counts. |
| univariate analysis | `df['age'].hist(bins=30)`, `df['category'].value_counts()` | Examines the distribution of a single variable using histograms, box plots, and frequency counts. Reveals central tendency, spread, and outliers one feature at a time. |
| bivariate analysis | `df.plot.scatter(x='age', y='income')`, `df.groupby('region')['sales'].mean()` | Explores the relationship between two variables through scatter plots, grouped aggregations, and cross-tabulations. Identifies correlations and patterns. |
| multivariate analysis | `sns.pairplot(df, hue='target')`, `df.corr(numeric_only=True)` | Examines interactions among three or more variables using pair plots, correlation matrices, and heatmaps. Reveals complex dependencies. |
| correlation analysis | `df.corr(numeric_only=True)`, `sns.heatmap(df.corr(numeric_only=True), annot=True)` | Calculates pairwise correlation coefficients (Pearson by default) between numerical features. The heatmap makes multicollinearity and feature relationships visible. In pandas 2.0 and later, pass `numeric_only=True` when the DataFrame contains non-numeric columns, otherwise `corr()` raises an error. |
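
The bivariate and correlation techniques above can be sketched end to end without any plotting. This is an illustrative example, not a prescribed workflow: the toy data, the column names, and the 0.9 cutoff for flagging strongly correlated pairs are all assumptions, and `numeric_only=True` assumes pandas 1.5 or later.

```python
import pandas as pd

# Hypothetical toy data; values and column names are illustrative assumptions.
df = pd.DataFrame({
    "age": [22, 30, 38, 46, 54, 62],
    "income": [30000, 42000, 50000, 61000, 70000, 83000],
    "region": ["north", "south", "north", "south", "north", "south"],
    "sales": [10.0, 14.0, 12.0, 18.0, 11.0, 16.0],
})

# Bivariate: mean of a numeric variable within each group of a categorical one.
sales_by_region = df.groupby("region")["sales"].mean()

# Correlation: restrict to numeric columns so the string column is ignored.
corr = df.corr(numeric_only=True)

# Flag pairs whose absolute correlation exceeds a cutoff.
# The 0.9 threshold is an arbitrary choice for this sketch.
threshold = 0.9
strong_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]

print(sales_by_region)
print(corr.round(3))
print(strong_pairs)
```

If seaborn is installed, passing the same `corr` matrix to `sns.heatmap(corr, annot=True)` renders it visually. Pairs flagged this way are candidates for a closer look at multicollinearity, not proof of it.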
