Missing Data Analysis and Imputation Cheat Sheet

Updated 2026-05-15

Missing data is a pervasive challenge in data science and machine learning, arising in virtually every real-world dataset from sensor failures, survey non-response, data integration issues, or intentional omissions. Understanding the mechanism behind the missingness—whether data is Missing Completely At Random (MCAR), Missing At Random (MAR), or Missing Not At Random (MNAR)—is critical because it fundamentally determines which imputation strategies produce valid, unbiased results and which introduce systematic distortion. Rather than simply deleting incomplete observations, modern practitioners employ a sophisticated toolkit spanning statistical methods, machine learning models, and deep learning architectures to reconstruct missing values while preserving distributional properties and relationships. The key insight is that missing values themselves carry information: the pattern and location of missingness can be engineered as features, and the choice between deletion, simple imputation, or complex multivariate approaches must balance statistical rigor, computational efficiency, and the ultimate use case.

What This Cheat Sheet Covers

This topic spans 17 focused tables and 85 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

• Table 1: Missing Data Mechanisms
• Table 2: Missing Data Visualization
• Table 3: Univariate Imputation Strategies
• Table 4: Multivariate Imputation Methods
• Table 5: Deletion Strategies
• Table 6: Advanced Deep Learning Imputation
• Table 7: Imputation in Machine Learning Pipelines
• Table 8: Handling Missingness as a Feature
• Table 9: Time Series Specific Imputation
• Table 10: Evaluation of Imputation Quality
• Table 11: Missing Value Generation for Testing (Amputation)
• Table 12: Specialized Imputation Libraries
• Table 13: Matrix Factorization and Collaborative Filtering
• Table 14: Regression-Based Imputation
• Table 15: Categorical Data Imputation
• Table 16: Model-Based and Probabilistic Imputation
• Table 17: Best Practices and Guidelines

Table 1: Missing Data Mechanisms

Mechanism | Example | Description
MCAR (Missing Completely At Random)
survey_df.loc[survey_df.sample(n=100).index, 'income'] = np.nan  # 'income' is an illustrative column
Missingness is independent of both observed and unobserved data
• Probability of being missing is equal for all cases
• Complete case analysis produces unbiased estimates, at the cost of a smaller sample
• Most restrictive (strongest) of the three assumptions
MAR (Missing At Random)
df.loc[df['age'] > 65, 'income'] = np.nan
Missingness depends only on observed data, not on the missing values themselves
• Can be predicted from other variables in the dataset
• Most common assumption in practice
• Allows valid imputation using observed data
MNAR (Missing Not At Random)
df.loc[df['depression_score'] > 8, 'depression_score'] = np.nan
Missingness depends on the unobserved values themselves
• In this example, individuals with high scores are less likely to report them
• Cannot be fully addressed without external information
• Requires sensitivity analysis or specialized models
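
The practical difference between the three mechanisms can be seen in a small simulation. The sketch below is illustrative only: the data is synthetic, the `age` and `income` columns, the missingness rates, and the helper names `with_missing` and `regression_adjusted_mean` are all assumptions made for this example. It compares the complete-case mean of `income` with the true mean under each mechanism, then repeats the comparison after a simple regression adjustment on the observed `age` column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data (all numbers are made up): income rises linearly with age.
age = rng.uniform(20, 80, size=n)
income = 20_000 + 800 * age + rng.normal(0, 10_000, size=n)
df = pd.DataFrame({"age": age, "income": income})


def with_missing(frame, mask):
    """Return a copy of frame with income set to NaN where mask is True."""
    out = frame.copy()
    out.loc[mask, "income"] = np.nan
    return out


def regression_adjusted_mean(d):
    """Fit income ~ age on complete cases, then average predictions over all rows."""
    obs = d.dropna(subset=["income"])
    slope, intercept = np.polyfit(obs["age"], obs["income"], deg=1)
    return (intercept + slope * d["age"]).mean()


# MCAR: every row has the same 30% chance of losing income.
mcar = with_missing(df, rng.random(n) < 0.3)

# MAR: the chance of losing income depends only on the observed age.
p_mar = np.where(df["age"] > 60, 0.6, 0.1)
mar = with_missing(df, rng.random(n) < p_mar)

# MNAR: the chance of losing income depends on income itself.
p_mnar = np.where(df["income"] > df["income"].median(), 0.6, 0.1)
mnar = with_missing(df, rng.random(n) < p_mnar)

true_mean = df["income"].mean()
datasets = {"MCAR": mcar, "MAR": mar, "MNAR": mnar}

# pandas skips NaN by default, so .mean() is the complete-case mean.
bias = {k: d["income"].mean() - true_mean for k, d in datasets.items()}
adjusted_bias = {k: regression_adjusted_mean(d) - true_mean
                 for k, d in datasets.items()}

for k in datasets:
    print(f"{k}: complete-case bias = {bias[k]:,.0f}, "
          f"regression-adjusted bias = {adjusted_bias[k]:,.0f}")
```

Under these assumptions, the complete-case mean should sit close to the true mean for MCAR and be pulled downward for MAR and MNAR. The regression adjustment should largely remove the MAR bias, because `age` explains the missingness, while leaving a noticeable bias under MNAR, which is consistent with the point that MNAR cannot be fully addressed from the observed data alone.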
